From 1aaddd6ca2c25d91fa04ae6a1626b385299d7d52 Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Mon, 29 Aug 2022 11:19:14 -0400 Subject: [PATCH 01/34] Add automl regression model eval first draft --- ..._tabular_regression_model evaluation.ipynb | 1273 +++++++++++++++++ ..._tabular_regression_model_evaluation.ipynb | 1273 +++++++++++++++++ 2 files changed, 2546 insertions(+) create mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb create mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb new file mode 100644 index 000000000..9eba72c37 --- /dev/null +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb @@ -0,0 +1,1273 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", + "\n", + "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "\n", + "{TODO: Update the list of billable products that your tutorial uses.}\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "{TODO: Include links to pricing documentation for each product you listed above.}\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. \n", + "\n", + "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17\n", + "! 
pip3 install {USER_FLAG} kfp google-cloud-pipeline-components --upgrade -q\n", + "# TODO: Add remaining package installs here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + "    # Automatically restart kernel after installs\n", + "    import IPython\n", + "\n", + "    app = IPython.Application.instance()\n", + "    app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", + "\n", + "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**The Compute Engine service account needs the Storage Object Creator and Storage Object Viewer roles on the Cloud Storage bucket used in this notebook.**\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + "    # Get your GCP project id from gcloud\n", + "    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + "    PROJECT_ID = shell_output[0]\n", + "    print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + "    REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n", + "\n", + "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. 
This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + "    \"DL_ANACONDA_HOME\"\n", + "):\n", + "    if \"google.colab\" in sys.modules:\n", + "        from google.colab import auth as google_auth\n", + "\n", + "        google_auth.authenticate_user()\n", + "\n", + "    # If you are running this notebook locally, replace the string below with the\n", + "    # path to your service account key and run this cell to authenticate your GCP\n", + "    # account.\n", + "    elif not os.getenv(\"IS_TESTING\"):\n", + "        %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "\n", + "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", + "\n", + "When you submit a training job using the Cloud SDK, you upload a Python package\n", + "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", + "the code from this package. In this tutorial, Vertex AI also saves the\n", + "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", + "create Vertex AI model and endpoint resources in order to serve\n", + "online predictions.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all\n", + "Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", + "\n", + "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", + "\n", + "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "from google_cloud_pipeline_components.experimental.evaluation import (\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + "from google_cloud_pipeline_components.experimental import evaluation\n", + "import kfp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset\n", + "\n", + "We use this bigquery table for training" + ], + "metadata": { + "id": "BiVlyW5OUnjK" + } + }, + { + "cell_type": "code", + "source": [ + "# Define BigQuery table to be used for training\n", + "\n", + "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" + ], + "metadata": { + "id": "bViYfWfpVAiF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "from google.cloud import bigquery\n", + "\n", + "# Create client in default region\n", + "bq_client = bigquery.Client(\n", + " project=PROJECT_ID,\n", + " credentials=aiplatform.initializer.global_config.credentials,\n", + ")\n" + ], + "metadata": { + "id": "20S9En09X0PY" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Create test dataset in default region\n", + "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", + "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" + ], + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cw3n0ftZYZ_h", + "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" + ] + } + ] + }, + { + "cell_type": 
"code", + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XG0U5lmfYrNT", + "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 83 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the dataset" + ], + "metadata": { + "id": "4XpPTSFoYCsT" + } + }, + { + "cell_type": "code", + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ], + "metadata": { + "id": "gHOUMfskYIpO" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=DATASET_NAME,\n", + " bq_source=[f\"bq://{BQ_TABLE}\"],\n", + ")\n", + "\n", + "COLUMN_SPECS = {\n", + " \"year\": \"auto\",\n", + " \"month\": \"auto\",\n", + " \"day\": \"auto\",\n", + "}\n", + "\n", + "label_column = \"mean_temp\"\n", + "\n", + "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", + "\n", + "print(dataset.resource_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ] + }, + { + "cell_type": "markdown", + 
"source": [ + "## Train the model" + ], + "metadata": { + "id": "A-QQkeUnq8Xt" + } + }, + { + "cell_type": "code", + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ], + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "model = training_job.run(\n", + " dataset=dataset,\n", + " model_display_name=MODEL_NAME,\n", + " training_fraction_split=0.6,\n", + " validation_fraction_split=0.2,\n", + " test_fraction_split=0.2,\n", + " budget_milli_node_hours=1,\n", + " disable_early_stopping=False,\n", + " target_column=label_column,\n", + ")" + ], + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Get Model" + ], + "metadata": { + "id": "RYjWdtscAmFP" + } + }, + { + "cell_type": "code", + "source": [ + "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", + "MODEL_ID = '2241222511826042880'\n", + "model = aiplatform.Model(model_name=MODEL_ID)" + ], + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### List model eval metrics" + ], + "metadata": { + "id": "rYirKB_9yaa0" + } + }, + { + "cell_type": "code", + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + 
"\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ], + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Model Evaluation" + ], + "metadata": { + "id": "ce6beLsXASnK" + } + }, + { + "cell_type": "code", + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " # batch_predict_gcs_source_uris: list,\n", + " bigquery_source_input_uri: str,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-16',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Batch Prediction.\n", + " batch_predict_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluatio',\n", + " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # # # Run the Batch Explain process (sampler -> batch explanation).\n", + " # data_sampler_task = EvaluationDataSamplerOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # gcs_source_uris=batch_predict_gcs_source_uris,\n", + " # instances_format=batch_predict_instances_format,\n", + " # sample_size=batch_predict_explanation_data_sample_size)\n", + " # batch_explain_task = ModelBatchPredictOp(\n", + " # project=project,\n", + " # location=location,\n", + " # model=get_model_task.outputs['model'],\n", + " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", + " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + " # instances_format=batch_predict_instances_format,\n", + " # predictions_format=batch_predict_predictions_format,\n", + " # gcs_destination_output_uri_prefix=root_dir,\n", + " # generate_explanation=True,\n", + " # explanation_parameters=batch_predict_explanation_parameters,\n", + " # explanation_metadata=batch_predict_explanation_metadata,\n", + " # machine_type=batch_predict_machine_type,\n", + " # starting_replica_count=batch_predict_starting_replica_count,\n", + " # max_replica_count=batch_predict_max_replica_count,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = 
ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_predict_task\n", + " .outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # predictions_format='jsonl',\n", + " # predictions_gcs_source=batch_explain_task\n", + " # .outputs['gcs_output_directory'],\n", + " # dataflow_machine_type=dataflow_machine_type,\n", + " # dataflow_max_workers_num=dataflow_max_num_workers,\n", + " # dataflow_disk_size=dataflow_disk_size_gb,\n", + " # dataflow_service_account=dataflow_service_account,\n", + " # dataflow_subnetwork=dataflow_subnetwork,\n", + " # dataflow_use_public_ips=dataflow_use_public_ips,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + " ModelImportEvaluationOp(\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " # feature_attributions=feature_attribution_task\n", + " # .outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format)" + ], + "metadata": { + "id": "ktMsqtibAUzz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + 
" package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ], + "metadata": { + "id": "NOvOMTEgCVcW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", + " category=FutureWarning,\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'prediction_type':'regression',\n", + " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", + " 'target_column_name':label_column,\n", + " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", + " 'batch_predict_instances_format':'bigquery',\n", + " }" + ], + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "DISPLAY_NAME = \"pen\" + UUID\n", + "\n", + "job = aiplatform.PipelineJob(\n", + " display_name=DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run()\n", + "\n", + "# ! rm tabular_regression_pipeline.json" + ], + "metadata": { + "id": "pdHib_yUEuEk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ], + "metadata": { + "id": "mKRTDi8ioXBY" + } + }, + { + "cell_type": "markdown", + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ], + "metadata": { + "id": "U2zocUvk2YVs" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Model Evaluation Results" + ], + "metadata": { + "id": "XcKaONSsGNC4" + } + }, + { + "cell_type": "code", + "source": [ + "# _________NOTE_________:\n", + "# This is sample code from the eval team; it needs to be debugged or replaced with a\n", + "# better approach.\n", + "\n", + "from google.cloud import aiplatform_v1\n", + "\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ], + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial:\n", + "\n", + "{TODO: Include commands to delete individual resources below}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete endpoint resource\n", + "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", + "\n", + "# Delete model resource\n", + "! gcloud ai models delete $MODEL_NAME --quiet\n", + "\n", + "# Delete Cloud Storage objects that were created\n", + "! gsutil -m rm -r $JOB_DIR\n", + "\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb new file mode 100644 index 000000000..9eba72c37 --- /dev/null +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -0,0 +1,1273 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", + "\n", + "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "\n", + "{TODO: Update the list of billable products that your tutorial uses.}\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "{TODO: Include links to pricing documentation for each product you listed above.}\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. \n", + "\n", + "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17\n", + "! 
pip3 install {USER_FLAG} kfp google-cloud-pipeline-components --upgrade -q\n", + "# TODO: Add remaining package installs here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", + "\n", + "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Important**: The Compute Engine default service account needs the Storage Object Creator and Storage Object Viewer roles on your bucket.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n", + "\n", + "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via oAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. 
This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "\n", + "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", + "\n", + "When you submit a training job using the Cloud SDK, you upload a Python package\n", + "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", + "the code from this package. In this tutorial, Vertex AI also saves the\n", + "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", + "create Vertex AI model and endpoint resources in order to serve\n", + "online predictions.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all\n", + "Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", + "\n", + "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", + "\n", + "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "from google_cloud_pipeline_components.experimental.evaluation import (\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + "from google_cloud_pipeline_components.experimental import evaluation\n", + "import kfp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset\n", + "\n", + "This tutorial uses the following BigQuery table for training." + ], + "metadata": { + "id": "BiVlyW5OUnjK" + } + }, + { + "cell_type": "code", + "source": [ + "# Define BigQuery table to be used for training\n", + "\n", + "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" + ], + "metadata": { + "id": "bViYfWfpVAiF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "from google.cloud import bigquery\n", + "\n", + "# Create client in default region\n", + "bq_client = bigquery.Client(\n", + " project=PROJECT_ID,\n", + " credentials=aiplatform.initializer.global_config.credentials,\n", + ")\n" + ], + "metadata": { + "id": "20S9En09X0PY" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Create test dataset in default region\n", + "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", + "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" + ], + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cw3n0ftZYZ_h", + "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" + ] + } + ] + }, + { + "cell_type": 
"code", + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XG0U5lmfYrNT", + "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 83 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the dataset" + ], + "metadata": { + "id": "4XpPTSFoYCsT" + } + }, + { + "cell_type": "code", + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ], + "metadata": { + "id": "gHOUMfskYIpO" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=DATASET_NAME,\n", + " bq_source=[f\"bq://{BQ_TABLE}\"],\n", + ")\n", + "\n", + "COLUMN_SPECS = {\n", + " \"year\": \"auto\",\n", + " \"month\": \"auto\",\n", + " \"day\": \"auto\",\n", + "}\n", + "\n", + "label_column = \"mean_temp\"\n", + "\n", + "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", + "\n", + "print(dataset.resource_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ] + }, + { + "cell_type": "markdown", + 
"source": [ + "## Train the model" + ], + "metadata": { + "id": "A-QQkeUnq8Xt" + } + }, + { + "cell_type": "code", + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ], + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "model = training_job.run(\n", + " dataset=dataset,\n", + " model_display_name=MODEL_NAME,\n", + " training_fraction_split=0.6,\n", + " validation_fraction_split=0.2,\n", + " test_fraction_split=0.2,\n", + " budget_milli_node_hours=1,\n", + " disable_early_stopping=False,\n", + " target_column=label_column,\n", + ")" + ], + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Get Model" + ], + "metadata": { + "id": "RYjWdtscAmFP" + } + }, + { + "cell_type": "code", + "source": [ + "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", + "MODEL_ID = '2241222511826042880'\n", + "model = aiplatform.Model(model_name=MODEL_ID)" + ], + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### List model eval metrics" + ], + "metadata": { + "id": "rYirKB_9yaa0" + } + }, + { + "cell_type": "code", + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + 
"\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ], + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Model Evaluation" + ], + "metadata": { + "id": "ce6beLsXASnK" + } + }, + { + "cell_type": "code", + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " # batch_predict_gcs_source_uris: list,\n", + " bigquery_source_input_uri: str,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-16',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Batch Prediction.\n", + " batch_predict_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluatio',\n", + " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # # # Run the Batch Explain process (sampler -> batch explanation).\n", + " # data_sampler_task = EvaluationDataSamplerOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # gcs_source_uris=batch_predict_gcs_source_uris,\n", + " # instances_format=batch_predict_instances_format,\n", + " # sample_size=batch_predict_explanation_data_sample_size)\n", + " # batch_explain_task = ModelBatchPredictOp(\n", + " # project=project,\n", + " # location=location,\n", + " # model=get_model_task.outputs['model'],\n", + " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", + " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + " # instances_format=batch_predict_instances_format,\n", + " # predictions_format=batch_predict_predictions_format,\n", + " # gcs_destination_output_uri_prefix=root_dir,\n", + " # generate_explanation=True,\n", + " # explanation_parameters=batch_predict_explanation_parameters,\n", + " # explanation_metadata=batch_predict_explanation_metadata,\n", + " # machine_type=batch_predict_machine_type,\n", + " # starting_replica_count=batch_predict_starting_replica_count,\n", + " # max_replica_count=batch_predict_max_replica_count,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = 
ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_predict_task\n", + " .outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # predictions_format='jsonl',\n", + " # predictions_gcs_source=batch_explain_task\n", + " # .outputs['gcs_output_directory'],\n", + " # dataflow_machine_type=dataflow_machine_type,\n", + " # dataflow_max_workers_num=dataflow_max_num_workers,\n", + " # dataflow_disk_size=dataflow_disk_size_gb,\n", + " # dataflow_service_account=dataflow_service_account,\n", + " # dataflow_subnetwork=dataflow_subnetwork,\n", + " # dataflow_use_public_ips=dataflow_use_public_ips,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + " ModelImportEvaluationOp(\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " # feature_attributions=feature_attribution_task\n", + " # .outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format)" + ], + "metadata": { + "id": "ktMsqtibAUzz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + 
" package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ], + "metadata": { + "id": "NOvOMTEgCVcW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", + " category=FutureWarning,\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'prediction_type':'regression',\n", + " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", + " 'target_column_name':label_column,\n", + " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", + " 'batch_predict_instances_format':'bigquery',\n", + " }" + ], + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "DISPLAY_NAME = \"pen\" + UUID\n", + "\n", + "job = aiplatform.PipelineJob(\n", + " display_name=DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run()\n", + "\n", + "# ! rm tabular_regression_pipeline.json" + ], + "metadata": { + "id": "pdHib_yUEuEk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ], + "metadata": { + "id": "mKRTDi8ioXBY" + } + }, + { + "cell_type": "markdown", + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ], + "metadata": { + "id": "U2zocUvk2YVs" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Model Evaluation Results" + ], + "metadata": { + "id": "XcKaONSsGNC4" + } + }, + { + "cell_type": "code", + "source": [ + "# _________NOTE_________:\n", + "# This is sample code from the eval team; it needs to be debugged or replaced with a\n", + "# better approach.\n", + "\n", + "from google.cloud import aiplatform_v1\n", + "\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ], + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial:\n", + "\n", + "{TODO: Include commands to delete individual resources below}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete endpoint resource\n", + "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", + "\n", + "# Delete model resource\n", + "! gcloud ai models delete $MODEL_NAME --quiet\n", + "\n", + "# Delete Cloud Storage objects that were created\n", + "! gsutil -m rm -r $JOB_DIR\n", + "\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From c3a4a7f2856fe18ba82a2c1653172cf555b43c3f Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Mon, 29 Aug 2022 12:03:26 -0400 Subject: [PATCH 02/34] Remove extra file --- ..._tabular_regression_model evaluation.ipynb | 1273 ----------------- 1 file changed, 1273 deletions(-) delete mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb deleted file mode 100644 index 9eba72c37..000000000 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb +++ /dev/null @@ -1,1273 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is 
distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", - "\n", - "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "\n", - "{TODO: Update the list of billable products that your tutorial uses.}\n", - "\n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "{TODO: Include links to pricing documentation for each product you listed above.}\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. \n", - "\n", - "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17\n", - "! 
pip3 install $USER kfp google-cloud-pipeline-components --upgrade -q\n", - "# TODO: Add remaining package installs here" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n", - "\n", - "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. 
This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "\n", - "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", - "\n", - "When you submit a training job using the Cloud SDK, you upload a Python package\n", - "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", - "the code from this package. In this tutorial, Vertex AI also saves the\n", - "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", - "create Vertex AI model and endpoint resources in order to serve\n", - "online predictions.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all\n", - "Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", - "\n", - "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "from google_cloud_pipeline_components.experimental.evaluation import (\n", - " ModelEvaluationRegressionOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - "from google_cloud_pipeline_components.experimental import evaluation\n", - "import kfp" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Dataset\n", - "\n", - "We use this bigquery table for training" - ], - "metadata": { - "id": "BiVlyW5OUnjK" - } - }, - { - "cell_type": "code", - "source": [ - "# Define BigQuery table to be used for training\n", - "\n", - "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" - ], - "metadata": { - "id": "bViYfWfpVAiF" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "\n", - "from google.cloud import bigquery\n", - "\n", - "# Create client in default region\n", - "bq_client = bigquery.Client(\n", - " project=PROJECT_ID,\n", - " credentials=aiplatform.initializer.global_config.credentials,\n", - ")\n" - ], - "metadata": { - "id": "20S9En09X0PY" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# Create test dataset in default region\n", - "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", - "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" - ], - "metadata": { - "id": "KvRQNKhEmGHs" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", - "bq_dataset = bq_client.create_dataset(bq_dataset)\n", - "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cw3n0ftZYZ_h", - "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" - ] - } - ] - }, - { - "cell_type": 
"code", - "source": [ - "# Select a subset of the original dataset for testing\n", - "PREDICTION_SIZE = 10\n", - "query = f\"\"\"\n", - " SELECT *\n", - " FROM {BQ_TABLE}\n", - " LIMIT {PREDICTION_SIZE} \n", - " \"\"\"\n", - "\n", - "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", - "\n", - "query_job = bq_client.query(query, job_config=job_config) # API request\n", - "query_job.result() # Waits for query to finish" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XG0U5lmfYrNT", - "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": {}, - "execution_count": 83 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Create the dataset" - ], - "metadata": { - "id": "4XpPTSFoYCsT" - } - }, - { - "cell_type": "code", - "source": [ - "DATASET_NAME = \"Pen\"+UUID" - ], - "metadata": { - "id": "gHOUMfskYIpO" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=DATASET_NAME,\n", - " bq_source=[f\"bq://{BQ_TABLE}\"],\n", - ")\n", - "\n", - "COLUMN_SPECS = {\n", - " \"year\": \"auto\",\n", - " \"month\": \"auto\",\n", - " \"day\": \"auto\",\n", - "}\n", - "\n", - "label_column = \"mean_temp\"\n", - "\n", - "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", - "\n", - "print(dataset.resource_name)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IyXwOcbVYBd1", - "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" - ] - } - ] - }, - { - "cell_type": "markdown", - 
"source": [ - "## Train the model" - ], - "metadata": { - "id": "A-QQkeUnq8Xt" - } - }, - { - "cell_type": "code", - "source": [ - "MODEL_NAME = \"pen\" + UUID" - ], - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name= \"pen_training_job\",\n", - " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(training_job)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3l691PEMZFdA", - "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "model = training_job.run(\n", - " dataset=dataset,\n", - " model_display_name=MODEL_NAME,\n", - " training_fraction_split=0.6,\n", - " validation_fraction_split=0.2,\n", - " test_fraction_split=0.2,\n", - " budget_milli_node_hours=1,\n", - " disable_early_stopping=False,\n", - " target_column=label_column,\n", - ")" - ], - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Get Model" - ], - "metadata": { - "id": "RYjWdtscAmFP" - } - }, - { - "cell_type": "code", - "source": [ - "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", - "MODEL_ID = '2241222511826042880'\n", - "model = aiplatform.Model(model_name=MODEL_ID)" - ], - "metadata": { - "id": "4Dkk4P_TAlkr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### List model eval metrics" - ], - "metadata": { - "id": "rYirKB_9yaa0" - } - }, - { - "cell_type": "code", - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - 
"\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ], - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Model Evaluation" - ], - "metadata": { - "id": "ce6beLsXASnK" - } - }, - { - "cell_type": "code", - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " # batch_predict_gcs_source_uris: list,\n", - " bigquery_source_input_uri: str,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-16',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " dataflow_subnetwork: str = '',\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Batch Prediction.\n", - " batch_predict_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluatio',\n", - " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # # # Run the Batch Explain process (sampler -> batch explanation).\n", - " # data_sampler_task = EvaluationDataSamplerOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # gcs_source_uris=batch_predict_gcs_source_uris,\n", - " # instances_format=batch_predict_instances_format,\n", - " # sample_size=batch_predict_explanation_data_sample_size)\n", - " # batch_explain_task = ModelBatchPredictOp(\n", - " # project=project,\n", - " # location=location,\n", - " # model=get_model_task.outputs['model'],\n", - " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", - " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", - " # instances_format=batch_predict_instances_format,\n", - " # predictions_format=batch_predict_predictions_format,\n", - " # gcs_destination_output_uri_prefix=root_dir,\n", - " # generate_explanation=True,\n", - " # explanation_parameters=batch_predict_explanation_parameters,\n", - " # explanation_metadata=batch_predict_explanation_metadata,\n", - " # machine_type=batch_predict_machine_type,\n", - " # starting_replica_count=batch_predict_starting_replica_count,\n", - " # max_replica_count=batch_predict_max_replica_count,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = 
ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_format=batch_predict_predictions_format,\n", - " predictions_gcs_source=batch_predict_task\n", - " .outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # predictions_format='jsonl',\n", - " # predictions_gcs_source=batch_explain_task\n", - " # .outputs['gcs_output_directory'],\n", - " # dataflow_machine_type=dataflow_machine_type,\n", - " # dataflow_max_workers_num=dataflow_max_num_workers,\n", - " # dataflow_disk_size=dataflow_disk_size_gb,\n", - " # dataflow_service_account=dataflow_service_account,\n", - " # dataflow_subnetwork=dataflow_subnetwork,\n", - " # dataflow_use_public_ips=dataflow_use_public_ips,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - " ModelImportEvaluationOp(\n", - " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", - " # feature_attributions=feature_attribution_task\n", - " # .outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format)" - ], - "metadata": { - "id": "ktMsqtibAUzz" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "from kfp.v2 import compiler # noqa: F811\n", - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - 
" package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ], - "metadata": { - "id": "NOvOMTEgCVcW", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stderr", - "text": [ - "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", - " category=FutureWarning,\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'prediction_type':'regression',\n", - " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", - " 'target_column_name':label_column,\n", - " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", - " 'batch_predict_instances_format':'bigquery',\n", - " }" - ], - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "DISPLAY_NAME = \"pen\" + UUID\n", - "\n", - "job = aiplatform.PipelineJob(\n", - " display_name=DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run()\n", - "\n", - "# ! rm tabular_regression_pipeline.json" - ], - "metadata": { - "id": "pdHib_yUEuEk" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ], - "metadata": { - "id": "mKRTDi8ioXBY" - } - }, - { - "cell_type": "markdown", - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ], - "metadata": { - "id": "U2zocUvk2YVs" - } - }, - { - "cell_type": "markdown", - "source": [ - "### Molde Evaluation Results" - ], - "metadata": { - "id": "XcKaONSsGNC4" - } - }, - { - "cell_type": "code", - "source": [ - "# _________NOTE_________:\n", - "#this is a sample code from eval team... need to be degbugged or replaced with a \n", - "# better appraoch\n", - "\n", - "from google.cloud import aiplatform_v1\n", - "\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ], - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial:\n", - "\n", - "{TODO: Include commands to delete individual resources below}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete endpoint resource\n", - "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", - "\n", - "# Delete model resource\n", - "! gcloud ai models delete $MODEL_NAME --quiet\n", - "\n", - "# Delete Cloud Storage objects that were created\n", - "! gsutil -m rm -r $JOB_DIR\n", - "\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file From 1e16e7b726121a4322d3ef1c367a63d0567ff92c Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Tue, 30 Aug 2022 09:46:49 -0400 Subject: [PATCH 03/34] Pring evaluation results --- ..._tabular_regression_model_evaluation.ipynb | 342 +++++++++--------- 1 file changed, 171 insertions(+), 171 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 9eba72c37..065c70cd1 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -678,30 +678,35 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, "source": [ "## Dataset\n", "\n", "We use this bigquery table for training" - ], - "metadata": { - "id": "BiVlyW5OUnjK" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], "source": [ "# Define BigQuery table to be used for training\n", "\n", "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" - ], - "metadata": { - "id": "bViYfWfpVAiF" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + 
"metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], "source": [ "\n", "from google.cloud import bigquery\n", @@ -711,33 +716,24 @@ " project=PROJECT_ID,\n", " credentials=aiplatform.initializer.global_config.credentials,\n", ")\n" - ], - "metadata": { - "id": "20S9En09X0PY" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "outputs": [], "source": [ "# Create test dataset in default region\n", "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" - ], - "metadata": { - "id": "KvRQNKhEmGHs" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", - "bq_dataset = bq_client.create_dataset(bq_dataset)\n", - "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -745,33 +741,24 @@ "id": "cw3n0ftZYZ_h", "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" ] } + ], + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" ] }, { "cell_type": "code", - "source": [ - "# Select a subset of the original dataset for testing\n", - "PREDICTION_SIZE = 10\n", - "query = f\"\"\"\n", - " SELECT *\n", - " FROM {BQ_TABLE}\n", - " LIMIT {PREDICTION_SIZE} \n", - " \"\"\"\n", - "\n", - "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", - "\n", - "query_job = 
bq_client.query(query, job_config=job_config) # API request\n", - "query_job.result() # Waits for query to finish" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -779,42 +766,72 @@ "id": "XG0U5lmfYrNT", "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "" ] }, + "execution_count": 83, "metadata": {}, - "execution_count": 83 + "output_type": "execute_result" } + ], + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" ] }, { "cell_type": "markdown", - "source": [ - "### Create the dataset" - ], "metadata": { "id": "4XpPTSFoYCsT" - } + }, + "source": [ + "### Create the dataset" + ] }, { "cell_type": "code", - "source": [ - "DATASET_NAME = \"Pen\"+UUID" - ], + "execution_count": null, "metadata": { "id": "gHOUMfskYIpO" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ], "source": [ "dataset = aiplatform.TabularDataset.create(\n", " display_name=DATASET_NAME,\n", @@ -832,56 +849,31 @@ "dataset = 
aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", "\n", "print(dataset.resource_name)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IyXwOcbVYBd1", - "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" - ] - } ] }, { "cell_type": "markdown", - "source": [ - "## Train the model" - ], "metadata": { "id": "A-QQkeUnq8Xt" - } + }, + "source": [ + "## Train the model" + ] }, { "cell_type": "code", - "source": [ - "MODEL_NAME = \"pen\" + UUID" - ], + "execution_count": null, "metadata": { "id": "Bxn6ATUXrET6" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ] }, { "cell_type": "code", - "source": [ - "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name= \"pen_training_job\",\n", - " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(training_job)" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -889,19 +881,32 @@ "id": "3l691PEMZFdA", "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "\n" ] } + ], + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], "source": [ "model = training_job.run(\n", " dataset=dataset,\n", @@ -913,70 +918,70 @@ " 
disable_early_stopping=False,\n", " target_column=label_column,\n", ")" - ], - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Get Model" - ], "metadata": { "id": "RYjWdtscAmFP" - } + }, + "source": [ + "## Get Model" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "outputs": [], "source": [ "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", "MODEL_ID = '2241222511826042880'\n", "model = aiplatform.Model(model_name=MODEL_ID)" - ], - "metadata": { - "id": "4Dkk4P_TAlkr" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "### List model eval metrics" - ], "metadata": { "id": "rYirKB_9yaa0" - } + }, + "source": [ + "### List model eval metrics" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], "source": [ "# Get evaluations\n", "model_evaluations = model.list_model_evaluations()\n", "\n", "model_evaluation = list(model_evaluations)[0]\n", "print(model_evaluation)" - ], - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Model Evaluation" - ], "metadata": { "id": "ce6beLsXASnK" - } + }, + "source": [ + "## Model Evaluation" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], "source": [ "@kfp.dsl.pipeline(\n", " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", @@ -1085,44 +1090,44 @@ " # .outputs['feature_attributions'],\n", " model=get_model_task.outputs['model'],\n", " dataset_type=batch_predict_instances_format)" - ], - "metadata": { - "id": "ktMsqtibAUzz" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "from kfp.v2 import compiler # 
noqa: F811\n", - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ], + "execution_count": null, "metadata": { - "id": "NOvOMTEgCVcW", "colab": { "base_uri": "https://localhost:8080/" }, + "id": "NOvOMTEgCVcW", "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stderr", + "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", " category=FutureWarning,\n" ] } + ], + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], "source": [ "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", "parameters = {\n", @@ -1135,15 +1140,15 @@ " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", " 'batch_predict_instances_format':'bigquery',\n", " }" - ], - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], "source": [ "DISPLAY_NAME = \"pen\" + UUID\n", "\n", @@ -1157,44 +1162,44 @@ "job.run()\n", "\n", "# ! 
rm tabular_regression_pipeline.json" - ], - "metadata": { - "id": "pdHib_yUEuEk" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, "source": [ "Click on the generated link to see your run in the Cloud Console.\n", "\n", "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ], - "metadata": { - "id": "mKRTDi8ioXBY" - } + ] }, { "cell_type": "markdown", - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ], "metadata": { "id": "U2zocUvk2YVs" - } + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] }, { "cell_type": "markdown", - "source": [ - "### Molde Evaluation Results" - ], "metadata": { "id": "XcKaONSsGNC4" - } + }, + "source": [ + "### Molde Evaluation Results" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], "source": [ "# _________NOTE_________:\n", "#this is a sample code from eval team... 
need to be degbugged or replaced with a \n", @@ -1205,18 +1210,13 @@ "for task in job._gca_resource.job_detail.task_details:\n", " if ((\"model-evaluation\" in task.task_name) and\n", " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", "\n", "print(evaluation_metrics)\n", "print(evaluation_metrics_gcs_uri)" - ], - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -1270,4 +1270,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} From 1b07079c3611acf03a112d0685da4d2fbce4166c Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Fri, 2 Sep 2022 05:48:19 +0000 Subject: [PATCH 04/34] adds the automl-tabular-classification notebook in model_evaluation folder --- ...ular_classification_model_evaluation.ipynb | 1377 +++++++++++++++++ 1 file changed, 1377 insertions(+) create mode 100644 notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb new file mode 100644 index 000000000..18efb3319 --- /dev/null +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -0,0 +1,1377 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", 
+ "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI classification model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline.\n", + "- Run a `batch prediction` job.\n", + "- Evaulate the AutoML model using the `classification evluation component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43", + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"numeric\": {\"column_name\": \"Age\"}},\n", + 
" {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " ],\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " service_account=SERVICE_ACCOUNT,\n", + " dataset=dataset,\n", + " target_column=\"Adopted\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " GetVertexModelOp,\n", + " EvaluationDataSamplerOp,\n", + " ModelEvaluationClassificationOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp\n", + " )\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " 
instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " 
)\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format='jsonl',\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", + " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for keeping staging files and artifacts. A random subdirectory is created under this directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'prediction_type':\"classification\",\n", + " 'target_column_name':\"Adopted\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=metrics,height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "conda-env-eval_comp-py", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python [conda env:eval_comp]", + "language": "python", + "name": "conda-env-eval_comp-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 320f3007fbaefabb7d38313b1cf94ee0cfc6dcba Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Fri, 2 Sep 2022 05:52:29 +0000 Subject: [PATCH 05/34] removes unnecessary imports --- ..._tabular_regression_model_evaluation.ipynb | 275 +++++++++--------- 1 file changed, 132 insertions(+), 143 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 065c70cd1..b4be6069e 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -645,13 +645,12 @@ "outputs": [], "source": [ "import google.cloud.aiplatform as aiplatform\n", - "from google_cloud_pipeline_components.experimental.evaluation import (\n", - " ModelEvaluationRegressionOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", + "import kfp\n", "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", "from google_cloud_pipeline_components.experimental import evaluation\n", - "import kfp" + "from 
google_cloud_pipeline_components.experimental.evaluation import (\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)" ] }, { @@ -708,14 +707,13 @@ }, "outputs": [], "source": [ - "\n", "from google.cloud import bigquery\n", "\n", "# Create client in default region\n", "bq_client = bigquery.Client(\n", " project=PROJECT_ID,\n", " credentials=aiplatform.initializer.global_config.credentials,\n", - ")\n" + ")" ] }, { @@ -735,11 +733,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cw3n0ftZYZ_h", - "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" + "id": "cw3n0ftZYZ_h" }, "outputs": [ { @@ -760,11 +754,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XG0U5lmfYrNT", - "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" + "id": "XG0U5lmfYrNT" }, "outputs": [ { @@ -810,18 +800,14 @@ }, "outputs": [], "source": [ - "DATASET_NAME = \"Pen\"+UUID" + "DATASET_NAME = \"Pen\" + UUID" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IyXwOcbVYBd1", - "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + "id": "IyXwOcbVYBd1" }, "outputs": [ { @@ -846,7 +832,9 @@ "\n", "label_column = \"mean_temp\"\n", "\n", - "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", + "dataset = aiplatform.TabularDataset(\n", + " \"projects/1058599485685/locations/us-central1/datasets/5507798990181105664\"\n", + ")\n", "\n", "print(dataset.resource_name)" ] @@ -875,11 +863,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3l691PEMZFdA", - "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + "id": "3l691PEMZFdA" }, "outputs": [ { @@ -892,9 +876,9 @@ ], 
"source": [ "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name= \"pen_training_job\",\n", + " display_name=\"pen_training_job\",\n", " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\"\n", + " optimization_objective=\"minimize-rmse\",\n", ")\n", "\n", "print(training_job)" @@ -938,7 +922,7 @@ "outputs": [], "source": [ "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", - "MODEL_ID = '2241222511826042880'\n", + "MODEL_ID = \"2241222511826042880\"\n", "model = aiplatform.Model(model_name=MODEL_ID)" ] }, @@ -983,8 +967,7 @@ }, "outputs": [], "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", "def evaluation_automl_tabular_feature_attribution_pipeline(\n", " project: str,\n", " location: str,\n", @@ -995,112 +978,111 @@ " # batch_predict_gcs_source_uris: list,\n", " bigquery_source_input_uri: str,\n", " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-16',\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-16\",\n", " batch_predict_starting_replica_count: int = 5,\n", " batch_predict_max_replica_count: int = 10,\n", " batch_predict_explanation_metadata: dict = {},\n", " batch_predict_explanation_parameters: dict = {},\n", " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_machine_type: str = \"n1-standard-4\",\n", " dataflow_max_num_workers: int = 5,\n", " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " dataflow_subnetwork: str = '',\n", + " dataflow_service_account: str = \"\",\n", + " dataflow_subnetwork: str = 
\"\",\n", " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Batch Prediction.\n", - " batch_predict_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluatio',\n", - " bigquery_source_input_uri=bigquery_source_input_uri,\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # # # Run the Batch Explain process (sampler -> batch explanation).\n", - " # data_sampler_task = EvaluationDataSamplerOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # gcs_source_uris=batch_predict_gcs_source_uris,\n", - " # instances_format=batch_predict_instances_format,\n", - " # sample_size=batch_predict_explanation_data_sample_size)\n", - " # batch_explain_task = ModelBatchPredictOp(\n", - " # project=project,\n", - " # location=location,\n", - " # model=get_model_task.outputs['model'],\n", - " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", - " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", - " # instances_format=batch_predict_instances_format,\n", - " # predictions_format=batch_predict_predictions_format,\n", - " # gcs_destination_output_uri_prefix=root_dir,\n", - " # generate_explanation=True,\n", - " # 
explanation_parameters=batch_predict_explanation_parameters,\n", - " # explanation_metadata=batch_predict_explanation_metadata,\n", - " # machine_type=batch_predict_machine_type,\n", - " # starting_replica_count=batch_predict_starting_replica_count,\n", - " # max_replica_count=batch_predict_max_replica_count,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_format=batch_predict_predictions_format,\n", - " predictions_gcs_source=batch_predict_task\n", - " .outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # predictions_format='jsonl',\n", - " # predictions_gcs_source=batch_explain_task\n", - " # .outputs['gcs_output_directory'],\n", - " # dataflow_machine_type=dataflow_machine_type,\n", - " # dataflow_max_workers_num=dataflow_max_num_workers,\n", - " # dataflow_disk_size=dataflow_disk_size_gb,\n", - " # dataflow_service_account=dataflow_service_account,\n", - " # dataflow_subnetwork=dataflow_subnetwork,\n", - " # dataflow_use_public_ips=dataflow_use_public_ips,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - " ModelImportEvaluationOp(\n", - " 
regression_metrics=eval_task.outputs['evaluation_metrics'],\n", - " # feature_attributions=feature_attribution_task\n", - " # .outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format)" + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Batch Prediction.\n", + " batch_predict_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluatio\",\n", + " bigquery_source_input_uri=bigquery_source_input_uri,\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # # # Run the Batch Explain process (sampler -> batch explanation).\n", + " # data_sampler_task = EvaluationDataSamplerOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # gcs_source_uris=batch_predict_gcs_source_uris,\n", + " # instances_format=batch_predict_instances_format,\n", + " # sample_size=batch_predict_explanation_data_sample_size)\n", + " # batch_explain_task = ModelBatchPredictOp(\n", + " # project=project,\n", + " # location=location,\n", + " # model=get_model_task.outputs['model'],\n", + " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", + " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + " # 
instances_format=batch_predict_instances_format,\n", + " # predictions_format=batch_predict_predictions_format,\n", + " # gcs_destination_output_uri_prefix=root_dir,\n", + " # generate_explanation=True,\n", + " # explanation_parameters=batch_predict_explanation_parameters,\n", + " # explanation_metadata=batch_predict_explanation_metadata,\n", + " # machine_type=batch_predict_machine_type,\n", + " # starting_replica_count=batch_predict_starting_replica_count,\n", + " # max_replica_count=batch_predict_max_replica_count,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_predict_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # predictions_format='jsonl',\n", + " # predictions_gcs_source=batch_explain_task\n", + " # .outputs['gcs_output_directory'],\n", + " # dataflow_machine_type=dataflow_machine_type,\n", + " # dataflow_max_workers_num=dataflow_max_num_workers,\n", + " # dataflow_disk_size=dataflow_disk_size_gb,\n", + " # dataflow_service_account=dataflow_service_account,\n", + " # 
dataflow_subnetwork=dataflow_subnetwork,\n", + " # dataflow_use_public_ips=dataflow_use_public_ips,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + " ModelImportEvaluationOp(\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " # feature_attributions=feature_attribution_task\n", + " # .outputs['feature_attributions'],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "NOvOMTEgCVcW", - "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + "id": "NOvOMTEgCVcW" }, "outputs": [ { @@ -1131,15 +1113,15 @@ "source": [ "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'prediction_type':'regression',\n", - " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", - " 'target_column_name':label_column,\n", - " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", - " 'batch_predict_instances_format':'bigquery',\n", - " }" + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"prediction_type\": \"regression\",\n", + " \"model_name\": f\"projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}\",\n", + " \"target_column_name\": label_column,\n", + " \"bigquery_source_input_uri\": f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", + " \"batch_predict_instances_format\": \"bigquery\",\n", + "}" ] }, { @@ -1202,17 +1184,24 @@ "outputs": [], "source": [ "# _________NOTE_________:\n", - "#this is a sample code from eval team... need to be degbugged or replaced with a \n", + "# this is a sample code from eval team... 
need to be debugged or replaced with a\n", "# better appraoch\n", "\n", "from google.cloud import aiplatform_v1\n", "\n", "for task in job._gca_resource.job_detail.task_details:\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", "\n", "print(evaluation_metrics)\n", "print(evaluation_metrics_gcs_uri)" @@ -1260,8 +1249,8 @@ "metadata": { "colab": { "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true }, "kernelspec": { "display_name": "Python 3", From a6c8d5c0ba86bd0d0be836aeaed7087f696c1305 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Fri, 2 Sep 2022 05:54:16 +0000 Subject: [PATCH 06/34] adjusts the imports inside the pipeline --- ...oml_tabular_classification_model_evaluation.ipynb | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 18efb3319..590d325cc 100644 --- 
a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -927,13 +927,11 @@ " dataflow_use_public_ips: bool = True,\n", " encryption_spec_key_name: str = ''):\n", " \n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " GetVertexModelOp,\n", - " EvaluationDataSamplerOp,\n", - " ModelEvaluationClassificationOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp\n", - " )\n", + " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", " \n", " # Get the Vertex AI model resource\n", From cc88effaa99d8e4365727bda762077104b7ad74d Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Fri, 2 Sep 2022 05:57:05 +0000 Subject: [PATCH 07/34] adjusts the imports --- ...ular_classification_model_evaluation.ipynb | 2799 +++++++++-------- 1 file changed, 1426 insertions(+), 1373 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 590d325cc..cd5eb6da1 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1375 +1,1428 @@ { - "cells": [ - { - "cell_type": "code", 
- "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline.\n", - "- Run a `batch prediction` job.\n", - "- Evaulate the AutoML model using the `classification evluation component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43", - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"numeric\": {\"column_name\": \"Age\"}},\n", - 
" {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " ],\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " service_account=SERVICE_ACCOUNT,\n", - " dataset=dataset,\n", - " target_column=\"Adopted\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " dataflow_subnetwork: str = '',\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run 
Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " 
dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", - " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"classification\",\n", - " 'target_column_name':\"Adopted\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "conda-env-eval_comp-py", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python [conda env:eval_comp]", - "language": "python", - "name": "conda-env-eval_comp-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML tabular classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline.\n", + "- Run a `batch prediction` job.\n", + "- Evaluate the AutoML model using the `classification evaluation component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "93cdc7e78bf8" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aa59f1629b5f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4244f2f1056a" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8e8a48f855da" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"numeric\": {\"column_name\": \"Age\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ebc17308377b" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5be50885d307" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "214f19b9c648" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "678953591916" + }, + "source": [ + "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b22221d68fc6" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " service_account=SERVICE_ACCOUNT,\n", + " dataset=dataset,\n", + " target_column=\"Adopted\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "df75a9b2b7d6" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a689fd5db924" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "be3cbb3448f2" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "065713f41276" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a487a82b631b" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d82b01b774e2" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = \"n1-standard-4\",\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = \"\",\n", + " dataflow_subnetwork: str = \"\",\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + 
"\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " 
project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "df815b63414a" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6024364c32cf" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "88dcddfc8674" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d0b1eafce659" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1d686c0dfef1" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d02d918e8552" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3eff91df0105" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"classification\",\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625a9cbc60bb" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3592146f31e1" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d1b840a79c4e" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From ce54769d1f88583c6a8b06fc9b6ebd42527454df Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Fri, 2 Sep 2022 11:41:11 +0000 Subject: [PATCH 08/34] elaborates imports inside pipeline --- ...ular_classification_model_evaluation.ipynb | 2799 ++++++++--------- 1 file changed, 1373 insertions(+), 1426 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index cd5eb6da1..590d325cc 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1428 +1,1375 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline.\n", - "- Run a `batch prediction` job.\n", - "- Evaulate the AutoML model using the `classification evluation component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "93cdc7e78bf8" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aa59f1629b5f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "4244f2f1056a" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8e8a48f855da" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"numeric\": {\"column_name\": \"Age\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " ],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ebc17308377b" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5be50885d307" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "214f19b9c648" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "678953591916" - }, - "source": [ - "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b22221d68fc6" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " service_account=SERVICE_ACCOUNT,\n", - " dataset=dataset,\n", - " target_column=\"Adopted\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "df75a9b2b7d6" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "a689fd5db924" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "be3cbb3448f2" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "065713f41276" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a487a82b631b" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d82b01b774e2" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = \"n1-standard-4\",\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = \"\",\n", - " dataflow_subnetwork: str = \"\",\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - 
"\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " 
project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "df815b63414a" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6024364c32cf" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "88dcddfc8674" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d0b1eafce659" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1d686c0dfef1" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d02d918e8552" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (in this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3eff91df0105" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"classification\",\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625a9cbc60bb" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3592146f31e1" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d1b840a79c4e" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI classification model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline.\n", + "- Run a `batch prediction` job.\n", + "- Evaulate the AutoML model using the `classification evluation component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43", + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + "    # Get your GCP project id from gcloud\n", + "    shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + "    PROJECT_ID = shell_output[0]\n", + "    print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. The following regions are supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\"  # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + "    REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + "    \"DL_ANACONDA_HOME\"\n", + "):\n", + "    if \"google.colab\" in sys.modules:\n", + "        from google.colab import auth as google_auth\n", + "\n", + "        google_auth.authenticate_user()\n", + "\n", + "    # If you are running this notebook locally, replace the string below with the\n", + "    # path to your service account key and run this cell to authenticate your GCP\n", + "    # account.\n", + "    elif not os.getenv(\"IS_TESTING\"):\n", + "        %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\"  # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + "    BUCKET_NAME = PROJECT_ID + \"-aip-\" + UUID\n", + "    BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + "    SERVICE_ACCOUNT == \"\"\n", + "    or SERVICE_ACCOUNT is None\n", + "    or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + "    # Get your service account from gcloud\n", + "    if not IS_COLAB:\n", + "        shell_output = !gcloud auth list 2>/dev/null\n", + "        SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + "    else:  # IS_COLAB:\n", + "        shell_output = ! gcloud projects describe $PROJECT_ID\n", + "        project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + "        SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + "    print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + "    display_name=\"petfinder-tabular-dataset\",\n", + "    gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column.\n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + "    TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + "    TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + "    display_name=TRAINING_JOB_DISPLAY_NAME,\n", + "    optimization_prediction_type=\"classification\",\n", + "    column_transformations=[\n", + "        {\"categorical\": {\"column_name\": \"Type\"}},\n", + "        {\"numeric\": {\"column_name\": \"Age\"}},\n", +
" {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " ],\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + "    service_account=SERVICE_ACCOUNT,\n", + "    dataset=dataset,\n", + "    target_column=\"Adopted\",\n", + "    training_fraction_split=0.8,\n", + "    validation_fraction_split=0.1,\n", + "    test_fraction_split=0.1,\n", + "    model_display_name=MODEL_DISPLAY_NAME,\n", + "    disable_early_stopping=False,\n", + "    budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + "    evaluation = evaluation.to_dict()\n", + "    print(\"Model's evaluation metrics from training:\\n\")\n", + "    metrics = evaluation[\"metrics\"]\n", + "    for metric in metrics.keys():\n", + "        print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with `ModelService.ImportModelEvaluation`. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
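+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The flow described above can be sketched in plain Python before you define the real pipeline. This is only an illustration under stated assumptions: the task labels are informal names chosen for this sketch (not necessarily the task names that Vertex AI Pipelines assigns), and `task_dependencies` and `execution_order` are hypothetical helper variables that the rest of the notebook does not use. The cell prints one valid execution order for the dependency graph." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Illustrative sketch only: informal labels for the components listed above.\n", + "# Each task maps to the set of tasks whose outputs it consumes.\n", + "task_dependencies = {\n", + "    'get-vertex-model': set(),\n", + "    'evaluation-data-sampler': set(),\n", + "    'model-batch-predict': {'get-vertex-model', 'evaluation-data-sampler'},\n", + "    'model-evaluation': {'model-batch-predict'},\n", + "    'feature-attribution': {'model-batch-predict'},\n", + "    'model-evaluation-import': {'get-vertex-model', 'model-evaluation', 'feature-attribution'},\n", + "}\n", + "\n", + "# Repeatedly schedule every task whose dependencies have already run.\n", + "execution_order = []\n", + "while len(execution_order) < len(task_dependencies):\n", + "    ready = sorted(\n", + "        task\n", + "        for task, deps in task_dependencies.items()\n", + "        if task not in execution_order and deps <= set(execution_order)\n", + "    )\n", + "    assert ready, 'dependency cycle detected'\n", + "    execution_order.extend(ready)\n", + "\n", + "print(execution_order)"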
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + "    name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + "    project: str,\n", + "    location: str,\n", + "    root_dir: str,\n", + "    prediction_type: str,\n", + "    model_name: str,\n", + "    target_column_name: str,\n", + "    batch_predict_gcs_source_uris: list,\n", + "    batch_predict_instances_format: str,\n", + "    batch_predict_predictions_format: str = 'jsonl',\n", + "    batch_predict_machine_type: str = 'n1-standard-4',\n", + "    batch_predict_starting_replica_count: int = 5,\n", + "    batch_predict_max_replica_count: int = 10,\n", + "    batch_predict_explanation_metadata: dict = {},\n", + "    batch_predict_explanation_parameters: dict = {},\n", + "    batch_predict_explanation_data_sample_size: int = 10000,\n", + "    dataflow_machine_type: str = 'n1-standard-4',\n", + "    dataflow_max_num_workers: int = 5,\n", + "    dataflow_disk_size_gb: int = 50,\n", + "    dataflow_service_account: str = '',\n", + "    dataflow_subnetwork: str = '',\n", + "    dataflow_use_public_ips: bool = True,\n", + "    encryption_spec_key_name: str = ''):\n", + "\n", + "    from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", + "    from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", + "    from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", + "    from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", + "    from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", + "    from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + "\n", + "    # Get the Vertex AI model resource\n", + "    get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + "    # Run 
Data-sampling task\n", + "    data_sampler_task = EvaluationDataSamplerOp(\n", + "        project=project,\n", + "        location=location,\n", + "        root_dir=root_dir,\n", + "        gcs_source_uris=batch_predict_gcs_source_uris,\n", + "        instances_format=batch_predict_instances_format,\n", + "        sample_size=batch_predict_explanation_data_sample_size\n", + "    )\n", + "\n", + "    # Run Batch Explanations\n", + "    batch_explain_task = ModelBatchPredictOp(\n", + "        project=project,\n", + "        location=location,\n", + "        model=get_model_task.outputs['model'],\n", + "        job_display_name='model-registry-batch-predict-evaluation',\n", + "        gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + "        instances_format=batch_predict_instances_format,\n", + "        predictions_format=batch_predict_predictions_format,\n", + "        gcs_destination_output_uri_prefix=root_dir,\n", + "        machine_type=batch_predict_machine_type,\n", + "        starting_replica_count=batch_predict_starting_replica_count,\n", + "        max_replica_count=batch_predict_max_replica_count,\n", + "        encryption_spec_key_name=encryption_spec_key_name,\n", + "        # Set the explanation parameters\n", + "        generate_explanation=True,\n", + "        explanation_parameters=batch_predict_explanation_parameters,\n", + "        explanation_metadata=batch_predict_explanation_metadata,\n", + "    )\n", + "\n", + "    # Run evaluation based on prediction type and feature attribution component.\n", + "    # After, import the model evaluations to the Vertex model.\n", + "    eval_task = ModelEvaluationClassificationOp(\n", + "        project=project,\n", + "        location=location,\n", + "        root_dir=root_dir,\n", + "        problem_type=prediction_type,\n", + "        ground_truth_column=target_column_name,\n", + "        predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + "        predictions_format=batch_predict_predictions_format,\n", + "        dataflow_machine_type=dataflow_machine_type,\n", + "        dataflow_max_workers_num=dataflow_max_num_workers,\n", + "        dataflow_disk_size=dataflow_disk_size_gb,\n", + "        
dataflow_service_account=dataflow_service_account,\n", + "        dataflow_subnetwork=dataflow_subnetwork,\n", + "        dataflow_use_public_ips=dataflow_use_public_ips,\n", + "        encryption_spec_key_name=encryption_spec_key_name\n", + "    )\n", + "\n", + "    # Get Feature Attributions\n", + "    feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + "        project=project,\n", + "        location=location,\n", + "        root_dir=root_dir,\n", + "        predictions_format='jsonl',\n", + "        predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + "        dataflow_machine_type=dataflow_machine_type,\n", + "        dataflow_max_workers_num=dataflow_max_num_workers,\n", + "        dataflow_disk_size=dataflow_disk_size_gb,\n", + "        dataflow_service_account=dataflow_service_account,\n", + "        dataflow_subnetwork=dataflow_subnetwork,\n", + "        dataflow_use_public_ips=dataflow_use_public_ips,\n", + "        encryption_spec_key_name=encryption_spec_key_name\n", + "    )\n", + "\n", + "    ModelImportEvaluationOp(\n", + "        problem_type=prediction_type,\n", + "        classification_metrics=eval_task.outputs['evaluation_metrics'],\n", + "        feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + "        model=get_model_task.outputs['model'],\n", + "        dataset_type=batch_predict_instances_format\n", + "    )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + "    pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + "    package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + "    PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + "    PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'prediction_type':\"classification\",\n", + " 'target_column_name':\"Adopted\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
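+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before submitting the job, it can be worth checking that `parameters` supplies every argument that the pipeline function declares without a default value. The cell below is a minimal sketch of such a check: the list is transcribed by hand from the signature of `evaluation_automl_tabular_feature_attribution_pipeline` defined earlier, and `required_parameters` and `missing_parameters` are hypothetical helper names used only here. Update the list if you change the pipeline signature." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sketch only: arguments that the pipeline signature declares without defaults.\n", + "required_parameters = [\n", + "    'project',\n", + "    'location',\n", + "    'root_dir',\n", + "    'prediction_type',\n", + "    'model_name',\n", + "    'target_column_name',\n", + "    'batch_predict_gcs_source_uris',\n", + "    'batch_predict_instances_format',\n", + "]\n", + "\n", + "# Fail early, before a pipeline run is submitted, if anything is missing.\n", + "missing_parameters = [name for name in required_parameters if name not in parameters]\n", + "assert not missing_parameters, f'Missing pipeline parameters: {missing_parameters}'"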
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + "    display_name=PIPELINE_DISPLAY_NAME,\n", + "    template_path=\"tabular_classification_pipeline.json\",\n", + "    parameter_values=parameters,\n", + "    enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the following cell to print the evaluation metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + "    # Obtain the artifacts from the evaluation task\n", + "    if ((\"model-evaluation\" in task.task_name) and\n", + "        (\"model-evaluation-import\" not in task.task_name) and\n", + "        (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + "        evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", + "        evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics, such as `auRoc` and `logLoss`, using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for name, value in evaluation_metrics.metadata.items():\n", + "    metrics.append(name)\n", + "    values.append(value)\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the following cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "conda-env-eval_comp-py", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python [conda env:eval_comp]", + "language": "python", + "name": "conda-env-eval_comp-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 293ce7709d70f36f43f124750859006e45a197fa Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Fri, 2 Sep 2022 12:00:18 +0000 Subject: [PATCH 09/34] modified regression notebook --- ..._tabular_regression_model_evaluation.ipynb | 2817 +++++++++-------- 1 file changed, 1561 insertions(+), 1256 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index b4be6069e..7868f55db 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1262 +1,1567 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by 
applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaluate the AutoML model using the `regression evaluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", - "\n", - "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "\n", - "{TODO: Update the list of billable products that your tutorial uses.}\n", - "\n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "{TODO: Include links to pricing documentation for each product you listed above.}\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. \n", - "\n", - "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17\n", - "! 
pip3 install {USER_FLAG} kfp google-cloud-pipeline-components --upgrade -q\n", - "# TODO: Add remaining package installs here" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n", - "\n", - "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. 
This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "\n", - "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", - "\n", - "When you submit a training job using the Cloud SDK, you upload a Python package\n", - "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", - "the code from this package. In this tutorial, Vertex AI also saves the\n", - "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", - "create Vertex AI model and endpoint resources in order to serve\n", - "online predictions.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all\n", - "Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", - "\n", - "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - "from google_cloud_pipeline_components.experimental import evaluation\n", - "from google_cloud_pipeline_components.experimental.evaluation import (\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Dataset\n", - "\n", - "We use this bigquery table for training" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "# Define BigQuery table to be used for training\n", - "\n", - "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "from google.cloud import bigquery\n", - "\n", - "# Create client in default region\n", - "bq_client = bigquery.Client(\n", - " project=PROJECT_ID,\n", - " credentials=aiplatform.initializer.global_config.credentials,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KvRQNKhEmGHs" - }, - "outputs": [], - "source": [ - "# Create test dataset in default region\n", - "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", - "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cw3n0ftZYZ_h" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" - ] - } - ], - "source": [ - "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", - "bq_dataset = bq_client.create_dataset(bq_dataset)\n", - "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "XG0U5lmfYrNT" - }, - "outputs": [ - { - "data": { - 
"text/plain": [ - "" - ] - }, - "execution_count": 83, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Select a subset of the original dataset for testing\n", - "PREDICTION_SIZE = 10\n", - "query = f\"\"\"\n", - " SELECT *\n", - " FROM {BQ_TABLE}\n", - " LIMIT {PREDICTION_SIZE} \n", - " \"\"\"\n", - "\n", - "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", - "\n", - "query_job = bq_client.query(query, job_config=job_config) # API request\n", - "query_job.result() # Waits for query to finish" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4XpPTSFoYCsT" - }, - "source": [ - "### Create the dataset" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "gHOUMfskYIpO" - }, - "outputs": [], - "source": [ - "DATASET_NAME = \"Pen\" + UUID" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IyXwOcbVYBd1" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" - ] - } - ], - "source": [ - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=DATASET_NAME,\n", - " bq_source=[f\"bq://{BQ_TABLE}\"],\n", - ")\n", - "\n", - "COLUMN_SPECS = {\n", - " \"year\": \"auto\",\n", - " \"month\": \"auto\",\n", - " \"day\": \"auto\",\n", - "}\n", - "\n", - "label_column = \"mean_temp\"\n", - "\n", - "dataset = aiplatform.TabularDataset(\n", - " \"projects/1058599485685/locations/us-central1/datasets/5507798990181105664\"\n", - ")\n", - "\n", - "print(dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train the model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "MODEL_NAME = \"pen\" + UUID" - ] - }, - { - "cell_type": "code", - "execution_count": null, - 
"metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=\"pen_training_job\",\n", - " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(training_job)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "model = training_job.run(\n", - " dataset=dataset,\n", - " model_display_name=MODEL_NAME,\n", - " training_fraction_split=0.6,\n", - " validation_fraction_split=0.2,\n", - " test_fraction_split=0.2,\n", - " budget_milli_node_hours=1,\n", - " disable_early_stopping=False,\n", - " target_column=label_column,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "RYjWdtscAmFP" - }, - "source": [ - "## Get Model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "4Dkk4P_TAlkr" - }, - "outputs": [], - "source": [ - "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", - "MODEL_ID = \"2241222511826042880\"\n", - "model = aiplatform.Model(model_name=MODEL_ID)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "### List model eval metrics" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - 
"source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " # batch_predict_gcs_source_uris: list,\n", - " bigquery_source_input_uri: str,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-16\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = \"n1-standard-4\",\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = \"\",\n", - " dataflow_subnetwork: str = \"\",\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Batch Prediction.\n", - " batch_predict_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluatio\",\n", - " bigquery_source_input_uri=bigquery_source_input_uri,\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " 
max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # # # Run the Batch Explain process (sampler -> batch explanation).\n", - " # data_sampler_task = EvaluationDataSamplerOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # gcs_source_uris=batch_predict_gcs_source_uris,\n", - " # instances_format=batch_predict_instances_format,\n", - " # sample_size=batch_predict_explanation_data_sample_size)\n", - " # batch_explain_task = ModelBatchPredictOp(\n", - " # project=project,\n", - " # location=location,\n", - " # model=get_model_task.outputs['model'],\n", - " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", - " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", - " # instances_format=batch_predict_instances_format,\n", - " # predictions_format=batch_predict_predictions_format,\n", - " # gcs_destination_output_uri_prefix=root_dir,\n", - " # generate_explanation=True,\n", - " # explanation_parameters=batch_predict_explanation_parameters,\n", - " # explanation_metadata=batch_predict_explanation_metadata,\n", - " # machine_type=batch_predict_machine_type,\n", - " # starting_replica_count=batch_predict_starting_replica_count,\n", - " # max_replica_count=batch_predict_max_replica_count,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_format=batch_predict_predictions_format,\n", - " predictions_gcs_source=batch_predict_task.outputs[\"gcs_output_directory\"],\n", - " 
dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # predictions_format='jsonl',\n", - " # predictions_gcs_source=batch_explain_task\n", - " # .outputs['gcs_output_directory'],\n", - " # dataflow_machine_type=dataflow_machine_type,\n", - " # dataflow_max_workers_num=dataflow_max_num_workers,\n", - " # dataflow_disk_size=dataflow_disk_size_gb,\n", - " # dataflow_service_account=dataflow_service_account,\n", - " # dataflow_subnetwork=dataflow_subnetwork,\n", - " # dataflow_use_public_ips=dataflow_use_public_ips,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - " ModelImportEvaluationOp(\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " # feature_attributions=feature_attribution_task\n", - " # .outputs['feature_attributions'],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. 
kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", - " category=FutureWarning,\n" - ] - } - ], - "source": [ - "from kfp.v2 import compiler # noqa: F811\n", - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"prediction_type\": \"regression\",\n", - " \"model_name\": f\"projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}\",\n", - " \"target_column_name\": label_column,\n", - " \"bigquery_source_input_uri\": f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", - " \"batch_predict_instances_format\": \"bigquery\",\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "DISPLAY_NAME = \"pen\" + UUID\n", - "\n", - "job = aiplatform.PipelineJob(\n", - " display_name=DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run()\n", - "\n", - "# ! rm tabular_regression_pipeline.json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "Click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Molde Evaluation Results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# _________NOTE_________:\n", - "# this is a sample code from eval team... need to be degbugged or replaced with a\n", - "# better appraoch\n", - "\n", - "from google.cloud import aiplatform_v1\n", - "\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial:\n", - "\n", - "{TODO: Include commands to delete individual resources below}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - 
"outputs": [], - "source": [ - "# Delete endpoint resource\n", - "! gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", - "\n", - "# Delete model resource\n", - "! gcloud ai models delete $MODEL_NAME --quiet\n", - "\n", - "# Delete Cloud Storage objects that were created\n", - "! gsutil -m rm -r $JOB_DIR\n", - "\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", + "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", + "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", + "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", + "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", + "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! 
pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Skip the restart when running under automated testing\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. 
Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: The service account that runs the pipeline (by default, the Compute Engine service account) needs Storage Object Creator and Storage Object Viewer permissions on your Cloud Storage bucket. A later cell in this notebook grants them.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Project ID: vertex-ai-dev\n" + ] + } + ], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Updated property [core/project].\n", + "\n", + "\n", + "Updates are available for some Cloud SDK components. To install them,\n", + "please run:\n", + " $ gcloud components update\n", + "\n", + "\n", + "\n", + "To take a quick anonymous survey, run:\n", + " $ gcloud survey\n", + "\n" + ] + } + ], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources they create, you generate a UUID for each session and append it to the names of the resources you create in this tutorial.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Creating gs://vertex-ai-devaip-quar9hbz/...\n" + ] + } + ], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Service Account: 931647533046-compute@developer.gserviceaccount.com\n" + ] } - ], - "metadata": { + ], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "20S9En09X0PY", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Creating TabularDataset\n", + "Create TabularDataset backing LRO: projects/931647533046/locations/us-central1/datasets/9019973380832493568/operations/3060886783697879040\n", + "TabularDataset created. 
Resource name: projects/931647533046/locations/us-central1/datasets/9019973380832493568\n", + "To use this TabularDataset in another session:\n", + "ds = aiplatform.TabularDataset('projects/931647533046/locations/us-central1/datasets/9019973380832493568')\n", + "Resource name: projects/931647533046/locations/us-central1/datasets/9019973380832493568\n" + ] + } + ], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model on the created dataset, with `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human-readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
For example, `regression` or `classification`.\n", + "- `column_transformations`: (Optional) Transformations to apply to the input columns.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. For regression, the supported values are:\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.9/site-packages/google/cloud/aiplatform/training_jobs.py:4562: DeprecationWarning: consider using column_specs instead. 
column_transformations will be deprecated in the future.\n", + " column_transformations_utils.validate_and_get_column_transformations(\n" + ] + } + ], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " 
{\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + "\n", + " ], \n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column values of which the Model is to predict.\n", + "- `training_fraction_split`: The percentage of the dataset to use for training.\n", + "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", + "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "View Training:\n", + "https://console.cloud.google.com/ai/platform/locations/us-central1/training/4882406946784673792?project=931647533046\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob 
projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n", + "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", + "PipelineState.PIPELINE_STATE_RUNNING\n" + ] } + ], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " 
dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " GetVertexModelOp,\n", + " EvaluationDataSamplerOp,\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp\n", + " )\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature 
attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " )\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format='jsonl',\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipline 
to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NOvOMTEgCVcW", + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "outputs": [], + "source": [ + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'prediction_type':\"regression\",\n", + " 'target_column_name':\"Age\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the following cell to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3", + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if i[0]==\"meanAbsolutePercentageError\": #we are not considering MAPE as it is infinite. MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10,5))\n", + "plt.bar(x=metrics,height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m94", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 0 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 89862f5fd8527ed7eccbcc32aa346d225478163e Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Fri, 2 Sep 2022 13:25:40 +0000 Subject: [PATCH 10/34] renamed pipeline displayname to resolve error --- .../automl_tabular_regression_model_evaluation.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 7868f55db..340fd875f 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1313,7 +1313,7 @@ "outputs": [], "source": [ "job = aiplatform.PipelineJob(\n", - " display_name=DISPLAY_NAME,\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", " template_path=\"tabular_regression_pipeline.json\",\n", " parameter_values=parameters,\n", " enable_caching=True,\n", From 93436427827a69d5aec5a6e0b706d871bfe758e0 Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Mon, 29 Aug 2022 11:19:14 -0400 Subject: [PATCH 11/34] Add automl regression model eval first draft --- ..._tabular_regression_model 
evaluation.ipynb | 1273 +++++++++++++++++ ..._tabular_regression_model_evaluation.ipynb | 1273 +++++++++++++++++ 2 files changed, 2546 insertions(+) create mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb create mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb new file mode 100644 index 000000000..9eba72c37 --- /dev/null +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb @@ -0,0 +1,1273 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", + "\n", + "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "\n", + "{TODO: Update the list of billable products that your tutorial uses.}\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "{TODO: Include links to pricing documentation for each product you listed above.}\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. \n", + "\n", + "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17\n", + "! 
pip3 install {USER_FLAG} kfp google-cloud-pipeline-components --upgrade -q\n", + "# TODO: Add remaining package installs here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", + "\n", + "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Important**: The service account that runs the pipeline (by default, your project's Compute Engine service account) needs the Storage Object Creator and Storage Object Viewer roles on your Cloud Storage bucket.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n", + "\n", + "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. 
This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "\n", + "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", + "\n", + "When you submit a training job using the Cloud SDK, you upload a Python package\n", + "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", + "the code from this package. In this tutorial, Vertex AI also saves the\n", + "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", + "create Vertex AI model and endpoint resources in order to serve\n", + "online predictions.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all\n", + "Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", + "\n", + "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", + "\n", + "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "from google_cloud_pipeline_components.experimental.evaluation import (\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + "from google_cloud_pipeline_components.experimental import evaluation\n", + "import kfp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset\n", + "\n", + "This tutorial uses the following BigQuery table for training." + ], + "metadata": { + "id": "BiVlyW5OUnjK" + } + }, + { + "cell_type": "code", + "source": [ + "# Define BigQuery table to be used for training\n", + "\n", + "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" + ], + "metadata": { + "id": "bViYfWfpVAiF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "from google.cloud import bigquery\n", + "\n", + "# Create client in default region\n", + "bq_client = bigquery.Client(\n", + " project=PROJECT_ID,\n", + " credentials=aiplatform.initializer.global_config.credentials,\n", + ")\n" + ], + "metadata": { + "id": "20S9En09X0PY" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Create test dataset in default region\n", + "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", + "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" + ], + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cw3n0ftZYZ_h", + "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" + ] + } + ] + }, + { + "cell_type": 
"code", + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XG0U5lmfYrNT", + "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 83 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the dataset" + ], + "metadata": { + "id": "4XpPTSFoYCsT" + } + }, + { + "cell_type": "code", + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ], + "metadata": { + "id": "gHOUMfskYIpO" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=DATASET_NAME,\n", + " bq_source=[f\"bq://{BQ_TABLE}\"],\n", + ")\n", + "\n", + "COLUMN_SPECS = {\n", + " \"year\": \"auto\",\n", + " \"month\": \"auto\",\n", + " \"day\": \"auto\",\n", + "}\n", + "\n", + "label_column = \"mean_temp\"\n", + "\n", + "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", + "\n", + "print(dataset.resource_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ] + }, + { + "cell_type": "markdown", + 
"source": [ + "## Train the model" + ], + "metadata": { + "id": "A-QQkeUnq8Xt" + } + }, + { + "cell_type": "code", + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ], + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "model = training_job.run(\n", + " dataset=dataset,\n", + " model_display_name=MODEL_NAME,\n", + " training_fraction_split=0.6,\n", + " validation_fraction_split=0.2,\n", + " test_fraction_split=0.2,\n", + " budget_milli_node_hours=1,\n", + " disable_early_stopping=False,\n", + " target_column=label_column,\n", + ")" + ], + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Get Model" + ], + "metadata": { + "id": "RYjWdtscAmFP" + } + }, + { + "cell_type": "code", + "source": [ + "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", + "MODEL_ID = '2241222511826042880'\n", + "model = aiplatform.Model(model_name=MODEL_ID)" + ], + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### List model eval metrics" + ], + "metadata": { + "id": "rYirKB_9yaa0" + } + }, + { + "cell_type": "code", + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + 
"\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ], + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Model Evaluation" + ], + "metadata": { + "id": "ce6beLsXASnK" + } + }, + { + "cell_type": "code", + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " # batch_predict_gcs_source_uris: list,\n", + " bigquery_source_input_uri: str,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-16',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Batch Prediction.\n", + " batch_predict_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluatio',\n", + " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # # # Run the Batch Explain process (sampler -> batch explanation).\n", + " # data_sampler_task = EvaluationDataSamplerOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # gcs_source_uris=batch_predict_gcs_source_uris,\n", + " # instances_format=batch_predict_instances_format,\n", + " # sample_size=batch_predict_explanation_data_sample_size)\n", + " # batch_explain_task = ModelBatchPredictOp(\n", + " # project=project,\n", + " # location=location,\n", + " # model=get_model_task.outputs['model'],\n", + " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", + " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + " # instances_format=batch_predict_instances_format,\n", + " # predictions_format=batch_predict_predictions_format,\n", + " # gcs_destination_output_uri_prefix=root_dir,\n", + " # generate_explanation=True,\n", + " # explanation_parameters=batch_predict_explanation_parameters,\n", + " # explanation_metadata=batch_predict_explanation_metadata,\n", + " # machine_type=batch_predict_machine_type,\n", + " # starting_replica_count=batch_predict_starting_replica_count,\n", + " # max_replica_count=batch_predict_max_replica_count,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = 
ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_predict_task\n", + " .outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # predictions_format='jsonl',\n", + " # predictions_gcs_source=batch_explain_task\n", + " # .outputs['gcs_output_directory'],\n", + " # dataflow_machine_type=dataflow_machine_type,\n", + " # dataflow_max_workers_num=dataflow_max_num_workers,\n", + " # dataflow_disk_size=dataflow_disk_size_gb,\n", + " # dataflow_service_account=dataflow_service_account,\n", + " # dataflow_subnetwork=dataflow_subnetwork,\n", + " # dataflow_use_public_ips=dataflow_use_public_ips,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + " ModelImportEvaluationOp(\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " # feature_attributions=feature_attribution_task\n", + " # .outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format)" + ], + "metadata": { + "id": "ktMsqtibAUzz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + 
" package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ], + "metadata": { + "id": "NOvOMTEgCVcW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", + " category=FutureWarning,\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'prediction_type':'regression',\n", + " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", + " 'target_column_name':label_column,\n", + " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", + " 'batch_predict_instances_format':'bigquery',\n", + " }" + ], + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "DISPLAY_NAME = \"pen\" + UUID\n", + "\n", + "job = aiplatform.PipelineJob(\n", + " display_name=DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run()\n", + "\n", + "# ! rm tabular_regression_pipeline.json" + ], + "metadata": { + "id": "pdHib_yUEuEk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ], + "metadata": { + "id": "mKRTDi8ioXBY" + } + }, + { + "cell_type": "markdown", + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ], + "metadata": { + "id": "U2zocUvk2YVs" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Model Evaluation Results" + ], + "metadata": { + "id": "XcKaONSsGNC4" + } + }, + { + "cell_type": "code", + "source": [ + "# _________NOTE_________:\n", + "# This is sample code from the eval team. It needs to be debugged or replaced with a\n", + "# better approach.\n", + "\n", + "from google.cloud import aiplatform_v1\n", + "\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ], + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial:\n", + "\n", + "{TODO: Include commands to delete individual resources below}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete endpoint resource\n", + "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", + "\n", + "# Delete model resource\n", + "! gcloud ai models delete $MODEL_NAME --quiet\n", + "\n", + "# Delete Cloud Storage objects that were created\n", + "! gsutil -m rm -r $JOB_DIR\n", + "\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb new file mode 100644 index 000000000..9eba72c37 --- /dev/null +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -0,0 +1,1273 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + " 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", + "\n", + "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "\n", + "{TODO: Update the list of billable products that your tutorial uses.}\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "{TODO: Include links to pricing documentation for each product you listed above.}\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. \n", + "\n", + "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17\n", + "! 
pip3 install {USER_FLAG} kfp google-cloud-pipeline-components --upgrade -q\n", + "# TODO: Add remaining package installs here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", + "\n", + "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Compute service account storage object creator and viewer permissions!!!**\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you generate a UUID for each session and append it to the names of the resources you create in this tutorial.\n", + "\n", + "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. 
This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "\n", + "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", + "\n", + "When you submit a training job using the Cloud SDK, you upload a Python package\n", + "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", + "the code from this package. In this tutorial, Vertex AI also saves the\n", + "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", + "create Vertex AI model and endpoint resources in order to serve\n", + "online predictions.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all\n", + "Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", + "\n", + "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", + "\n", + "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "from google_cloud_pipeline_components.experimental.evaluation import (\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + "from google_cloud_pipeline_components.experimental import evaluation\n", + "import kfp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset\n", + "\n", + "We use this bigquery table for training" + ], + "metadata": { + "id": "BiVlyW5OUnjK" + } + }, + { + "cell_type": "code", + "source": [ + "# Define BigQuery table to be used for training\n", + "\n", + "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" + ], + "metadata": { + "id": "bViYfWfpVAiF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "\n", + "from google.cloud import bigquery\n", + "\n", + "# Create client in default region\n", + "bq_client = bigquery.Client(\n", + " project=PROJECT_ID,\n", + " credentials=aiplatform.initializer.global_config.credentials,\n", + ")\n" + ], + "metadata": { + "id": "20S9En09X0PY" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Create test dataset in default region\n", + "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", + "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" + ], + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cw3n0ftZYZ_h", + "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" + ] + } + ] + }, + { + "cell_type": 
"code", + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XG0U5lmfYrNT", + "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 83 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the dataset" + ], + "metadata": { + "id": "4XpPTSFoYCsT" + } + }, + { + "cell_type": "code", + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ], + "metadata": { + "id": "gHOUMfskYIpO" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=DATASET_NAME,\n", + " bq_source=[f\"bq://{BQ_TABLE}\"],\n", + ")\n", + "\n", + "COLUMN_SPECS = {\n", + " \"year\": \"auto\",\n", + " \"month\": \"auto\",\n", + " \"day\": \"auto\",\n", + "}\n", + "\n", + "label_column = \"mean_temp\"\n", + "\n", + "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", + "\n", + "print(dataset.resource_name)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ] + }, + { + "cell_type": "markdown", + 
"source": [ + "## Train the model" + ], + "metadata": { + "id": "A-QQkeUnq8Xt" + } + }, + { + "cell_type": "code", + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ], + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "model = training_job.run(\n", + " dataset=dataset,\n", + " model_display_name=MODEL_NAME,\n", + " training_fraction_split=0.6,\n", + " validation_fraction_split=0.2,\n", + " test_fraction_split=0.2,\n", + " budget_milli_node_hours=1,\n", + " disable_early_stopping=False,\n", + " target_column=label_column,\n", + ")" + ], + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Get Model" + ], + "metadata": { + "id": "RYjWdtscAmFP" + } + }, + { + "cell_type": "code", + "source": [ + "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", + "MODEL_ID = '2241222511826042880'\n", + "model = aiplatform.Model(model_name=MODEL_ID)" + ], + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### List model eval metrics" + ], + "metadata": { + "id": "rYirKB_9yaa0" + } + }, + { + "cell_type": "code", + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + 
"\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ], + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Model Evaluation" + ], + "metadata": { + "id": "ce6beLsXASnK" + } + }, + { + "cell_type": "code", + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " # batch_predict_gcs_source_uris: list,\n", + " bigquery_source_input_uri: str,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-16',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = 'n1-standard-4',\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = '',\n", + " dataflow_subnetwork: str = '',\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Batch Prediction.\n", + " batch_predict_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluatio',\n", + " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # # # Run the Batch Explain process (sampler -> batch explanation).\n", + " # data_sampler_task = EvaluationDataSamplerOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # gcs_source_uris=batch_predict_gcs_source_uris,\n", + " # instances_format=batch_predict_instances_format,\n", + " # sample_size=batch_predict_explanation_data_sample_size)\n", + " # batch_explain_task = ModelBatchPredictOp(\n", + " # project=project,\n", + " # location=location,\n", + " # model=get_model_task.outputs['model'],\n", + " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", + " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", + " # instances_format=batch_predict_instances_format,\n", + " # predictions_format=batch_predict_predictions_format,\n", + " # gcs_destination_output_uri_prefix=root_dir,\n", + " # generate_explanation=True,\n", + " # explanation_parameters=batch_predict_explanation_parameters,\n", + " # explanation_metadata=batch_predict_explanation_metadata,\n", + " # machine_type=batch_predict_machine_type,\n", + " # starting_replica_count=batch_predict_starting_replica_count,\n", + " # max_replica_count=batch_predict_max_replica_count,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = 
ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_predict_task\n", + " .outputs['gcs_output_directory'],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name)\n", + " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " # project=project,\n", + " # location=location,\n", + " # root_dir=root_dir,\n", + " # predictions_format='jsonl',\n", + " # predictions_gcs_source=batch_explain_task\n", + " # .outputs['gcs_output_directory'],\n", + " # dataflow_machine_type=dataflow_machine_type,\n", + " # dataflow_max_workers_num=dataflow_max_num_workers,\n", + " # dataflow_disk_size=dataflow_disk_size_gb,\n", + " # dataflow_service_account=dataflow_service_account,\n", + " # dataflow_subnetwork=dataflow_subnetwork,\n", + " # dataflow_use_public_ips=dataflow_use_public_ips,\n", + " # encryption_spec_key_name=encryption_spec_key_name)\n", + " ModelImportEvaluationOp(\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " # feature_attributions=feature_attribution_task\n", + " # .outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format)" + ], + "metadata": { + "id": "ktMsqtibAUzz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + 
" package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ], + "metadata": { + "id": "NOvOMTEgCVcW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", + " category=FutureWarning,\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'prediction_type':'regression',\n", + " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", + " 'target_column_name':label_column,\n", + " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", + " 'batch_predict_instances_format':'bigquery',\n", + " }" + ], + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "DISPLAY_NAME = \"pen\" + UUID\n", + "\n", + "job = aiplatform.PipelineJob(\n", + " display_name=DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run()\n", + "\n", + "# ! rm tabular_regression_pipeline.json" + ], + "metadata": { + "id": "pdHib_yUEuEk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ], + "metadata": { + "id": "mKRTDi8ioXBY" + } + }, + { + "cell_type": "markdown", + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ], + "metadata": { + "id": "U2zocUvk2YVs" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Model Evaluation Results" + ], + "metadata": { + "id": "XcKaONSsGNC4" + } + }, + { + "cell_type": "code", + "source": [ + "# _________NOTE_________:\n", + "# This is sample code from the eval team. It needs to be debugged or replaced with a\n", + "# better approach.\n", + "\n", + "from google.cloud import aiplatform_v1\n", + "\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ], + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial:\n", + "\n", + "{TODO: Include commands to delete individual resources below}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete endpoint resource\n", + "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", + "\n", + "# Delete model resource\n", + "! gcloud ai models delete $MODEL_NAME --quiet\n", + "\n", + "# Delete Cloud Storage objects that were created\n", + "! gsutil -m rm -r $JOB_DIR\n", + "\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From aa60f888d29d296218abc7e8437ccbe6a7f015a5 Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Mon, 29 Aug 2022 12:03:26 -0400 Subject: [PATCH 12/34] Remove extra file --- ..._tabular_regression_model evaluation.ipynb | 1273 ----------------- 1 file changed, 1273 deletions(-) delete mode 100644 notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb deleted file mode 100644 index 9eba72c37..000000000 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model evaluation.ipynb +++ /dev/null @@ -1,1273 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is 
distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "{TODO: Include a paragraph with Dataset information and where to obtain it.} \n", - "\n", - "{TODO: Make sure the dataset is accessible to the public. 
**Googlers**: Add your dataset to the [public samples bucket](http://goto/cloudsamples#sample-storage-bucket) within gs://cloud-samples-data/vertex-ai, if it doesn't already exist there.}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "\n", - "{TODO: Update the list of billable products that your tutorial uses.}\n", - "\n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "{TODO: Include links to pricing documentation for each product you listed above.}\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. 
[Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. \n", - "\n", - "{TODO: Suggest using the latest major GA version of each package; i.e., --upgrade}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17\n", - "! 
pip3 install $USER kfp google-cloud-pipeline-components --upgrade -q\n", - "# TODO: Add remaining package installs here" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. 
If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. 
It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n", - "\n", - "{TODO: replace the `TIMESTAMP` with `UUID` in official notebooks}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. 
This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "\n", - "{TODO: Adjust wording in the first paragraph to fit your use case - explain how your tutorial uses the Cloud Storage bucket. The example below shows how Vertex AI uses the bucket for training.}\n", - "\n", - "When you submit a training job using the Cloud SDK, you upload a Python package\n", - "containing your training code to a Cloud Storage bucket. Vertex AI runs\n", - "the code from this package. In this tutorial, Vertex AI also saves the\n", - "trained model that results from your job in the same bucket. Using this model artifact, you can then\n", - "create Vertex AI model and endpoint resources in order to serve\n", - "online predictions.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all\n", - "Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"model-eval-testaip-7yeib57l\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account {TODO: Include these cells if the notebook specifies a service account}\n", - "\n", - "{TODO: What uses service account in the notebook; e.g., You use a service account to create Vertex AI Pipeline jobs.}. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "from google_cloud_pipeline_components.experimental.evaluation import (\n", - " ModelEvaluationRegressionOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - "from google_cloud_pipeline_components.experimental import evaluation\n", - "import kfp" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Dataset\n", - "\n", - "We use this bigquery table for training" - ], - "metadata": { - "id": "BiVlyW5OUnjK" - } - }, - { - "cell_type": "code", - "source": [ - "# Define BigQuery table to be used for training\n", - "\n", - "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" - ], - "metadata": { - "id": "bViYfWfpVAiF" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "\n", - "from google.cloud import bigquery\n", - "\n", - "# Create client in default region\n", - "bq_client = bigquery.Client(\n", - " project=PROJECT_ID,\n", - " credentials=aiplatform.initializer.global_config.credentials,\n", - ")\n" - ], - "metadata": { - "id": "20S9En09X0PY" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "# Create test dataset in default region\n", - "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", - "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" - ], - "metadata": { - "id": "KvRQNKhEmGHs" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", - "bq_dataset = bq_client.create_dataset(bq_dataset)\n", - "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cw3n0ftZYZ_h", - "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" - ] - } - ] - }, - { - "cell_type": 
"code", - "source": [ - "# Select a subset of the original dataset for testing\n", - "PREDICTION_SIZE = 10\n", - "query = f\"\"\"\n", - " SELECT *\n", - " FROM {BQ_TABLE}\n", - " LIMIT {PREDICTION_SIZE} \n", - " \"\"\"\n", - "\n", - "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", - "\n", - "query_job = bq_client.query(query, job_config=job_config) # API request\n", - "query_job.result() # Waits for query to finish" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XG0U5lmfYrNT", - "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": {}, - "execution_count": 83 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Create the dataset" - ], - "metadata": { - "id": "4XpPTSFoYCsT" - } - }, - { - "cell_type": "code", - "source": [ - "DATASET_NAME = \"Pen\"+UUID" - ], - "metadata": { - "id": "gHOUMfskYIpO" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=DATASET_NAME,\n", - " bq_source=[f\"bq://{BQ_TABLE}\"],\n", - ")\n", - "\n", - "COLUMN_SPECS = {\n", - " \"year\": \"auto\",\n", - " \"month\": \"auto\",\n", - " \"day\": \"auto\",\n", - "}\n", - "\n", - "label_column = \"mean_temp\"\n", - "\n", - "dataset = aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", - "\n", - "print(dataset.resource_name)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IyXwOcbVYBd1", - "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" - ] - } - ] - }, - { - "cell_type": "markdown", - 
"source": [ - "## Train the model" - ], - "metadata": { - "id": "A-QQkeUnq8Xt" - } - }, - { - "cell_type": "code", - "source": [ - "MODEL_NAME = \"pen\" + UUID" - ], - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name= \"pen_training_job\",\n", - " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(training_job)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3l691PEMZFdA", - "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "model = training_job.run(\n", - " dataset=dataset,\n", - " model_display_name=MODEL_NAME,\n", - " training_fraction_split=0.6,\n", - " validation_fraction_split=0.2,\n", - " test_fraction_split=0.2,\n", - " budget_milli_node_hours=1,\n", - " disable_early_stopping=False,\n", - " target_column=label_column,\n", - ")" - ], - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Get Model" - ], - "metadata": { - "id": "RYjWdtscAmFP" - } - }, - { - "cell_type": "code", - "source": [ - "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", - "MODEL_ID = '2241222511826042880'\n", - "model = aiplatform.Model(model_name=MODEL_ID)" - ], - "metadata": { - "id": "4Dkk4P_TAlkr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### List model eval metrics" - ], - "metadata": { - "id": "rYirKB_9yaa0" - } - }, - { - "cell_type": "code", - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - 
"\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ], - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Model Evaluation" - ], - "metadata": { - "id": "ce6beLsXASnK" - } - }, - { - "cell_type": "code", - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " # batch_predict_gcs_source_uris: list,\n", - " bigquery_source_input_uri: str,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-16',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " dataflow_subnetwork: str = '',\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " # get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " get_model_task = evaluation.GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Batch Prediction.\n", - " batch_predict_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluatio',\n", - " 
bigquery_source_input_uri=bigquery_source_input_uri,\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # # # Run the Batch Explain process (sampler -> batch explanation).\n", - " # data_sampler_task = EvaluationDataSamplerOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # gcs_source_uris=batch_predict_gcs_source_uris,\n", - " # instances_format=batch_predict_instances_format,\n", - " # sample_size=batch_predict_explanation_data_sample_size)\n", - " # batch_explain_task = ModelBatchPredictOp(\n", - " # project=project,\n", - " # location=location,\n", - " # model=get_model_task.outputs['model'],\n", - " # job_display_name='model-registry-batch-explain-evaluation-{{$.pipeline_job_uuid}}-{{$.pipeline_task_uuid}}',\n", - " # gcs_source_uris=data_sampler_task.outputs['gcs_output_directory'],\n", - " # instances_format=batch_predict_instances_format,\n", - " # predictions_format=batch_predict_predictions_format,\n", - " # gcs_destination_output_uri_prefix=root_dir,\n", - " # generate_explanation=True,\n", - " # explanation_parameters=batch_predict_explanation_parameters,\n", - " # explanation_metadata=batch_predict_explanation_metadata,\n", - " # machine_type=batch_predict_machine_type,\n", - " # starting_replica_count=batch_predict_starting_replica_count,\n", - " # max_replica_count=batch_predict_max_replica_count,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = 
ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_format=batch_predict_predictions_format,\n", - " predictions_gcs_source=batch_predict_task\n", - " .outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name)\n", - " # feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " # project=project,\n", - " # location=location,\n", - " # root_dir=root_dir,\n", - " # predictions_format='jsonl',\n", - " # predictions_gcs_source=batch_explain_task\n", - " # .outputs['gcs_output_directory'],\n", - " # dataflow_machine_type=dataflow_machine_type,\n", - " # dataflow_max_workers_num=dataflow_max_num_workers,\n", - " # dataflow_disk_size=dataflow_disk_size_gb,\n", - " # dataflow_service_account=dataflow_service_account,\n", - " # dataflow_subnetwork=dataflow_subnetwork,\n", - " # dataflow_use_public_ips=dataflow_use_public_ips,\n", - " # encryption_spec_key_name=encryption_spec_key_name)\n", - " ModelImportEvaluationOp(\n", - " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", - " # feature_attributions=feature_attribution_task\n", - " # .outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format)" - ], - "metadata": { - "id": "ktMsqtibAUzz" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "from kfp.v2 import compiler # noqa: F811\n", - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - 
" package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ], - "metadata": { - "id": "NOvOMTEgCVcW", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stderr", - "text": [ - "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", - " category=FutureWarning,\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'prediction_type':'regression',\n", - " 'model_name':f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}',\n", - " 'target_column_name':label_column,\n", - " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", - " 'batch_predict_instances_format':'bigquery',\n", - " }" - ], - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "DISPLAY_NAME = \"pen\" + UUID\n", - "\n", - "job = aiplatform.PipelineJob(\n", - " display_name=DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run()\n", - "\n", - "# ! rm tabular_regression_pipeline.json" - ], - "metadata": { - "id": "pdHib_yUEuEk" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ], - "metadata": { - "id": "mKRTDi8ioXBY" - } - }, - { - "cell_type": "markdown", - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ], - "metadata": { - "id": "U2zocUvk2YVs" - } - }, - { - "cell_type": "markdown", - "source": [ - "### Molde Evaluation Results" - ], - "metadata": { - "id": "XcKaONSsGNC4" - } - }, - { - "cell_type": "code", - "source": [ - "# _________NOTE_________:\n", - "#this is a sample code from eval team... need to be degbugged or replaced with a \n", - "# better appraoch\n", - "\n", - "from google.cloud import aiplatform_v1\n", - "\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ], - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial:\n", - "\n", - "{TODO: Include commands to delete individual resources below}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete endpoint resource\n", - "! 
gcloud ai endpoints delete $ENDPOINT_NAME --quiet --region $REGION\n", - "\n", - "# Delete model resource\n", - "! gcloud ai models delete $MODEL_NAME --quiet\n", - "\n", - "# Delete Cloud Storage objects that were created\n", - "! gsutil -m rm -r $JOB_DIR\n", - "\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file From 01863e86d270abcea13a46ade9f6942ed5159e6a Mon Sep 17 00:00:00 2001 From: Soheila Zangeneh Date: Tue, 30 Aug 2022 09:46:49 -0400 Subject: [PATCH 13/34] Pring evaluation results --- ..._tabular_regression_model_evaluation.ipynb | 342 +++++++++--------- 1 file changed, 171 insertions(+), 171 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 9eba72c37..065c70cd1 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -678,30 +678,35 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, "source": [ "## Dataset\n", "\n", "We use this bigquery table for training" - ], - "metadata": { - "id": "BiVlyW5OUnjK" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], "source": [ "# Define BigQuery table to be used for training\n", "\n", "BQ_TABLE = \"bigquery-public-data.samples.gsod\"" - ], - "metadata": { - "id": "bViYfWfpVAiF" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + 
"metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], "source": [ "\n", "from google.cloud import bigquery\n", @@ -711,33 +716,24 @@ " project=PROJECT_ID,\n", " credentials=aiplatform.initializer.global_config.credentials,\n", ")\n" - ], - "metadata": { - "id": "20S9En09X0PY" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KvRQNKhEmGHs" + }, + "outputs": [], "source": [ "# Create test dataset in default region\n", "PREDICTION_INPUT_DATASET_ID = f\"gsod_prediction_{UUID}\"\n", "PREDICTION_INPUT_TABLE_ID = f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}.prediction\"" - ], - "metadata": { - "id": "KvRQNKhEmGHs" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", - "bq_dataset = bq_client.create_dataset(bq_dataset)\n", - "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -745,33 +741,24 @@ "id": "cw3n0ftZYZ_h", "outputId": "143f5d75-c324-4ebc-efba-12b51fdc47c9" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "Created dataset model-eval-test.gsod_prediction_7yeib57l\n" ] } + ], + "source": [ + "bq_dataset = bigquery.Dataset(f\"{PROJECT_ID}.{PREDICTION_INPUT_DATASET_ID}\")\n", + "bq_dataset = bq_client.create_dataset(bq_dataset)\n", + "print(f\"Created dataset {bq_client.project}.{bq_dataset.dataset_id}\")" ] }, { "cell_type": "code", - "source": [ - "# Select a subset of the original dataset for testing\n", - "PREDICTION_SIZE = 10\n", - "query = f\"\"\"\n", - " SELECT *\n", - " FROM {BQ_TABLE}\n", - " LIMIT {PREDICTION_SIZE} \n", - " \"\"\"\n", - "\n", - "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", - "\n", - "query_job = 
bq_client.query(query, job_config=job_config) # API request\n", - "query_job.result() # Waits for query to finish" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -779,42 +766,72 @@ "id": "XG0U5lmfYrNT", "outputId": "03dccbd9-7405-48c1-f9b6-2bec0872df37" }, - "execution_count": null, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "" ] }, + "execution_count": 83, "metadata": {}, - "execution_count": 83 + "output_type": "execute_result" } + ], + "source": [ + "# Select a subset of the original dataset for testing\n", + "PREDICTION_SIZE = 10\n", + "query = f\"\"\"\n", + " SELECT *\n", + " FROM {BQ_TABLE}\n", + " LIMIT {PREDICTION_SIZE} \n", + " \"\"\"\n", + "\n", + "job_config = bigquery.QueryJobConfig(destination=PREDICTION_INPUT_TABLE_ID)\n", + "\n", + "query_job = bq_client.query(query, job_config=job_config) # API request\n", + "query_job.result() # Waits for query to finish" ] }, { "cell_type": "markdown", - "source": [ - "### Create the dataset" - ], "metadata": { "id": "4XpPTSFoYCsT" - } + }, + "source": [ + "### Create the dataset" + ] }, { "cell_type": "code", - "source": [ - "DATASET_NAME = \"Pen\"+UUID" - ], + "execution_count": null, "metadata": { "id": "gHOUMfskYIpO" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "DATASET_NAME = \"Pen\"+UUID" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IyXwOcbVYBd1", + "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" + ] + } + ], "source": [ "dataset = aiplatform.TabularDataset.create(\n", " display_name=DATASET_NAME,\n", @@ -832,56 +849,31 @@ "dataset = 
aiplatform.TabularDataset('projects/1058599485685/locations/us-central1/datasets/5507798990181105664')\n", "\n", "print(dataset.resource_name)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IyXwOcbVYBd1", - "outputId": "cad8909b-8285-403f-d8ec-c304a577fb23" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "projects/1058599485685/locations/us-central1/datasets/5507798990181105664\n" - ] - } ] }, { "cell_type": "markdown", - "source": [ - "## Train the model" - ], "metadata": { "id": "A-QQkeUnq8Xt" - } + }, + "source": [ + "## Train the model" + ] }, { "cell_type": "code", - "source": [ - "MODEL_NAME = \"pen\" + UUID" - ], + "execution_count": null, "metadata": { "id": "Bxn6ATUXrET6" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "MODEL_NAME = \"pen\" + UUID" + ] }, { "cell_type": "code", - "source": [ - "training_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name= \"pen_training_job\",\n", - " optimization_prediction_type=\"regression\",\n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(training_job)" - ], + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -889,19 +881,32 @@ "id": "3l691PEMZFdA", "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ "\n" ] } + ], + "source": [ + "training_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name= \"pen_training_job\",\n", + " optimization_prediction_type=\"regression\",\n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(training_job)" ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], "source": [ "model = training_job.run(\n", " dataset=dataset,\n", @@ -913,70 +918,70 @@ " 
disable_early_stopping=False,\n", " target_column=label_column,\n", ")" - ], - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Get Model" - ], "metadata": { "id": "RYjWdtscAmFP" - } + }, + "source": [ + "## Get Model" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4Dkk4P_TAlkr" + }, + "outputs": [], "source": [ "# model = aiplatform.Model(f'/projects/{PROJECT_ID}/locations/{REGION}/models/2036871678734106624')\n", "MODEL_ID = '2241222511826042880'\n", "model = aiplatform.Model(model_name=MODEL_ID)" - ], - "metadata": { - "id": "4Dkk4P_TAlkr" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "### List model eval metrics" - ], "metadata": { "id": "rYirKB_9yaa0" - } + }, + "source": [ + "### List model eval metrics" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], "source": [ "# Get evaluations\n", "model_evaluations = model.list_model_evaluations()\n", "\n", "model_evaluation = list(model_evaluations)[0]\n", "print(model_evaluation)" - ], - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "## Model Evaluation" - ], "metadata": { "id": "ce6beLsXASnK" - } + }, + "source": [ + "## Model Evaluation" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], "source": [ "@kfp.dsl.pipeline(\n", " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", @@ -1085,44 +1090,44 @@ " # .outputs['feature_attributions'],\n", " model=get_model_task.outputs['model'],\n", " dataset_type=batch_predict_instances_format)" - ], - "metadata": { - "id": "ktMsqtibAUzz" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "from kfp.v2 import compiler # 
noqa: F811\n", - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ], + "execution_count": null, "metadata": { - "id": "NOvOMTEgCVcW", "colab": { "base_uri": "https://localhost:8080/" }, + "id": "NOvOMTEgCVcW", "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" }, - "execution_count": null, "outputs": [ { - "output_type": "stream", "name": "stderr", + "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/kfp/v2/compiler/compiler.py:1295: FutureWarning: APIs imported from the v1 namespace (e.g. kfp.dsl, kfp.components, etc) will not be supported by the v2 compiler since v2.0.0\n", " category=FutureWarning,\n" ] } + ], + "source": [ + "from kfp.v2 import compiler # noqa: F811\n", + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], "source": [ "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pen{UUID}\"\n", "parameters = {\n", @@ -1135,15 +1140,15 @@ " 'bigquery_source_input_uri':f\"bq://{PREDICTION_INPUT_TABLE_ID}\",\n", " 'batch_predict_instances_format':'bigquery',\n", " }" - ], - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], "source": [ "DISPLAY_NAME = \"pen\" + UUID\n", "\n", @@ -1157,44 +1162,44 @@ "job.run()\n", "\n", "# ! 
rm tabular_regression_pipeline.json" - ], - "metadata": { - "id": "pdHib_yUEuEk" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, "source": [ "Click on the generated link to see your run in the Cloud Console.\n", "\n", "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ], - "metadata": { - "id": "mKRTDi8ioXBY" - } + ] }, { "cell_type": "markdown", - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ], "metadata": { "id": "U2zocUvk2YVs" - } + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] }, { "cell_type": "markdown", - "source": [ - "### Molde Evaluation Results" - ], "metadata": { "id": "XcKaONSsGNC4" - } + }, + "source": [ + "### Molde Evaluation Results" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], "source": [ "# _________NOTE_________:\n", "#this is a sample code from eval team... 
need to be degbugged or replaced with a \n", @@ -1205,18 +1210,13 @@ "for task in job._gca_resource.job_detail.task_details:\n", " if ((\"model-evaluation\" in task.task_name) and\n", " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED)):\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", "\n", "print(evaluation_metrics)\n", "print(evaluation_metrics_gcs_uri)" - ], - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -1270,4 +1270,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} From fa6652bd08fd9e739bd549d369187a5881bf180d Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Fri, 2 Sep 2022 15:39:13 +0000 Subject: [PATCH 14/34] modified some text --- ..._tabular_regression_model_evaluation.ipynb | 188 +++--------------- 1 file changed, 32 insertions(+), 156 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 340fd875f..6684da910 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -93,7 +93,7 @@ "source": [ "### Dataset\n", "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "The dataset used in this notebook is part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset, selected for the problem of predicting the age of the pet. It consists of the following fields:\n", "\n", "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", "- `Age`: Age of pet when listed, in months\n", @@ -195,26 +195,11 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "id": "2b4ef9b72d43" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", - "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", - "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", - "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", - "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", - "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n", - "\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.2.2 is available.\n", - "You should consider upgrading via the '/usr/local/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ "import os\n", "\n", @@ -248,7 +233,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "id": "EzrelQZ22IZj" }, @@ -313,7 +298,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "id": "oM1iC_MfAts1" }, @@ -324,19 +309,11 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "id": 
"riG_qUokg0XZ" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Project ID: vertex-ai-dev\n" - ] - } - ], + "outputs": [], "source": [ "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", " # Get your GCP project id from gcloud\n", @@ -347,30 +324,11 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "id": "set_gcloud_project_id" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Updated property [core/project].\n", - "\n", - "\n", - "Updates are available for some Cloud SDK components. To install them,\n", - "please run:\n", - " $ gcloud components update\n", - "\n", - "\n", - "\n", - "To take a quick anonymous survey, run:\n", - " $ gcloud survey\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "! gcloud config set project $PROJECT_ID" ] @@ -397,7 +355,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "id": "sduDOFQVF6kv" }, @@ -423,7 +381,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "id": "697568e92bd6" }, @@ -533,7 +491,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "id": "MzGDU7TWdts_" }, @@ -545,7 +503,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "id": "cf221059d072" }, @@ -567,19 +525,11 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "NIq7R4HZCfIc" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Creating gs://vertex-ai-devaip-quar9hbz/...\n" - ] - } - ], + "outputs": [], "source": [ "! 
gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" ] @@ -595,7 +545,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "id": "vhOb7YnwClBb" }, @@ -617,7 +567,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "id": "UwC1AdGeF6kx" }, @@ -628,19 +578,11 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": { "id": "autoset_service_account" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Service Account: 931647533046-compute@developer.gserviceaccount.com\n" - ] - } - ], + "outputs": [], "source": [ "if (\n", " SERVICE_ACCOUNT == \"\"\n", @@ -673,7 +615,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "id": "6OqzKqhMF6kx" }, @@ -695,7 +637,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": { "id": "pRUOFELefqf1" }, @@ -722,7 +664,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "id": "ksAefQcCF6ky" }, @@ -744,7 +686,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "id": "bViYfWfpVAiF" }, @@ -755,25 +697,12 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": { "id": "20S9En09X0PY", "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Creating TabularDataset\n", - "Create TabularDataset backing LRO: projects/931647533046/locations/us-central1/datasets/9019973380832493568/operations/3060886783697879040\n", - "TabularDataset created. 
Resource name: projects/931647533046/locations/us-central1/datasets/9019973380832493568\n", - "To use this TabularDataset in another session:\n", - "ds = aiplatform.TabularDataset('projects/931647533046/locations/us-central1/datasets/9019973380832493568')\n", - "Resource name: projects/931647533046/locations/us-central1/datasets/9019973380832493568\n" - ] - } - ], + "outputs": [], "source": [ "# Create the Vertex AI Dataset resource\n", "dataset = aiplatform.TabularDataset.create(\n", @@ -799,7 +728,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": { "id": "Bxn6ATUXrET6" }, @@ -810,7 +739,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -837,7 +766,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -845,23 +774,7 @@ "id": "3l691PEMZFdA", "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.9/site-packages/google/cloud/aiplatform/training_jobs.py:4562: DeprecationWarning: consider using column_specs instead. 
column_transformations will be deprecated in the future.\n", - " column_transformations_utils.validate_and_get_column_transformations(\n" - ] - } - ], + "outputs": [], "source": [ "train_job = aiplatform.AutoMLTabularTrainingJob(\n", " display_name=TRAINING_JOB_DISPLAY_NAME,\n", @@ -897,7 +810,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -906,7 +819,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -947,44 +860,7 @@ "id": "IIfvPCGYyFCT", "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "View Training:\n", - "https://console.cloud.google.com/ai/platform/locations/us-central1/training/4882406946784673792?project=931647533046\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current 
state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n", - "AutoMLTabularTrainingJob projects/931647533046/locations/us-central1/trainingPipelines/4882406946784673792 current state:\n", - "PipelineState.PIPELINE_STATE_RUNNING\n" - ] - } - ], + "outputs": [], "source": [ "# Run the training job\n", "model = train_job.run(\n", From 4daae032930b33e21eda434ef1fbb6461727ccb9 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Mon, 5 Sep 2022 10:14:45 +0000 Subject: [PATCH 15/34] added suggested updates from review: remove dataflow params, add/change textual descriptions, add UUID --- ...ular_classification_model_evaluation.ipynb | 106 +++++++++--------- ...lar_classification_evaluation_pipeline.PNG | Bin 0 -> 84131 bytes 2 files changed, 52 insertions(+), 54 deletions(-) create mode 100644 notebooks/community/model_evaluation/images/automl_tabular_classification_evaluation_pipeline.PNG diff --git 
a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 590d325cc..eb9d7f14d 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -61,7 +61,7 @@ "source": [ "## Overview\n", "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " ] }, { @@ -72,19 +72,24 @@ "source": [ "### Objective\n", "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", "\n", "This tutorial uses the following Google Cloud ML services and resources:\n", "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification)\n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", "\n", "\n", "The steps performed include:\n", "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline.\n", - "- Run a `batch prediction` job.\n", - "- Evaulate the AutoML model using the `classification evluation component`." 
+ "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`." ] }, { @@ -112,7 +117,7 @@ "- `PhotoAmt`: Total uploaded photos for this pet\n", "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + "**Note**: This dataset has been moved to a public Cloud Storage bucket, from which it is accessed in this notebook." ] }, { @@ -699,7 +704,7 @@ "source": [ "# Create the Vertex AI Dataset resource\n", "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", " gcs_source=DATA_SOURCE,\n", ")\n", "\n", @@ -714,7 +719,7 @@ "\n", "Train a simple classification model the created dataset using `Adopted` as the target column. \n", "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
] }, { @@ -735,7 +740,21 @@ "# If no display name is specified, use the default one\n", "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl\"" + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human-readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce, for example regression or classification.\n", + "- `column_transformations` (Optional): Transformations to apply to the input columns.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. The valid objectives depend on the prediction type. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, see the [documentation for the **AutoMLTabularTrainingJob** class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
] }, { @@ -790,14 +809,22 @@ "# If no name is specified, use the default name\n", "if MODEL_DISPLAY_NAME == \"\" or \\\n", " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model\"" + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Run the training job on the created Vertex AI dataset by passing the needed arguments for training.\n", + "Run the training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `service_account`: The service account configured to run the training job.\n", + "- `dataset`: The TabularDataset, within the same Project, whose data is used to train the Model.\n", + "- `target_column`: The name of the column whose values the Model is to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours` (Optional): The training budget for creating this Model, expressed in milli node hours; that is, a value of 1,000 in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run()` method, see the [documentation for the **AutoMLTabularTrainingJob** class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", "\n", "The training job takes roughly 1.5-2 hours to finish." 
] @@ -811,16 +838,15 @@ }, "outputs": [], "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", "# Run the training job\n", "model = train_job.run(\n", " service_account=SERVICE_ACCOUNT,\n", " dataset=dataset,\n", - " target_column=\"Adopted\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", + " target_column=target_column,\n", " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", " budget_milli_node_hours=1000,\n", ")" ] @@ -918,14 +944,8 @@ " batch_predict_max_replica_count: int = 10,\n", " batch_predict_explanation_metadata: dict = {},\n", " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " dataflow_subnetwork: str = '',\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", + " batch_predict_explanation_data_sample_size: int = 10000\n", + "):\n", " \n", " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", @@ -960,7 +980,6 @@ " machine_type=batch_predict_machine_type,\n", " starting_replica_count=batch_predict_starting_replica_count,\n", " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", " # Set the explanation parameters\n", " generate_explanation=True,\n", " explanation_parameters=batch_predict_explanation_parameters,\n", @@ -976,14 +995,7 @@ " problem_type=prediction_type,\n", " ground_truth_column=target_column_name,\n", " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " 
predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", + " predictions_format=batch_predict_predictions_format\n", " )\n", " \n", " # Get Feature Attributions\n", @@ -992,14 +1004,7 @@ " location=location,\n", " root_dir=root_dir,\n", " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", " )\n", "\n", " ModelImportEvaluationOp(\n", @@ -1072,7 +1077,7 @@ "\n", "- `project`: Project ID.\n", "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", "- `target_column_name`: Name of the column to be used as the target for classification.\n", @@ -1144,16 +1149,9 @@ "source": [ "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" ] }, { diff --git a/notebooks/community/model_evaluation/images/automl_tabular_classification_evaluation_pipeline.PNG b/notebooks/community/model_evaluation/images/automl_tabular_classification_evaluation_pipeline.PNG new file mode 100644 index 0000000000000000000000000000000000000000..275c9df733e16d23116112d6446ea7cedd57978a GIT binary patch literal 84131 zcmbrmc|26_|Nk%9cUeZ(vL?$^mh2Q`iLBW*Lb8o)(b%JzEQ3Op?2#nh)eezuWEi{e8c;?;kj3&MfDg>v}Ga$Nf4{hWc9cG@LXfBqa1Y+8V|r zB&5UO=ROq$c;tB$Y!duATK8|b(%bcwJNar{_!_5vUMc84KRQvw;7L|iRxwL#J0C{AWusRIk3iR507Bv;|-DFJ32ag z(e%6v<2J!%Ee0JgTDMoiKYi+keGkPFDjYiF=$q0Q?Vr`IeR~JxO4N+>AMgZR>o+N3_tG^9;{zZ;QS?jUbNUtJ3Qj5OP{W~;3Jb9 zUW(Wb=O}99QKNUmXRmlBUz5aP`f3RLXKA6Na^&M8*lLv6X-cU3GiqrBK3^WXMGcjR zvht6oE1HHqvh>%%i9D(iyoi0HicU>Q_FOkXtaG`Czf!Hp?@h~z@^{f;g^#*o-;30; z#zGRc*svTX_mD#zJQ!8^G3N* z_1(@pc&fn?dJ63OP&VujXs;IvvO0tu&@zKp8zEh?f0w^3aopsOV#T8yk{}Lt^Z7Sl z^=^C#@>4>(sJP9yE0b=r2^Q5Gx%6gmm`_ZS`3w1b#NDjPp;0he 
z=(sr>bj_9=sw>Wp7n1M5>3Y*)rL)^#)4&rPim~PIHY(>hkUDWXFD>C|{c)7gUZ)bQ zV5bFmsAcjD#tiHwjum7t!8$)qG9Tb(*}OsH7SW3Ll;dWIXtjGx`|p49$ljOx{dp8T zdE?Rl@e^vs7FAk&fhQI3a%;2C1u|6r$_9V8%1`*WTo0%#7X=PjjjNBy>iXc}$wvy9 z4O}e`(k1ZD7fpo3fhw*60spO-;rtj4sfGEwIJnJ|M-y

8Ui3uXK&*1~0@PMIlv zv?pS?BJH)}Xo&B-gcL`F13cLR{8PKytjlFRenF*ZJ;6de5pu*Jg6a3>`Id{VuBh1{ zg(2;%WuZ~+{MhJS#0#u_wl6%-UkEKzFy?9VYMzW+|LRPt`RZN1rFfRQ(PBk%&K@24 z(XYF#+S3`2)D=0Nu{!5vtmOG#5m~y`j>%oexu77wbi-o5lwjq)M^-q-nDk+)JW1o{ zM-BIH6DMXG-cjDMryJvj!}R_zId@*FmK`2bkTI@tuAn&>TzC_F(ha@K(0vK=HEQU% z?%^x7gs>(wB!nMZ-4$ap=`5yNgX(FBYJZDd52V5Mcn74;=_8Yp6r;e%eMu&INt1ws z&%$mGT80K#@V?Drk%qGnXP#?{d<(8;5w)GhQjB#j>!xvtC_8lmI<+F2*aIu*a0d7sKB?C>TDGE#v~N!Dn38*wQqD_-qL^{Zv%u& zSrUX=LpOJ~iSQ-i!ewVX`TKWhNFx+mEl87+i;d&~Gj76C1}gE;*qjafHiJbRYo#p0 ziXSTUx&8wCHjM?9;;Wwn@5wlat)8$%nGgAdcNDz^b95*g{F34I3E#g!rQg#3T%D9O zGVF6kSb(SG(hzB;KAPk78zHYHT-=A6mYmp$Y+1#!s4iVvcymLAL?Sqb)i+Lo2>EY_$p#oIN`~lRX$KQ#D z_k2zQ83r@%&&1QB_Ox65zj)%&tMYq%sp8hL_ zIHuZz@#v?P$N{tMWWCj}vHRHSqbqm^Jo$9{XmS^3x7Dd)#J=Jph$zzVU8>br^j6|D zWx4XNzk?7anhx0`g(q_hV5_;+l_%#WV+dxQ<<~s64ksSJbsMbe zPHY}4C%F0w9t+XGH{b3n1`RL10M)<#CE*Pr_m~w0L9c6{Qbj}7)$Fyyj#=SWh{sMX7Q&l`}mlo=+mqia-P*@V` zoEy`&E3Mj?`H%#uFn$iHNS$ey!RNQzoBCW=8rjSET<3G$*y1b6WOKSqI8)yF@jHf$ zJemzwc`g?+V;Y2$3u(3Tc*dItG}rp?xM4jtNKIzZ4z$o!b&}P`5Ex6693lzJ;Xqo( zJPq@fOoxo%Vc*lV?MT_+%PasF}U_;hBbfdTfvu=NV1c{FT z7X-YFe->tk0Oxo*!gM2HPoVz^Db)To0&Wul@pZ#~e2l~Ri}Y*K$g;z2W+z8y6up#& z9yh)^sqom54|6kJ(5Efyk`)m|%fx9$47?d^K7}ZsR~9v7q<==1MLI&wNu9ykmi(cn z)~tor!e58)0Vi~MQ~|08MJUFm+ymh*#Pv-=O1l}-=-~@&+i*ASxBXi0n=~VD5~(x! 
z9R}VQ@+H#O4rV(EkXw^6vBcfO(AR!pjHTJ~^gz_qdS})e7uRMs>mn4xATXrmL5>O1 zPM7OVqFnUsMT$@d_4FESi|vKQ#{Eaa@bY~RMI*&9A z&JTEetCM=*F?mMT{rgfo2BQ9R@;Vkld-@2y)3{9)^fKbtC9o%lV&+ z@#Wv^BP6)v`dX$d9@}+>eV33>rrbd(KW?zo>i_((+{Q*R8i%d{vFpLP4Go!-FUf4SS-D5cjwZPqfJEK?E!sU zT4iIAB1VjP1bjUEpJF49fs6%PY$mn0*1FZHH=yHxYYB6p zWHK8XHSt}2LJvNZ2jj;wy!@fQ>OFsBq?B!qpT#rQ;mX3RT+e3pL_fZs#EOtF*-5#} z4&7_&X5HPP<$7c-d#*rIcB>YlCYc`4ic!-!mCg^UJr(?GwS&>>Ec%Z!a58B>)AfW{ z0#}$DLc{hsWL5tq6?FR?8lv)geWp3Q(*OLeX>_OyDEE+EE##uC^Dt`}xEcdP&N1uT z8Ca;X2v$0tk`LK+jYZVLe;{R-&HVzgu5;FUf&C%EP+z`*b8@EL+TdJ$BC=ys71BWK zrq!9tgO>4v1u;9dae0aOWcE|f_fzAFR^+cCU3^Hf9SXz|?eWY(y{2y(C4ulapc!oS zO5#4TpbzJLNREqG@+(*4w$mz}_QBEi*#yzuGnVNB!V{LF8 zODyohoVJRyV&jh-UaOs>fsSXAVoR`|p2Rx+1A$*YKm1Ie8JsLF2@-MW3mejQ4kwc$~OMRoG7+hhZ2&$zqcrJCZnB zkrCu&i&2o~G{Hd#Vm)p4`Gtz>KwNW|fsPlihX)$L)5=9&%v4pIv)m$dFa**^UB=A1A`1~lPT@VuD29N_OW^hl@Zj2u*-qdU@#(#i)DW4~XKJ@Dp7;>30 zeyqc7Z0oQLX6rVdx=ag1g#5Miy4^vaVzmb*Ew!`fY9tNREu?#g6cqW|c- z#vJ-4Gu(7L>`89zDy}H9gaVP7vdB@_#F`+_!p(9earkYG+fY;wajF!R3pd)^@hp-` z+)u9=dQ{R+z1sD!pg^JXL$X+eci3Z;sZU`CTVD3=;cn8vv~Q`y)q&6VacQgDoYZa+ zdCWEzirfrZ5if7pH8@U15i%(QT8E&kjHe@P)MBpMSS(^D%vd4Aargc;xBPKwi-#Tj z#mobcPux_Yen$P;dyM95H2p!z$M@Tj_AR)6+*AD%9rb}RI^@DeLQg4Gzsd=YAXb8= zT@H#bN~&r2Znre!g!A3}#U(0b{;8{`h8-p2HIExNBKsShS`BN<8{H@G2HCK*e1Yl-Z1jDF7?$v*2frZ}lKowZ z;#d2^@9ZO`pw;C9G$n7)g^bPR&8n$ip`JHnB_rG%d>55>?v({^TAy$%U}dk)-neak z+voN!bV^R*a4yF7F4|AlwTX6lfSma#d+p#Ce(L-BNWt@_jQ#JI9NJeVc3j%gHF;-gpGX< z$wuj7n;O@_XqhH~C(Pd>^~^Fnnl{%EDI}=H;gWB-L6@hfZ);ewcT%XW+)#l zO(+=uk7GNFOVd)dC_LVrl7SbNLPNb;vo1aK4-o6+zhf<7R>&OB@5%fI!w&5-^K}%G zXD^H8B_FHy*A>>c4f>hnIotd0p0(Z4c6X8H(6u8)KRRj;=GPVFuc4chkLim#i?yeG z`-Aa!meN!aS$CEq_ho0nZu3XSy8c%M^jy8J|J*!o>8HZIbU zjT4>?waQS&#fub=3G9w9dwk;MdvS71Ip!9cc16Z|-BHzQt^MFa9?G}mM6!OSJDzPB zyYqRfUuh>uZU;+S*{N+-Y?CzBQBJj-D>j_GZ6L_XJX|%>kS>p0F zF@B{VOQ2<7d`LfjRW3ZxC2D&pCQi;kfhXf8b_JfJlp^f$Hpb2II&U&u^LX^bXMHM} zecjRnzG|*{%(r(ba6-dl1HamW=NI2MKVt~F4dYK% zFGD-z(f5$?bSBvM6i~#Aa^pL=dyc(0%=%rMiV>ujt^UIq+j>r@>@EZzqeEi`pGfrk 
z@`NAD#2^Z~qtndOXpi-;-L#?F=NDjb?NgEjoQDj2SJn3mLo8(W?mSF2UQq_VRhJlm zACDi4;#d|YCIIj?S%{YIofTz`Lw!B9hwP8`d5=0-3Z!^;q`N| z*51oKVzDS0pWBMP)Q8C1_4)!lF-qhPld zK2tL5DT(lUaI^Re$UF2`c9LghOzwPPfm1bu<|PS&Ge^3_N$TIjqot_IZW+#u4(bdb z@hF_`!8uY7kf;<|A|?7tgWu<<;FD;&Qz2UgYG0V(yOuP9ObCfn;(X)N`9i{o@*(8q zA-5r%$w*+?p1;_n89ZF&e1uT3cTNQCNUk z6HYI^ewxla%%8E`l=?A`S7{>*39kr#BZ=Bt-DHRfYN^n(YY1xZ7Js)Md0Om!@eM^` zk!Y&r0Os4apeUUG<&=?RZ3}l1ES{C0P#CdXKp{(u?DgNcmWui4NBa9N5l@exq5_$d zHa&OFsb^n>xtCf7-k2}n$w+O<7m$9bef!y~`|g*hqI0p*6>*!O235mVwb?VncU_F& z)p=}KLOee<5*E#lfgxKc;RLb7lsOK>UV24yXdNSW8uPF*XYgg_m$y&IJ1bE#+`Go- zir!ToMC2dEMNt=(FL*I9cPq_=J<`e|6FK^YVo`$#0C04LjjR|pPl!{@fESD$bFHC#At$v`--h+`w&Gf#oQ>CNi9k^&AJl?% zGGM25WxvN~lIrX0ewsxaq~rS^(OkIkEv~AjvWC4i1k8W(+pniqVU2U$T&UcmA59|tk%{dzGp?PAgxOtPMsh%EL1OVqV<9lJ1Bdp+-SrGLf1b( z^d{e$QV9vz9@U;asQ!~C5^!n3d)kt6i3{2Z(>2O8g%N_94QUb#;vOFvn%n8M zD9YC=M$3;MkGburN-M97)k6>J?yHYP-o|$K{(ATQz;L(MqpSyEU!Hpup~(BO?J#B$ zZghk*;!R^4HXdc3`>?m;h-Py=?yIUw*2JY5JI0A7ToqsMYrrgUEDjwKXEowjLWJ6t zesS=UK4U92(EIfT^5C*V`eD8A(8=MUez?7OF?O_Cx&4pJ zv1D)!IRI3h*(jc1GZZB31?lArSmPEr3bMlrZ!@6b{?hs$r>l62O0{Am{3#mp2vYH8 z9jE0i;$FUy%O?$=6I&7-!Lx`qN2H2MJ?y~#aM;I;e5BnK`o)JSr%;+{>6V_BU$Lb9 z8hpjsDGj{$!JR`Hf`_0WZt>pqth>r7Il*&M^Mjpd8gaTTuA`u(0kSPojs~3=<4rx@K^-M$EFlzemX;QZ=(?Onl3(&W z47fur$++BMyMPan55Ac|AS zY85%ixPu>B$#^zH?(8-|JTV=_Q6|U8!h#bKGA4<|3y-7Jm8FvdWa8UUD3h7@<9Dy! 
ze!$`D#&XuZdIR@*2wN`Jt?Tpg9@6EMkr1>4*a$OKoKT|Jp0hrqfGsDMDod|;IHC6O zwHqdzV_-)WYawV*5dywzY6Kr2t!06N93eZqjMeAKyii=$w3W3o_5&Hgzi<2DP&kltXFC{$jEE*+h(hO3S7$DU_UWw zS>l@?s@J*}?RM;GF1M3A6(Ram<#v&3gW>E2#!UX=y`Mhe{nW9@y%Y#hoM~-Fsq>4h zKm1T0XP!u23|Bo`=~-%9O11dGYPTBlQyav3q9DT|Tv@&_faxjSSKp3|aPa<<0-FEz zTr_ZW?1_2fueTG&0Q{a#4HWU$UxVkLtn%NxsArWgr2qbR@0tVz2WIfR9CsXp`V9*{ zAG_5bhJW@a%4TiwkmMt>3OgV~t6p2OE3;b~+Ge@yp|MTbR#;=oL6qSo0x$Z}KugWL zFqieOA?=FR`SGmt=wKkR3XqFIKkHzpfo5F62FxoUFBl3#_Y)Odw2J&K?JG=>Mo$U2 zWZ6rS6Q>!d9@Q*gMLn+B=U&AetBbUX=`u!KHD8i5e`)<+!&IRuSdN(L#}0P7DMdGbi2A6usi<_lf}m{A3} zGOZX~&Z}l16zjXs9B0_PTE{)9K$~~_d$A+!daCY;GEwysaH1waXTD}P-!7wXR!QJ@ z2G&h7X$TE;X&8(vF=$Xyt(E@R8Tfcq!bD<4egh{oA%-xs!%7H6TI!Dx!_o94u|h#i zW|P`*j9m10sWU;k)Fq~}!>iH9Gcq%AwVUH{R!Xar>7|_m3fP8y?cLDB`{uh7*?P_*b`1y9jpcr=PsB?jp))Z^fwh^s`~1Dsb27^rpXtiFt`GxBZ2sj) zR(J1aX6G2qX8}wm(pFq@D8X83D}hMEJpBy%0)z7l`P7C>RRr8AXR$5U(^?1R`Oa8R zxEM6!-B}=oygre@dWwU1!E^?`8@nBLYH|LUEUklRCzMh{mkfX;(ouC5Zgs{{50Bmj zZVwhzprdi6>UmB*!y8uO$V9O=aA%4IU zC6R`Y-fE!*m2fF!zTM(^JMG6&EB^tC#r#cTP^@ADMo!JfnQRGM&rFr+2LD5LJeuz_ z#rUURAR_2p65Oldl6_L5=yy*moN;C9S3doXk-jh!PzIm)d@Z#(=&}f)Jitx1OeQBO z9|X4c1=qJeDP8+)!V=TQ@>jS^Z@WFf{kS3dYLNbJw7Ez>_3=CmxI;fKT{1G%UJ!hE zMv*#;jrG|2@%wx2jt)1^6% zuipP;rWVCFqit3cMH~)^qdsz+|HO%N8WvS=aZ3zhSBfXwX3|2x@|Vv~)gNNM>MnQl z_-N8Dhd$IXNNu<@HQcky8bv>YLPo)4)-{N_+Y2hU2m$iHC53CyOa3Z=x=4QXSh4sy|!p79@kULg zYbaft9r|w(XefPVM2RK((w%ntcr`onKHb3Qii>MBWfDJo8x|Z7F>O@q*C&`%CExP& z{-a)BT57MMLio|DaYugM{-QFUb#)uc$vMpMX8b>bEjxXmIE#4cUN`J6fb|yodJWF>*m*h}Ph|d zI;8O;*0}4Lo05SO9+oT7O&jPQYP1F@fEHzAP#yFVzSuKM0^_Ry4GD4r= z?*Z_lKuHGOyU?#0odi+g#}XoT$LY&{N1UZ?43JnrXSdgGp_fh9XQnT8RmBo5`7Cn~ zx~oTUw8xH{ICC__y}#$?0*T7_JJR&hJ2ZNNB(rQGp= zyVu`Y06PlTnuP_k!ymjQ=1OxNzl+Gz9CV1NBbPP_{hYAoK1sN$Qb!&;vmo(7m z4$QI{T#!=?T6?pSFq1~~`lvL3!S?D=HV%V4htEHUb*@Uzz{g2oITAG=K|&ny?aIIg zOeEFOCwlI?QNXn3s3W%yais=IhD1mfsfsZF~H|iM{66W!5Pvp%?mdGWR5bp9t8WCuFL+itcNdZjXjjh_W7}(hcC7E zk3U_ECAj?wz7n0cltq|qOR`wH=VZnfP_N2gpTdS+bt5}_iQ9sUZt47Jp}Cv%W4*;0 
zxipJ=cf6z(EPS-bEr4zWBCwp1w9yzHG{a&R9ZTRRaDp@t{JzB>-BO`(QB3Rt69Vzq5$FL)fi!S4r->ohW7nDS9sv+luX zVXU({Fv#8TXoFWwi+zoTP+Ln z6k-!OM7w+sY4`0ucIiA;yw)Z~#S<O*FZ2r_o3=t08Uh7q-@qJgn_9rR=KMfRGe;oe~=0^Mfu^S$^WU24} zfdD$?h~eq77x6&=1;lv~U*<3a!iyt|UytT@#8R)0%s9o|VC(D2!6Ns;-s4AzFTEX; zE2gR82;6=7xks`zQ1=sce7?IH19Z(D4?CEK1=grD%-q~8!HP$4H{V9)Z@bQ=aYusk zY7F(=n?ubAUTKF-O7Xiu9(W6+wkG?s^@PXr;t{7jolmPjT{Y19vj**p-RX?a9PpEItD^j8HiT&V5&=iU{lMkhA_9$gqlt z-imxnH-I)y!ILL``dpVM8q>3M-iSBDXSvxxpwbBokRweH5IDQS>MUJ5c+>Ee+9Nw% z5jP&K0ziOW?uviG38jcTFp-8=8zAvYqR&`SDambV^GP%}xUG@+RMBT@fV$E@HE9Ny zlXDjuttNS`g7DhDW!M2&G4k-9Zd%p$LrOF0L`Yt!$6Wefjpl+!4bAQ#aiy>t+W}yj`<>;yx)gJZiZlF?;O) z20VyyH{J25acA=lH~VI0B;rhI4HQexp`W(!2Sg^#fg4ZU5|;e`nt1;^GwN>@Ajdr$ z$+bxL$$;Z za757k`F71+LT>MP7!?k5RPu(OFtOiEnBd8YRVYY$y&KmG1mB(tV3gR{?|{^(C{$go zCfF;THK&EH_59Rt1Mk;LPDIR(WH1|*85u%Y4qNxfN2w`(=SYJHCBzDV^@pw)^u2{Y z^T?o%qBJ}~Q35OY&=vd9l)%sab|!zGBb%lB1q2?x$98X%M}Y2dI0 zvXx+cjz1bki+WyNnCwTn)3rovy+uF4hel4kR@%mgDudRV+d>yWaPH6Q02kIZ+Kt(! zhW4J<2I-vkj#nih_W<#!Z7({F8l9mXe|6deusZ;(WTD=&(`CvJH%1zPDV}iN4-8__ z+G=&$J1;&`*-PN za5)r+HCM(zs#zoL9Ohy6<%GY}dJb*Sf~mNG6O)mrf)X=N0?)`<^Od6Mc2wdF`P=Ds zq-?7$2DUqG4NwsuI`<1i9edmew98RZ;H)Qef2Awqulw3vv;t_k5oAC07VZP+%Xt5U zUZ5-F5OLAyjmm{oe(YP~-2W}&JG{v8@5Dwm#n$%!o09%BfzlV~o>*q*XVd!1ac5Hh zL$<{T1hxNf<+ZWar@{NXg{vYTCqq|+-mD&IDX$1u@nWo49M%f!~Qb;x>cD!fQsNQ^7j?Jv!>AE zbWjVFghQO$ig?#YUESSZGm2xUrHuUT+~$X(xpf$ISS&mnT3}PoKk%Mk*1><eW?{~f|cABI(CgH(%=RCks5yn6{%8G}DKykFHD1ewbskdZv zt-pm{8L>oq@xcGkZK#yF)LkoOz@Jvv^(~^xyq%cg!}6!T`K7ZY;5Z|m_vP4$ex`b% zZG7&6(C6xF$$~e&TRBl(Q?T-Zk2}w!zp~Ef(^m=_(Lxh0fE-MgN>{mn9gL80$haXZ zCrgwN;x$~vw#VBTJrEMtEXj#l!>T&VDHC$&niy94<&#)2M*zk*1)x7>Rnay+6nHEf z1g*y)ujiP-cO#Qh^cdQ5Wv_L=ig@b6=RFI+Z^y37a~H>Nh$bik{y1VLyh+DS@5RS* zN5rh#7WzP7@W*59vhkaUgRhir8sJR5zoYd%#vN=NKCAsg8RG-Cz*h&L9uYs&ZDX82 ze#n4o!g?9Vt@btG+uI{5M~GJ5EV=|pLR8S1WFnzP3PqfC2$Jo863&OC{g!GDZ(#(- zRjmlHG;9SRMZmj0*TgObLvCFbxg|4V3MkkY%j0(X+Uuk^P&{2DmQz6CpmRY5UDQ7n z@p7|TKaZCJ1;kGc&)Mg!H%Uw&4EkEmEH6M(FoSy)%!dWMzslo24+|{8{;1+7)(=l= 
z^#OkKmRKF6^?MtP1CAI-Ttr^C*C`#J9}D6;5fG0^N)hTb%oggT0z#ZDbpP#Fpo@xC zXe<6WJ+6xnVTI=zGVUy%CfV9hfxF#$lh_p>uvx16l6OD4E8YUMo8c#4%~|(l!GZan z7yJR8Od&R!6HQdm$;5}ZiPYP^F7o!)j2cnM(oC|XWsCu~G7w`JK^F=>GA49>OuOYt zt!!(N5$F+T@@A@LT3MrDOM0iQgV?Oy#Jxku6N&aomh9{1;Gfp};eoH1Hgi`+0hNsu z1#W?bKWH+Gy1{jZ*$!=PK%ol6%0VOY(zQs#v5{|(;rpW;!-!EsvBTeA5SA799d`Ma zfn?Kk>=jtrkV5Jh+^g#52#7aVMRHa4Y(KdUmOm(pLqZ<>R+Vd&duLy{lkL83Cxc=D zW0n%^mPzAw=WhWu21N=+ZmRFrwfByfXT}FE{uN7gVw8-&kEk;Ft&J-q@E2&Oj!b@u z;`X16{roj17maUjQMWtjokhb8oyfnEu%7Y98@#y3L&|;}(_Cvo%%AW-ZXrgiMK>MBZnpt=?%!{U zj&wEK8hsLNs#5#Hs^R_-k@oXfvzYDNl?quc8jf2BMboCsQ{{befcF354xNu%w*sLkjq$%ody^H$WFa&xk->MV2te*x7mvT40aGXAUodOSg}G$tGWcxz3>12MrW*7 zp7#DKw3xG!@(EAA`aX`w7Z?b$rlxD+-MU=gG>KhD)hVR{F0V6jey8=P;%5y_?vThG zJz+`KUJ(6U;cdkrXmaPA+^GmZLL)%^lOQ5X&6q~*muKOu6hi1+?0eM$E|)&Aa6%s1 zM+)7?L|r8(o#+Mue&&Hd#uf4e@J(f_hXT5j(U5CmSdOQx1R@MChOKTpC(>$ZFosJG z(FJzZBtKT$P?QQP-!2Js@8*Ve%UFoYb-vH=g!?p}EewIN5HATVVT*P8G<4S`F^=%X zTcobLn|8_pQDHEze~4aRT3YH}%J(P?oKLH1WoOV@JY^Tl$MUtUr6ZQB%A$Iq&oA|F zD7FV>8Rt{tG8`B2q(nkZS6E*TK26=2F*ZlNIxO!c)r}RIa4H$#fQ-M=;=-ff45Dh1 zAPk1&K=1_>3J^tz{4WZ!<;tMb+>A*``*iJ-?m_dTP0%TV^2*H=)RTCaeKk>=chg~A z!|Ceg0#4ykr!qQa`&uORDMec7KH#S(u3PUF3K=2szM9RbK8UN zYHl)_+XMx1AcL-$Ur>!3RfYLxM=B%l_GSWc!@1}W;t2DX`2)@I zJTf6rP>7HtJTKkcHDn}>50J4|NILa6Lss4h;s;jcXact1=}k0bMC1x|#kz&IIt{WT zfujYBquj)SN?<+yay_A9zTlqqbI6tzC?cMlzaXx*S+~5=ovh#@z(SVDr+V8$QGtiQ zI7)2YMOOv)WcQ1`(M(bFgi9|MSos84A&%13%W>Ok;>GK3hg_;7sI*oi zihpJ_>?iJn@EZg1{Q$Y+;gg?(8yRLz4a9* z-=v~?kRWX{6|&Pn==C^q*xr|T_nnu4TPd~S_}_S5pz5yI?(n-4QI@_4x1;p}Q4^_e zzm7K(6x@u*``ASpq-)0~&u=^rV*`|Hn--5L zaYWWoi^lhUS(G1Sc$M+LuhetlVoP)>8?FBRxw*nT5=KzCafA8^FUk3y%Tx{_zScQ0Yv7z}t$W&7Loi0d3#Rge5z4Kf?lliELc$ z2wQO04w#$lcl)gznpVCM&(iNI;bQz-Wrw4F)x#9R>aXC?9}P7sSy#63E54JPI9;gI zUOU?NK!AO?4Z}^B5DhzQCMpcYbu7|ppj1rL=ov^5ynXZV<4?_JdO#AAOh{~WM$$ZB z_sb|&2bxfUU5}`omEmViq}Q3Je@t(%08I&bGqkXA;ftsm|nWSM0R?Jgo6;IK-a>1h5$3+_6lx>i1;s z(uRDD@Bi{{@b=H>EyCugV`$g_o~aF&CO2Vt&4hs%XDi)2&C_{*qnptrR=c4)-}^vt1CpFraB 
z$T@>O=FxeyWzWkO0CjyN2^{-VFn>I>+VbEW*+Y4@B#B|C;2Kp<$fv(%mzT33tL>H7x`Zs!ZMd4&<;dyF(`Di-v;62Jmla zD#|srjl?g*rP&Kjsw5JwMvn%pA#CA)&9u~4cdp#=zGHvKm*V`Fm~x8BZMVQz)`4XYgkApJ>rd|6+;K|(BUatA zyWRRRj*lKy=_HD@z~Uoa8i290=j4KD7T0c(@Fx6$l1p6kQ{g2 zDqsNDz~8CNyQejT@* z#<#c_JUI^g-FKPF1|=CjLb_54_@@=+4abTtuB&LV(uN_RKo1E1C;xB`w#kT%_~*N} z*{{u#V{q2LLQ;?7v~Lsk5-x33oW<{4o#5@O|DAT}*XrKc6UQENv~DT(vY41z2|X=D zg5IOy>Ti{n`d^{klJ5`43H8gz9QPZRTl5UO50FQzdwj8P#X{VBYI*Q)xjX<;c|07b zIsUO9P-d3Fe-N|79v`h$)T8EL?_rE3kC<6muUYY4Wy2jq!B3rvNPC?!S_Pn)7qXV(UnYPpgwvP^Im`D!V%; zcaU>YPPo3T?{?h+2Iy6p%XLm39`6qfo}x-*PDNf?8LLr5OpUSdhVIuP4vIO7*)jC< zaR!^KJsxx5T9$o~y}`M2ovCW$Q|+CvS*bENEILQBQunUkgWh3F_1oFr zib%QF-_s^DQkb%DUSi3Y75K9zK7gYW*{;x^y!_QkDVy3;MdM@iW<6g}0=2_?mHjy> ze4c@7|9B*omb1y#)j1Wf2JkVC8TGR6{~Q&l)_pul(mkwYm+sAa^iW3)3Zw_Parq`9 z?y_tTo~L{iY%%sG{lhYggEFA03RaUMW9LzA{8(pOu(8>}a=i)81+UOX+Cf zgqJ1B+^3vAEPmcSW%8M{##|nDxZ|Qek{Ls&1gbd%M4iq6H9fl8FI-zeTb}ZD z+c-PS5)(##f4R^JP6NhQ583&mvUP_uHV40}+VL|}-{m%JWQ!WNo?72I`GoPQ^kT(- zs1)Y)Z))Sep1`)gKl0(KM{mg5_p1*$%cxvKDzC(8rvz4A{1p9=&TNRedV*!3$9H8; z85@dU)YX#f#3}S|e~izw^IARLQQCE05ExxolEHDb9PQ0OD+6zEW&7Lt{aBhz80w8+ zGOP6OvtARy9afDW7_V}AfSA9#@IdKUzdaW$#E=moRGAPX{Q^BVM8~1J?^&LZiy_V6 z<3tl@JsN3F#iiGYCg+^X9o4Gi@?wE-QDg?E%OKcq=!R`u`ZFtmQd!Rn2ti==NQWHT zbGcgLqFUk-<7v>}t9>>@MYA^#ieQbeo!umQoRVTtl{a^EPh53k{$qKt&kGeKI27KJQ{M-!wyPmkXxfKF5lIMSxC>ir{q~Y7;suMus6gI(XIsR=0oL65;h@Qy>1831D|6`$W2G1Z=2S#-?9_)GadV1SS016-aHj8Z68sO znM7e0@2F+{+!+j154=01LiBPkFm-A@Tl+v{^ILBt_WttWTk}7W<-6ZsjXS!GT)Bpb4DO>=Oz8MS`%4WEG-m;p&-xw zJ7ELk=rS|-HE`n}-G&Av)+nm~&aR-(YQ(RBUAl-02(fke3(uv#0&kld34S>XCE%HdfZTEp%_KK2sx-@QCMl-KC zWxI!Ex9%k(^mAHX08Cf*_d*-LEQvvwK^nIW(SYh$AY&?6tEhNiTX@xd?ifK@3ivk@ zq%{12{)!2)o=XJmz?3YI`j z3axG2=D&XSIxq^qsE~$gxt(zmX>K4_ZP--iUZ(B0Ag2;M8%ua z%o6)Q9VoB0YQjauhhQ1@!!wk>MRp>%R$uchyW-}YPsi2oWYbLr=UhD;gDG3?`K|6` zow_Y8$L-K^SfkZyJ5lb`_DeG0wHL2N!~Nie&N(;xz}b#iS8@G8U`wlr#QY+nr89S> zwYlzQoU$gam-+1S#Y=R$e00FGcYvCIt?Z_t8pY$duM{r=t&x|1QbV~_h{C^g1YiJ9 z#e4-3@Y_&1yWKX=wbBMnv 
zxxE#D1WiDe6ov27&Y}s;SyNHH;tQ_7dRw6dJ5M9IM4eQ!;%Flrig#@Vj3gYp0}J|i z0-5jCaI`#qVNGy5Ot(plZU+qrShO-+$6%e!(&0p$9YRLsP?t~m2`6;nZ1Y8c&r2Lq5V zP(ZpBbM6b$s{u5Xf_{H3>6}L3=2@`9ciq8l7XaiLVg(KY+*+a%ASl32n7Zh1=`8{{ zCjA@TPA*(3?ME(9_w$m|b^blWpKqtfFbvAI9q)xU*^h{lE>78I!I7SF9pU-bPcvtH zp7R+T-4OKRT&PXEU_CXIvq9a9N9;~7E~9i{p|o)N!hy0CrSOZ?^wx3D)`#&=G1*f*z{q)xd&r5)N-j^SIBXj zvd3UuzWXF(b>H(l3oTB{Ly{p|(vq$AK3Bj5$w~Xub*Dqlla79wmkXZHfkyM^v%)?z zdG#*QiCrpYb)WjqcMrNky{P)(a?nX>4Su4~tu?spK9$L0DdW4Q4@azFo1NyhPA9Hg z{|{Yn9uH+3{(ra0ntk6RBs8I9r;yx=%5AR^DtnYLV@>wPRUfprea$Pg$c^=2{`Mf{l=^vL@f^^r)rEKtiBm03 zfzU$jPWZG|{>dl3(7kOX;+5ve2!}T*Bbl^I?iy%QJc@1jhKv2!!iOGWsPuBw-TYJZ znz+^xa_TTIqcbYI!hEi}IAAcU|RO>$Pnx{huCG_8?YKaqah8 z(rm_dEsAfHjn~tP-xI&CD(ghLv3~7LhtRU4OS0h4GEJ%@4LA^-j7M!|C@$hgmxq_E zxM=U~?R49C)as+wg5s&^0!EJho{r%Om-_276{e)+RpMZM!b440>2_|%!riBIb

m zWzE}s_r?Ny{Rpd@fBNOjNW?$??!9g={OI|IvQxlp<}Lp{JjO;NWS1V^#**F+OI5wk zk|nkTe3M6sl`ygK!Iq+iWj)T}i)XH~D~^i|gxmi2gy%NtHbuB10?S7g>D#?Nstn&? zVs);RWprKlDRvj>Na@4DLaOm0|46X)+kAmv_1X%|Kzv|QsW+R|DBhHjHQruY3QPk5 z(DB-hjMYHXDcV3b)(p3aN3#DJ=&z}I_IcUgi>8&_ooY@qAG_gRUSM<^SN*_{tnN95 z0}4Hq8SFToA@E^E-9;&!myQ^sOB*E=?cbKlrW#e3dg;So~9q zxKK+smyDOpsrm!-1nq}B1QfT3Z%GrG+XM;cErQjZtumIP7c$PH%+S1FHwkBWSQPK^ z?k-ViYy_GAjy>bi`1^uE&E%!(wfPiA1L5R(aH{4%Qc_0LK4N0IX>hi2d|BujcqJT{ zZ=pJ_q^tVL?FL@XPgjk6^pSRy{=FIqFbUdR^kS&v)dyk&MlCAR`09CTRI*ax^2=@! zj)_-DgWI_#1X?G^6alImio7jLwUt9|AhvttXphxFkztm?p3E~VJWVmdJ~H!!-tWKp zptn$habmJMh`@n9p*c~{7J&#XPF$$k?wOq@{C*>77Cp#Uvhe3QoN+V7rUK~d10Ee~J zTi&66DmLwr@#HrHD&<4~V=NIq-$Ni6->wif_&{@L6Ub(X{5wc08U&F!BMGz(JKv!3 zHMg+P-oT#HR`_--$N>FQRScC|i~GVc4Y!YglW9c8kj(K+CHCJLlHDEcN-Pn-qGvQa)MqH^ou_Univ~`31el0_OTQ9iC-#7zk3WY@1oeDf%*M|_#HY#kbHLI&vOnm>LO))PinR z{8!1PV)wgh6+^chFARy|4d!Ke&w4!c*&G4tM)QuJ`EG9BIH0axmFYByk)<#RA9!Hc zyC|LsPDUOIA_|*i+@uRjM}K{^|E&jDElfv4uBq?u6jVBOC+!l`)_zZu@z82|m}~Bb*jpv@ zDfcAqX4s5HQqMKf{1wR`aeCU)mUp7>n2QhQlMIKW9Z5qysJ4@1>}kBFvLdthx%7*Q z6=RWniOO1tYz>Yxl$sGf91l}3!sXyo8D>k~ox35s zkP^D)r=sl}yRHjsMJs#`>p7x7e?#H)V8@CBbs2#P{dzWq*weZHx zbz1$<{r1W5)b`O^c_kbC9u%Ka_%*(JoEs_p7yg_|pU*V%kDc>{$vGc0@&2G*0f5ZV z4nBIhrZCqH!+K~;UBVczI}|TJ->B}v=vREs0GM2b66_$J2AIz9GawWIHG#IcOs!pg zj3062Gu^#soaPmFV?7(i32ZKuc>;1*xZscDK!ked^|}t{VOWn-76Zrb;75$HSIjKw z^KxoC0MK9}XxM3VyHAP!p=a^M+p>i!o<(mT{Xigg2~Dp7-?#Ml89{6sQhx;xAM20;jYWXrubi=<~0G}b8<0Cc<=O_(Z zhgsX>O2qJ`zwXLD#uuQvGFZ_+&tN)_OSp}DAW*v=Ouh3nVQ(0Vpo?I5s(iY+MII4^ z6hGE^8hbx?MegmE95JofkGMQdPT&Lm)roRlo`^W$6}~>0iu$Qeo;T4>eRBLn|B*0v z>tlXzPjH_Q!rskeICVNRs^*(8IcQf96}_w~3%Gmif18D|`L*t9g_Mr z7lKaR%`Ya6yR+bdV2D88w61|TM<~fUPbR(*I4u#|pf!FBwU1U275dtf0fc-Me2E&x zk3YW@0Jud5sR*o1l-#2eD2t5FpZV6^1x`P*{D{N9%WJdgT;m?oYzQ}kYY^0^miH!E zrga>;W)aP7FMw-YO5A~BcSZ_%)AePA*!5CCF!hwPVt zPSU{_jgi69XtxBPyZV9;HO<-$z;^jZAD<0eH-HQ_5)O_*39%bN+9^B~EB(B`uYwn7 z{uH`L-`_B;AV>2i`oOUJm`4GIFdcnqV3BfASh5Jc0ABQ*dMmD_A)n2AB^rD|suiDI 
zufAd-vNWb17ejHtN#^wZaKqLX`I7mD{{Su5Ehg6|hz*0=we=k|_kzkn>Z-|*J>V_A zvk(A|>wfY=l!ClGGOYwhAb-i6!i)KZM+;HA#2o$KKBmBI2?~sj;iet+$iImHa(>q+ zAZ_>WiV`a8IoEP-cS4QWsOM3O!+9M68SSJm-Q2}C4x0Wdd3Kq8Z?`AH2kT<6<#HKB!2H@C`jU8 z9frpIgMzciP)y)3?52nPv~e zPu&_gW-S?m3pvIe#@{1e7E6?mBe zY>$j&@?;x}o*5=t`m=Rh+#|gmA+67J7KH3mqa;|NV>&Z&?ugyOyA$`PbY>2;mDy0D z8$Z6r0BXyR&9Cn-KS$h%If+almzUJQL|+hD<=v6mwwT~?qzOzbHs6qYYk~%D>0U~s z7|1owX=*>V2xF~IQm^lPiD+htrB}m!-w{Q>84ujNc=*+k!3NCc ze*v2@{KWlp*?1s2fpFvQ{~3dZf~@u=8K5X-m`4V>s)mjU;bl)~B|gb#Kxc*=L#Rz4 zTxo){@HU)E4;6yE{MJ}90k^3I>h|mS18yUG<#WKi`D-iuhW=-zt|bm&adZpfWT%G+S8J`%c>GjWa<}nRWv<$sGQYnIGFNcL!?h?~K^XEmZ@#Bax zeiF*@*!$q^(3tuXA}dmzl`Z#{B2T~>L_slfzTKQC>;rz!%7*DuO|gf8 z980%g?9VpwydsAEx2_oKF||W6@KuS08i@1ZEXkzuk#6Z~eC4@$zlL2HVjkRTI1Fzp zBY@S|`_RW|KLc}K1YtQ=zY9v}@#tnrkWrqN5;1~C2Z$t@9_e7b>R*u3M9XVUwcV)@ zQSqCH60j`#y2dksIr#!I7UvAAH*&uH^n^~D%dw}Y)@(~;kCnW7HE{^sjJSFANuiRW z!69;em?Zb&H#1>;pCHj+@Y))jEz$V=f;x+v=$6mClqx>iurP=#Em5Pi`HGI_rB3vo zL04n-Yqjld3XADJjs-YmVVCsOKrdsOYiTNz7jWmoBc+3cAK@Y{U%c3d9vbZS|e#-A2QBLU zK9S;*!N-KST3j_cs+*jaq>olwUgbKFyv05FHRLR?^YcX|>Gu`0OmG;#4%<^R3lu{S z@I4TBdh~<#U^NkDq2=9`x-DzRuaG(nhb7@7$0IyBay>iiT@(&fZE^pWPUjpo-)dC; z1J4r@P$MW(7(F}9`bWC|uf{{thSUjFd6!l9b~{J{H8WBd$G-XeoI1O)|FxLv?+dQE z-!V^@;<$_wlZVH^#@J|B+z8D?S?sXB>igYU5*zvy{kGWkOa^0d=hzGK8an&2sV1fEP8>tU&F z_D=yHIvptl%CmvH+I;kCNL`j+Ualb4MIJ`A zQ5U`^fH?W;VBLaWX(L$-lLp~0MJ(WR^F9rl=m`;>K#vhlnv5R>7n8PsMmjN?f=PlX zGB7@Ra0fP_{sr+XtUY{_cVeIMHQ+(l3Z&2R&r+8zoaqtzTN-YHKo}eusmXD%b+>pu zb>g@YvTZMdvGdmXpxK)B^>#*xy#0B!W3J*$rLB?z54}G0&dGqD+lF+HLrGo!65a^) z^MjXlDpbXI5(BY5sDK%CDfRde3FSat3RpTs_=&-)R}W8{KP`R=^zG2_V6K;#aM|PcUAq^ z90Ck&ACGZ{@R~Ywv7k2%TXFL5zKt>u{R7ilK1!poYh^K-XcQz6RP4)Z~|yHu;s+2pyoUW+Rd{!UI9g8n9nO?bCxH z_Cqf~VIxM7t5IL+l$&c;l4`~McFLZEBOt&b9;CKY62v+(j!7cDVMGYClA6Pr2lG!V zx7HuAYrYLQkFNHuva@uXnP6&3*P%u&GvGJl$H~5G>XTjq9$iDh0zD($$63H`U^+Kc z)AtjE6;mb z%U$!^Je2ZapyuFpcckxTs!6k-qrFnSjJHqPtsa@hHTR31G#R&GadM~I>*in;6GOTrZ`#>a<W3bOrP=xM#DNoye*hG 
z8v^SSBLb)&Y#lb%uLb#wx&ME3(a*R)tQ%i)bIqxP(pJ715Vpof@*i3nvKrR0J(`N} zL(b8Sm(a#Dqhucsgt0sSL@!(p7X(D`KMH}aws^oZ-sp~sWb2^&Z)L-ak0mh6a(_n&eXRz zW>&*Nlodec&j)Nb$OoFGQ<(RzxADfz34L*$XfPK;Ug1N6``VW^W3cM4Mfjah1PzpJ z^>M&R-YuS#FVXlvkGqyy!i;mciPo?$-aeY;ps@U9@wFeg7XD=Qt(Am*3&-EeABw4g zW9Xpx>nPiS(}@=|s)YxWRR43Uxr19Ri&2!6r%J2##+vJCp_Y6QLw z6x|a<{HS178!y~Em)H4lryYQfQJvw0gaanRc$ks!0|aGp7>>Mk_-sJ9k_tgYZZ9E_ zv~2+v>aYOAw6cZwa(={Jm$bzir2@;`ZcM?sGr&oEoA{+l4YXv2pG3uu_q~@2IE+SU&YG4J_MWK{W^i5R(`E%9izRr_;n- zCjmN+f9wGS!-g(WTNx}^Za(UzrC?KwAsu)(uF(HHcDm+@>WdwU>ZUW<{r#8}mj$f3 znMA=)U#)S3DN4=iEwj^1&sm|&s<^#FO)ODZ;M~hHsl#g#$ZeS}iKIpO10?4(U54%~ z^)~H0GJKjWvf5nv7e3=fDZ3g9HY?t2ec;@QMN?iJe&Q_qKHI)X8u^tdrVYFoKfX+= zVrcd>Ez;Iw0OqMre3*ujq#59no6Htqcu^szfN=c4>1VR(Vbok)ou%U$7i|;F!agTTXmW>i1a* zBNa+_P;r%i(Kip{S#x%*q`30q^RTPO`cS{_b~)+Sgzmk({a>po{#9E;qYqTd`azN% zglMy+73Q#fgss(-WQNN9wC)4a;^tgA*H3YES7lz_$EvBvmA z=t_(FtD{$U!!V7{Dl6}pes)wzpPvtY5U`f|S_}caQtzKznIru#+H`O+wi0XSDzT_! z@(35pNMD zKhw_-vnnbTla_FW(uV=Axk3Q(cZ`>)W$#RHd}LiUxEj=_3j{Bi?rkkkrC>C z0I+?cqnJ+KNX*umNo{uk)DbjYaOtVon`pxr0EUJM%-=%2+l629Cl$9Sbu&K6pkMT@;# zX&x-hYtKo$d2r)U50e$xS)aJHyLreTMye>mW|M;BbQM15cK$4OW%b=|)Va~;+|O!$ z=Sh76ih?8oy=}FW)G*9Wzw_w(cNPu*wNREqXR;?=evn@C79kl0NG#u zG*Xs<5)HH(e@jdnwBKQ_TZF!*jV~Lw{g)du`&H=vcslseT-d?7$`8= zE!5nW`Yv^sp)bZN8+#?es<>d9fy&@@!E`6%G2z|bt43znwPL{zm&OjGePWEt1DBQ1 z6_wpFP4gdHCj{5K+OKpNnfhPI8XMhOS9!f0T8w8Sg|Y_d+kph;FgNYCRJl8FLaVO=v$XL6MX ziU$#VoZ_v!^^x7Lp{jUSeXnkp0lG!@vIW85QFm;`MI&8!P>nv~Dc@qkEkozzWiYhc zOD{r%$TNr&iM8jX>wn;XA+>A|7eXex^V<{cdJH|nh< z@IU{(PJ1;(WQBL zNMWUMdwL0+!of?zxcor=5?M#89zohnxx7<1DQqukS>4Rgc#4GBGzD&rX_&qHCJuwm zMn6bOu@PCbZ5Pw(A&4b&pP#<}m>x+^sgb~!KA05JdPNxM`s`FMf_1!UE|CuJdC9p4 zYv%OCTuG)hzdcCbXyv{-phu^CgJY^h3qI^^O=LuTd9W*xrf;;JGzJDQ+sUM-6~aC0 zMy3bQF<|@&XG0(hmhX71gA#A9k4vzcV!tu=?JgV)2rUBH0c>CNnEFJ~lE1Wp%=I;Z z8m#kmF1#>B7e4*#_OpvHRpdHgXMHo8>E#Zps~fvn^iK88F<}=wze`--eGhKl9n4ZC zf~}XT1_?oxVH->ji2;kYrwQ5(F(u#DrTmMT4Kn}S5*(Yp_;57QZW?gjWY4MWutk0n 
zYkRInsu}7R(#6erIw(y>pdVJ$a(mkOvQ-)_dNIkF0A&Za5f;7YaLZTyQ>>oxumo0`c(#JJFlhFJb!oV-1o0dV-p* zZ}M+7a6rn_H_|hD9Zn`~!C3=5)j67%hKA_Bi{EXdSg`oOxghF}(Mr-Vle8Jd?$O_; zcDm*+68anz0{1z?e0Tkfc{Ib{rA#Ezuq!&9Q`{a;Lg=YMW`C$W*4G2M9LrHVGvisG zW*gJ2c1bn*c6~t#iz%!!4udfjuD`w!&rNn_NN;;&D8}qMXn;Dl*?BjXwOGpIM|qjc zaP#|@Nqu{ zX%Yz)CE4pk(idKYAM#m9@BbuY1nZLC`?JZ>VsBQcO0X{H&%hQCz65;tWn=_#&LW#& zQeEOc-NDW-l*~nGH}?}Ttxq>A)^f=Ud)pEhG5k!Auwu#hz3+ApDL;=YTwr=8aD01< zoKQKkmmThm+VWagHa>Z$z}*3}u^Ul1x>D_t9-bM|Msac2-O;rJ>sXa?N0T4@crwh` zxWLw7JwY}14yJz1yz}McAO^3rpXO9E+5aHV`L#Pk5^yLz*5y;9?PBk&1$`^F(RSTM zboqw2J?!4P(|;`B$#WbmN$XqqQ@r*50rvuy)S2)4qrs_@<)XjyS-jM&%0g2G;&y0t z?HKVQ7Gxy19FtaMSJ7wcUmLr?(AEoFo?A-~2>F;;P_5nKy&pz;s+5B1utE|RyjvHn_#xRjB? zHFzb2?&xEO{k@twhUXBh5rjZsxq|!#XK#;|cv%S#U{9EP_aE*4G#cw!AZ6$5I-1#C zsCi4qwSQ!v#1y~TmPLNJeN@raS9P>9&8jZEpZ?P-!;Z~#k;sGlmoK6V-oGw1^HG>+ z@R%`NZei@}*}j`Y$FN$I5+$8ueZ@%o(%r%d@@*5n%^&XU*Z$AN`D_!LZV53wZ11mSOq&n=^r{V{? zU&z?u$3|%J`2b3ThEy}S*M&UnVuZ5ExR0y77@bhF{8*ilB0&i#yk>7UhRr??zHq$o z2eu~y*>3ZLT)ZALgZ90g;%3Fdc<`LOZ&bA~mr$A9SmXLqTVGW!$rSWzbkxZMLyYb( z9OX+|zqkHDO>xgG+eiuQbt2<#D}D2W9;)IdlWt{(_#++U{&@E~ospJ*WW-jxl>INJ znrvP5e<+@f<$Z1S&iK#t;PR%i4gF`yz#g)h#!K|ytBJj3j`w$0_B5#r>C_EEHH>2K zoQLFwIH_rXWp^VIccAdJz)buM<}guXcP%*J0;tZ5E=`cJi6M?!ZHrEVwuNMbVd&Br z%eFm?_Fu%u=p}-6IZa3+VQnYt?@swqsLDJ^FVYeVT!1X_ z{DL;OIArWaLA})yN9oJ#Ur_~AvrB;`7GXCY2$i9~*@*QidXx4vZ7#7<+x`))D*V`D zE3IO~l}A0!VrLLKx;qvXDAtyc&wD_Jt;o2kv_?pPpQO|Ejz_YXS9-E?**Aj98WmI!h12izm9RHhTNa2Ir+g*oB#Z)hVf^W#>U#XRZ_80H2sUf-)q(&Pt)6C__h(Q6<4Ig$Y`)dIHqOQewESXK z!P7)_kKcYeY<X5ToD#TW9*DopgM#*J_j1mEN1EY=(km?h74~D~isFfr@l(&YjqfD`u{1ZJ_hABW!J%F4}Bn zyWZBV$Pb_YX*{QBUXf;HxPJSB1sm7QPYyeKxA~E75{Dh&w||6h8@iHVSANia5{TSB zJhzYOcxA8gdZWpQN&MF-bBVIG#7sY-Xe6WfbYUiCjx>_E+c@<3lKg8i!@m_M+|KdAtMtW7*$f=(_YTs_;#Z&`s z;ADo&ITv$Hh4bG(9`IY{z}k2MPJy7-Z7bn8V76PNq8f%>fF`JDc4Sgpl44(*WWepl$s5zuTaI zuF=Z;6ZnD76$L80@n&c=-c{&oh1PahOc}5O(-^9cpQW;#|Nojt#_ZD`D$9= zZ8`aZWVTm&pus2ll-uZH%sYP$cIEyc&3vjQ<~r5K8KAnos0 
zIF)W0Q%^{MRz2h~1sSXm_-@goi!R~n<6`05*xVwiWqv{~JmzN|r^HjavoF(x4nmL^ zBAGdpAp)r#oWXV0Vi{vU<#cJ}A$WndCKEDHX}BF6p4eN53gDgU&Oo*t35X9(aT7|J zY*x?(3_vtO7(J2QEHbiioJ7V+h-<)8l=P?{u@XZu(%?@^#dp#Uo#;p&s&MVyLg!a3 zs&Bf*(uM?;W)CEXpVa&_kUzUAI-&Y|C`xA`igH3Id(^mU`_-e(s@r``G4tGCbQBEo z%wp>m{!4Z$xYlZL0b#+wj3^G7kRt+>_j+};&Uf?rQ>bandLu_-w8sO;?pNeMgi?{yp;+>z~QS#Em(cK7z%ul-_A*z$( z3A@Jjt4nuij)iB!>IPHEqe$bjp9aMk*MO@teg)R!CBku$X{F3l%@8*y$1yKPK-#*L z$5PG!q@tLF3@_C|-gww_NHvkDx=$1|wLM(vWu?phVOMGG$B5psW0Ns*+n;994SB;# zC?D|!ufmJ{-cm6?{I(m?htGNZsye=Qzj;T=d~vTb-=b~C{^hCUV6kzWtl-Gkyt-`U zOa1i06qw;!l7Ra1L$sPpVdgA?oe0V^nUYsckr`*9p2g&fMIlogiwp|Xbx@{WN5H+R zfo$tTK~E9280Rfmj7kr2*%yZ&GB8aFXt;I0@$K2U58o{mZel}n0u5Feq1k>QTyF9B z>2)eboMGWN^&;v!#{#xkVP`L7l32uL_>(Glv}1ri#(Sy$s_`d#W22&bT1NOMHtQaq zIhA+)avo?F3^l_9r;CS07GkrAL4bmpNlJZZE*N5=t8ahvk9H!#C_Mn_I#(wz85~4SdNrWPugNgeBKw`#rGznm58|F zKkMj^Vkx-;oO^k(VZ6BgKMAkRmu4J@nQj+@$fq7s#wwP{c}_315eXTtz2kOnaaCJ) z2P%B5ySjwsf@*Ja?@iVp6bCo*3-{GAZf%{U3^zLemE;9#E5%Q!;Vl!hP0JkdQBVGD zH~RgO@zG1mi|;q8eo?Ay>gjI!pnot`Iy>rO<_ljBRz*~V`e2Fh=;|iByU+`H+QWZy157_(9zO za&XqBS$5rz+2YOe?VO{=ipi$97E?1m8ppozCw6(%4nb+g|LrbOymR!D)3{nUv zt?IOr68(bDv?|79tur69@(XqUYt=h!$r;vuMY?9TR~VUBs3v~KO~ycGf4^9Rm_1U2 z+rwt;j}}aCYxzpyy5SIft)WupTHj zGuyR2mHX$oB;hH`MfD>AEV8H^lp>lmUmjGVyWGXAVc~yp?QtL4k3LfHneP-2_mOty z$Jr^LW|~&m_R{!Hv_585a8MR8@g|jSG&Q7|&uo(yrkVaRloGoG`a8-y<%y%FCc<6} zg91ew_8vAVkLL!KNV;@$?G;n$#Tph@EKY}KJ?Jd9x@tYT(f%z>5Ngh^UmPvfn4{spr;?y|$@txogZUhCR3@s?#CF^(GX=_khhLLL~zA=9I>UV+o& zMmMkehJ6{l6*4_}i{E#-o~6Y`X~#6uF|^QJe*erhUmwv=H)UN4^UT3`u;*!W1-QGD zPlQ0Bl>>R+5aO^e%!0R1PdJu*DQdz+Gs>WMl;#Qy(a!ubMq-wLD<=WBrhQXC~xMKkK*G%8wxKq7Sg z*$~UYfAl!>9k~i^i-Maq8~^hFAsqmf124gJD#!~wS(;v3#xv>kEtJ{MF4@PvXh|96 z+jB87D51set++=vpr1O$T2BX9%lmDVM?y4O zZX2#A9b(sQ=q2E`KY8eY-2SH#8~{u#UdR|>B`b$6odN!rHGa&`af!)HyG9-@c_6|? 
zs20FNwS;@rEJsN_-6`)!Nq6A#yQ@j4$KHB#G(Y;y)x#pSF?YMUpV8LF(1s`o(hYwn z;NA)0fBiftmfD5D>{tEBKbEyxHur!zClLQ$^$mncKo(*_@hPhX(+<0RBC58LE4;UE zr3Rc(a-9TkJev!;m9dnZP(Am8Ta$O|@P6#gy7mdt=R-ZQ#nlNlL zsgVm7!-cwyd?D5g=+zehjha?ZV%nnf;y~Xi6&qe9l#seJ6>7_LS08lP`rwJy13o`{ zh6f;I`9;}TI9(7SbOhL;!-I3TFUC(F!%VhO7Ubt5I$YZaI^U+g;N{*jAj-YJ47#MJ z$MvwbV~5)PCTx9mlxW!3Y@?V$o(Ad_jlDg4JYuW|)FK2)Kvn}0_;D4$pR^{ zhK&5#n93S(Z{F!b%3@x0A~~Tjg@-0qQGwoDvH0S5+K3;D1H8KYc=m~&pHS{gAiMiq zaY$&>EA@ZotSSB3i%3Vn9@CC{SwFxpl41C@=fqxjY z1hUHp>U}n}iTFPY+0SByAmasciSHn>3qK0i#hxoWsR@ys z-l5sekfEIwKP(5C@I%@4sc=EBA(T^FmxVeR6sx6qo03h2P~(%Wq;C}WTPgQKi_>xn zY#WzIbs|9oGU@F?hhpCCUpFu=#`3@AxYzhqrXgQZt#;0&61!Uac0aUW;ww`MH%*ou zwuu-=L9#li95lb1cfPkAtg`Tt^s+7vMxA`fUb_H_Ul@IJ7*)+;Ks|e-bgpcd7~8#F z`|1z*C(lWzCujZ^5#cctHyqIJu(E<8(QmK{orlHxl=zWvUsvG1|8)OZ3gpUyCoy2& zOHCo{_Db{J%~+EW3;k_Ba6+6MZSUy3^xxuWa-`C8bNyr5pmS*UsmF>-wV;UE4eIJR zEYIBrH3pA9h~}Ek&DW|8ZYYzampMh$UQUCm@3;tRxk5-23)K<(#dA$~eWq?iVyu!d z#J1qRqPpn>qu!r!y2|0UP}NAHWgqXfi0A$I)8K30VZZ#tG!In_&1C%>0OX!!G8`+eeK2T%57t)MZ~ zb3-tII(Pt~v0q~fqwHIse>toK8oZC#7VHjZ1-W5budM5M%N)m?N~JRUUfC2bcwYD7 z`_FHjWj_K}YxxCp-<(I2ci7Ng?3-YT?OpS%`I}!|>_zVn>D)sP=+|o(-YI*1)bx0* zF5hqJ{)#Uu+Z)rq$M^T?e@hC!F^V$c?2DrxY`B~2aPW~>Z-R?Yq$)VeAuJi)aTRYSzfae?( z4!FQ&>wN1(V&V#rnjye_=f+XgFuZen?(?iF{?1kI4m)IaA(2VQN)?P13b@cfUw0Ccj~-JTa9nBMkI{>&Ms7=EOb&Gy445!JY}Mwv zhtScb&Vsw58^b^W5@?|PtpRI1gb?EX=*i&-*B$fryPHjdUPK~flU+~}krF`&_s>2f znBWkEKm3j)F!n!!4#)sKQc8i+;y;!T8re_asuDvTqbrFCQCAmo=o7yBmp6O`L!1^J z+a54SjR&1om~PUOBfzX96)J#D0C^7p?UI3{B!*&$jtO}OxKA*7OHb+dBCO%ZI&oF@ z9;N#^BjS0>6|W*}u_e6vZIqIEV2V(D(gkG@m?d5{WwAJ47SGN~0M>X;921QGfJ`;* z|6uwHjCQQCg6)x1=T(ogpO=uFI3&xtW*e^dtk#C(?t${7;Vv@Vfkx`d!+0mEFt**= z3drDsOf6xQJwn*E>whuFTn8sm>Wh8q7dogL-T;I8Hmdo6R6JP-7QAmZeDifx&tb>1 zUeE?0Ekl%vxIq(cNg@hi^a?Zra4G{trtfC(EOw>LAf7v)5)FV(Ap})?HM{_c1P@@_ z8~6tuY8;Pvz%g3kT9+PQyNRyjytC~sQ8jYTBR|G*yEZC?m&(*5NcIL$XQhxl0I~xX z;d(v(R;rFvdKq`&WCVdZN!q;|+@~dn^3e)Rg2QQIA%O0OSBcGD2ug`Le 
z&N<0G9|Kt#AumNIPe*C~#Z)E>hh>PNG-dT8*j4?Jd>J1&?@)DAOA(Ufma~BmP2Sny#ISjFn>=Tea~T zoWN#+{{xA?J~$wA$jmD-Al1rdXV1trqs5Pu_Cc1Zpn`38@embpdtZa`gvDgQJGzP|HYo}_H zYods6|B%;K>C952#N}@@^;BM-kK4Q(lp)P~ebH>j=^AtXjb*}kt#NY7)?SVHq5?PU z83(85-JJpLNU!}OBuC0j!(mZv2I8b3M~dR!;-&TEch=h`x7b{6eJ;_p0BJjXuKHWX z*M_Hq2z2jP12dGBnAiEZZ6Qw&@oRt)RI22DINOJpZ0SF+k2(M?;AMcZ@q!s}NzoDN*qcdw0dn~wKHg!ei6%Kfns@%@Y&uh>cq z4$@9cslmY`)kyts6Rs;9AM;R-)q8s;EzAzz>FSx>a<0Nu&gBlpPbQ}OQ4T)65`uD$~IM8wU+YF&b z$=rnl4`addctXdT87{6Ll)w$e6gSW-vOR}LL92QTxm2^WOKwgNli1=u} z*s!FdPw594I!}W{I*1$aEJ1L^lg^jzApEo5U8;>TjDA`t%rFf>k)CWgA+E*zY`8kX zo?X$7G5w*$>EGf*RF_(Q@&oJrpIc`MIQQZ=%;*7A;6WGFzEm$$2|0%L)6{vO;FY8@ z(R1$~6-tygV=?LFspl1}uiBQoAatGJRt%W@J3P3Q#nc02^+?4JLG7)8X#|WqK1rgoJh5vM9+< z%c*A(sCEjIJd#8r{wc{Yk#6A9yFJ_tDC67ZPO|n%A{XKOWe$WyWO-GPBK(!wkt4qE zK0x}^ISNR3ILF9T=X6aXr2CPqER4`k;;o%o@I_C-IpzprajW8 z<{w%!GSraaD`->=z)@AX+R!Gefg09JuI8UY%8+0p1h;!YJ}(b*(=~pl{FZcR9;0S5G){CxAg@xu&weDqapy73HPL z*Z%&to9q^CG7Ve%3|(Pt*#}pM#X>?^d64oSX=)s&Eh|kH|DRvQBqto$KpK#i{{ZkN zh)y6zj`0Z(0ktzMcP``;h#{WO7UtYe?~rbN{zSu|-=5hwt!GFZpZdYOjR-epHQ)|U z^K;!TOi3w8;g zm|eEA# zQdkiuzb+*JtNya86JBIJvK93FasVZk;t!*OCl@(RAW6>l-F5PmP)pO&SbeLb zkRAa7A99rIWv2(PGN6TXl4LZ+xsXLJg^Ys&1)bD92^~$%J7@gPr2E|CRgV&ICE#^#Etv6&kUG zf;)=ftrKoex?wgJATcZs1;uo95HPQKze=pKd0+j`vX&Ijm7YK4`0Ty&J5_qkM%HI= zbuV{~?F~o=!y9!%&(sY%ZGr!S^ft;ydLAyzWv?evi%sD8fR#2g7ReyHyzS2xr3Q@q zIJVuT*@U<~1pZFNwS#kLrJfF(%|roj+UaTo@bSB74wAb_gp)+XDWU>y4Il~FVG$_B zm|#&DQiT${;j3~;s3t^(K10kf*LxMS7$({(t8PLE5II_Tc9|wB?R5FtX zNk*F~q?f}S+Y%a2J8q%&B99b&t+25Zk6BTz>NXpwm8^91GW~9&b6TpOtn`51%WNtC8;qAKx;Sw=65VVD#3pr6g6Aj6-}g(VhboJ-U2%wGYPol@Vs0 z1`vye*u^=fnMuZ5$bW5$J4A2(bFyCSxYt2u=`+Y=w@f^6h|;jE1Y$@lPfzejlrTTa zRtusH+AThQ14}p_0!kBB@)MIXfI*%}W_0^mDp!CSA zA!IKdscttK_#09&nn_94N!D&plo&{&Cy0ePq}sTCXKHt6%c*n|aU_&u^P2&#;}&0u z313kyrt^wDv3|7RM#u!&*Q7U;$L2x~YkO+h2Rm>}YuasaWnWL|>WhN|lq4#LE?ZfF zcS4^mFGFQ0mP^_>#s44R@Vs3n17~r6#B|qaWu6&1O#&W8<#?Rbzo{5xeVygI{0ll|r;@vu| z+;90HJNM?8OvT^5^1*;4dQT-kKkNru_ 
zg9i^NKBZtVvS}EU#)+?(xr!P4akO&A;AEehn^nhktu8SIVs9hO#!$Y^3OxNFrjS*v zuGO@7;L)pyG|h;B+_uCuW#JoTWjrD&@xWPJO8R+3KG_yYm=7YW-bpocbbt`v)2u$DUw=;1C zNCCuNnBPEoinx@CbG)@IaT^)KA15U2mAj^5yyMb>_lqsFUF6oj1!@!pRyzRlyDeTrq>QY+J*T3R87A6rapi!CYd}(9r zm#{krTz>Q3FD}vbp)8da2lY)@(cq~>U#t8Te#(fp=0Sg+?K{OtOgPX43x6wSQNC); z3!+Q7>KjcigR^q>Sf6P1tD{*fEK}lYwOl-S_)7P+tF8QuaZEoCg)s+{(Ux81`N}Wp zcjzQK&Q6xQ4snzC=7(Dbj>cOBldHQ@{gSc|L0u1a9y?JooWWZFRZ6QyU5reBCpvz1 z6PT+2xescfRt6||oHlPBy9V|}`DqQz5+6CxLZW`jn!F5bbbNl2GQv`*zc=YrVV=u# z#D#Oz{CYlp{;BfZ+}zgBS{TvDxqP*A7pYuBI9@hxt<4sG3Ayn10-IO_(#5q>ZpuJ# zjWc!Q5xuybWqwID@rMKXCvjXlUy%smdnB`D*$2Ta;nCAaGlge1|1ZMcI~>lf?ca|Y zC5dhb(V|41h$zuT5Yd8Qgp43cqDMpekj%^>uwx_t~p8H3p=LglW9Tu}ps^D8@NOw|p<473HSWZpl^PH$3rGFabBu zauoQ4Pt$7@PwsZR%quP5#2wQ|5}Oc4?}Uq&V$XYNg9Ue#5MMtOx5b|f13u|=42<)A zMUAGJwCEUVHFPI5XT)s>om7b}i`t_7>eeRzSq-d5@YDCWv$1Mhowc$q#8y=ksz6jcKq(E~0sdzupsr@38G4YQxF6BE9kgto| zGI(bhKZWi&vmVc2rK3NXmodp)B;$LCf8n+N*1mmr>jPCcx{E_J|OLahl|$Lz6O1oG+XE7P;1j5hT2u6L#^1>udc_Eqfc znJj8=U#g{Od}npxR}09;NS%)5CR&5GvXUKYQi}rEu0m`gfCr1W6g>-4p5OYPkdOYt z3Img*1}xIG7sM<==**T|aoT}4ZQ@UfjJZRq&_S% zp968+Fig1L!1%QQ z!YzT~qvYmo&x0*;$O{km!}W%v_TcMXi(w`yxB8Fvj?gwHa)PH@-)%~LBAwHJ`FJQF zo=0l3y0F4mX3BkM6N+F|nimrTWd}2Fr$DD8ufYrF7l+VU3BgnF60ZVJUP5Dv$_-|! 
zTu$}{LWcCc?Pp6w%9dpE;br*Yo8k?7k*57N&MK4GU+EVd3(XbTR=wTuNS<8_6M+GV zM#inbcy>|m2bCu$b0Z1gr4&cV26uno3n992KQt3^9!_#4m{8$-509@2JV$Y!VpB{;j{goj zRET5UCk3`43Yeq$w#$Hy!29a`rXbd(_QIt)W28cUt5c23)V=`xHN-{k_>d;ar%?_H zZ7WGk;D7F`Njo}mO5LVDa&c=(P5T}XX16=9s1I@5;YNr_i@{32r+Ib!;Ae*!puXtk z-p50lY~}P_MV%wdd_?)IR@&x`woy{uR=b2{Vk*LxPSaSWd-G(hk!P|+av0YwjW6}e z&Oei^Qfa(Ti{!|wX5On*?;kUjd4G=F@^O5k-;L=g#ko2%1Or9+?F^&y=+Zo;YIY{O z1Mz)5_KW%l(p;bGC9{#77DWAgV7TBua7Ga(O{VZqPiy;EpOaKqtBYfA;LT_Tk)`C# zrSM{S={rOE4bH2C3J6we0@4pB7|EFZ_Q^-SG8bZ*LdSpC*6m5Q>&4}0C^#vBVM50z zrTp>auto-`tIGW)vC*8{U+T%B^#Ugc#=Q^h7yJffdnO|uM|B}yf~}7q<=d4~+ug?> zDoL!?zS?gMhOFKn8THEqi;nRQnZ}F>u6A$ses7Rd4U#JVKW;UaiXSviiEe!Y1?sJ#+=~*`zjRx09{jl;F272=gIg z5TsX6pPzC&+)#MzTsNr6>nD*mkFwcO3tike_x0^faC6v5b`lW78scqh^l67N%5mAh zmfA%w`pp_6Uc+kM=-K;h6ge%_VV?9U{3VK%9Qt@vu}k~8bThq~0MtCs3 z*@Lrbgs|fCn=fp>TGZ?kN5i#m_CDoP9%5AuTkp&9M!%yh6VCB}q@SL+wtZV{XZ?SI zR?{&YG`yswm^2A(>`WZL!N{bS{d=rMm;__HUC8=v(o|p@$u1YJk1Y2cn?Gs1q#reFnVnz1{D^z5 zqIq?*E%M$wzS4^tTi@n&S3R!QF0>|+d;(LsKrg=S*_$K1)FH(cx^twG7eY!;O`fITYT%a?vg>;H{cFFoA;ZYs_fLprGC|6ix_J znpu(M?%5B5W2rJ+yxdB)%j zmu4NBo_dkL9!d{=1=F^!{x%ubTqMZCgYQD zlNvy#Z(sDoG3hGvf%mZ1o!#lzjpE5qzx2}CV2LB|BT5kNVhSAf>u0eN@~94PftS(e zMLq`?&aJnFV_#Htr5A^VMHZ|Mr?#?r9&U zj?H_vZ}uVEBn-;&Q#tds1Wxul5n!S!9NVU@Wb%qr>ka?ty`V-&qt`LWWHLF_yDU~P z9QwEFrnm}vt9BWc*wZ3Q29|vfp6lmx*)LQsG;FB$GOX-%G${r(TNoH89eKaG5gI7! z)p2#J;i%|BM*8U8i69pD$%3fdKADBTk66RX@r0JkBt}c9io*bm)QEOfJ zukbfrPtr^lZVo7Ps3H%BW)2x`W?I&zdkh+e##Y^bRm)DAK=TzcSEh;!eBebiROB|? 
z{5P`L=*xY2u)S=;^3wm@*#*VpSxRZ!>!&HqBVol6a>3u8(9LQ!vL;SrZao1WS@qyn zReD9yhK?$?7ilFCCRRG8)UB)`6+Y+ahOomwvAGmh`_n6d3Gn{*N2jn4MX3kL$d){3 zIFt9%v-{YiV=EIS%DFM=qQ1Jr#!nD%dwF8q5xfv>m4D&b!&+9Y@bmN+g*WYJ0W0;W zyE|`08_IZ)3|K+Uu3RiDDvWzdBLSBY|ATMK zIUc4OQz86GWrvubMrB>=6qq6YBFpzEInk?a&FL5L!5NK;lo@;ASw0o&*b^b<#Z{$p zV)$b@k=5)DqnCAQwMHtlKqI8QMw89?4taz={IvIkjnN-8u-|&vCKD0_Ae7LX(Drc2 zKeKM+3!r1$3=mqX+m%y6>7QxyU??TMkIaUeCDI4>B+le+E_L)v29zejKtcB_*+Fog zs39G9Xvo7OjOj6^v{J~&Y;K^W?l&t?HBE)EX2&{>lh?CgjO)!T;yYjA7eTJJDo77T z5aqwRoo|8PX0YHR94WfhPF-tJm30gBwI`969(`V3;k{c$DU$N7Q}vDeKZ)%1!ATXO z3e5qNVa43^Z_U5fKV!?M3J;d8?oHfNuf#W6l)Hxb9D0@$)mXuy75sdVS2o-cM4H#j zF6Jwl)&fOK9`>@0-q#Mg?$qL`gC9E{b@|#;v*(7G-e+BNR}?7XlV}n-E9f}Pd?|`@52F*KE=Wo;0=>#2 z{mRKs#eFaMhKpJ&$;_!nQ?D!wV(}wCmEQ&j>Wt43Me)z4oJR?!J)RXQDqiEuONF>8 z)MN|dRqA@M^Je3N`S=IJ&A3^gu+N*0SdiHSvAW}!`tK_v@g{I}Uuay)^jocc>1j}S zvv`l7??6)jhE)*nFZZRf7qn@Ncp3q6zT{dDlC1P1?S`D|rtm@z*(N4zZ+a=H3jj9_FDUPL1I#3l|rB>(Oxs+?rp3aG$M2;PMivm8Z_dR}nTjB~a`{ic*X zO_%=j;7Qvf%F)|9+9zG>6IXwsTjG^NNP{v7)d|7)VAL6O;eE&Qjgr~7P-veNx41D( zOT9e2ID$RoF8ye@rU~=J?hhi9{% zdx2LF$rutoX)N;Q%eVd7Ub#{&)TQo6Oa0x)VwRce1~}&xLYMb;*F5){(UqR8WGlvQ zOOI<-Lzx~PEElxUZw3OB&t1FYe8)q_g&@qB!!+$T8DX16-{Uj?tIhtu(ypBmED`?% z$voxbjb{Er3S_(hYrW33VZ{)tv9mm65!pb!1P`-8{ozx3$rmv3QUa{$yZp$fFWk>q zRoa7^XAdo>EV*3QT4b$RE(=qU08xJ8aVH7&TZ>dt(s+Z{(ma;4D(Qytk&opm*}l4NGfvjOXZFg4m zocngg^|+BNn!^rn%JuK71L@(HV*$@*b+qb!2CvI{{Drw~b(pnv!~DgOvEYNhi~~U+TS;+|<^KcOtGx*@tu!=t zxAPk|Qss2U>)6yBwK@GBMCVrAd0K6djw zSDE11!|9O(jOzK;dydDjGgDm+#C=GRK|BtR@H6fg)v03}>Xzeg zR$Xn`E@XPH{pH+a>P5u~Ry=P8wE{dptX$qyqq?lg+IhRHon-0d4I?jEsvMatN$K|Tey5Xj3lyy<{P95OIp71 zu_{77W;m+aZ@2}ZZ+UMg8fw+#r43mm-0nzAds!~CB0hT_?j)Hut*A+HTUOJPuUo-W zVkYX(M^7<3eo8I+?4Fx}25{DMoASF8C<~+NWeQh|gKyW%iR9wOC~sj_M8RCH0i-@v zCS3U@;jmq$gGaq#l0(weKe66V7-yGq`7DcDSdw|H>Jkgc#>VO_f42*)=p3M`R>XUI zaLLVao#c~wR2YTx_IRYZV|vm+Lv+aE*o>RFXZQgw`E)6xI={c!6{UCFXH;e7b!F`E zd~a8odA6LXzH|-htjGge6k34;NsU8|%Tv`&g6M>=l){|O1RfT{M0)RJxsrLpxQbes 
zzBOi%ha6(1&^L?omVl0aCGMwawfRIN{_Xk;C%2^v@iG-!IKlzk`J?6>j~ zR`rEH8ATO?eLhh5d%^}}GI>m54niM~m$}YjW@}sio5`%$&L~gzF)|-mU~}K(s_6%D zTd{bQiar6>mTHY;x+Bd)jHZ@TGn#qn^m!?__S>QFwSDvh{!w*$T^4t~JOWwXb8`iO{E0m3i)E9Va?nfs)a;w~ z+t|b}vTIfp>ifL{U>kR^q_0&CrTEy8^^tkh6*qOiE($qzS%wzP5XS1(NLP#$|1gIi zkyF?Q2`dbZum8(=MeZ|>9VJ$_v@U)f9uaF|*N0@kSfNQs5hl7WBC zYSzZ=+XfWcz7yA`Uat7qR-FDBoyfLNIomEfBIOi$F*c@QpZbDkah&Z#7uwxD$!5`f zlH(vt1AFqT;$XN6Z;B`#+I47(#6;AzqIggHZQfp>+;2M+j*r*5KRb1~2L?L#4-W9RqFP_k8P@RiNaq{b>nAU9L$4)feN5IrI|5X+OA% z=g2{S6;4Yv8a%I@b#7QCpm3weIFeJ!_Q^CAXBy{G?7;WEbk{ zN~>SB<|~*-AWz-Bq81-Htn_w4_8z*u$gAXSAMZ9@RB=5wo6!01@Y@HLdLQ%%@KlkK zAj)AhJ@T5gw48D6Mr;L7Bf{;nv1V(lepR$fZ`t;FwS&?{Nj2_ht&sdhj!a3#B z$Z~~AnlZuP7?C`df6w4iW2Y}|D}FtB^9DzqDrO)X_gOGCrSStCEQ);WeC+mI#k&7HWCf?)iK^~z3T!oEP8i7^&;apk+LBCQiTqk_WgeG2nYcJ>RUbvpHcQhN9b_W~p z8G8RjMIVBlWa3e+j&WohlQz=lq=X`bqrY~ZSd+`cVt$Ihv+E`lQ$?t0oXYMtZ)`}C z*5!HjQj3&BjLbd=KZKPv_s))5<%gs$<&D1LxayPbEg}BA zMEURBiScc?wJz@Z6ybTAsG{@I-gGDnb1wb3_iogV>ASPzy@>!zs?k0_w?AJKj}ToD z(_is+N!NGjWf{l3Fmqo7z(PZ7yMLr~Tm^f7&!P*0`C>^fH9E-m(DrOwp8m_D`vos% z;|lD}{RKZTMP_0SX%4a|+%)Ec#@1HCQ8j*aHc6S1xAvMPldrxJ_Rm2EK4&?++ik3f z5MBOG^W7VJ&B44ADTta&0;~DZn}VxkIw-Tb1bw(;6TMTxj@bql zw_=|DaFz0z=WU+&02TAx6fMR$@^vyQt>_PNYbxmo?giE8xft)6G6%%ailJ&Wao626 zH}~yzhq?rr{(lZocLqL}Jpb@w#%2#(vhq63A6Ew5D7LwHOOo+8AdP|V*&iyvJr_7t z?_Y8Y*Ir#O#4zAshTI>(3doF_IWb3CA3}x=JiCIoZ2U*FX+C9 zCLJDmzfC{1HWshfN_i??#BMx~5-53uc**b1@Y3W$>VRi#*EC8+1GM@DTc7{BeL#G= z^7lK_c4yjbyj$!cTG8SRg%@;O!;yA*K0l@R616Hm5g>TS=u9oFfcVO`w;VEwn`JBu znHd-vR`$B!Z}CePpmcud{xeUf2tuOa5M$8OR%ESm%KjMyZ*8p_!@qv0CgzNmP=zVT zsBiqG87SQ8f;%%b!DiO2SvHTC+TLG{xjnFl{F!1cOdvJn*q$qE7eouF>uYW`t0 zCIq&DB8Z!x*T>#v;qdjW0lP}~C`Z9ADdDGf zP4Pb6zrE=P9W3=5f|J&txzK2a4W8hbr($+Hn_kSRNKq!SSKeV4W$D}}q4Nuqfc0+? 
zBIAh^o1BK5DetexmezAGgZFA}`UEZ7Ub`8>NY$`Gl(%`NDNPAQQJ6+u1(Wvqq-Qq* zUe+5{*aNM4raI^ogMk*)dQ5uJ6b+8N4l9wVOE1cl&>Tg0j9@}5TElupovD=BH4S0pR+|5yt5l&om}n$X7k)T@hI=^u10ubM`?JqH2VM#;GcTE1 zU36hq8m8#}MT<5V*GgSu**QX~W2!eIK2dZCx1=;88kR~js;meYEtSrF}Hx&J?$n@cCJ$M!8W~O|=ow(yk7Iw>JbXO-( zRJ@qP=B5Vq63_J@>#=Wr-<<&+Ip9lN)mASs5K$Ay)G!CVp&D(y5~58r%6a1fUn*=N zjp>KDMe_`Lm3AXvgO6ld!t!z-@-ae+h-!4)2IwR+H0M7uETrYfZWp!XWHPHK`8dYr z#NRVlcrN{5W4w&Q>pmyRLZYcAB_4R~UR8b5@B$?m4U*QbQ1RUx=hQ2veP<6uM>z5Y z!7!P@s%ssepae5F8URhI!jGO6_x8;#vTM3=KE2uJ$sjLj{rlr})3!Cy3B_Zb_yS|b z_(E8e*94b$T|MTMs*=_lKvy5WP_nJLfaK{f!6V7V;wyDh#tcAY_Whh__xrw|vo9)0 znJ$e>=UxF93uCl`6FWuhX169cK7W}QN zlrO%;C4LcFdsT?eZoPJ4njyGnXZO)yX0Q6)n%mNgNF?U?ckh<5sn;&@_wC9)AUw)j z&ib9MBMSBAsdAY3Au4Wp*kvp)ZJnXB%w?7lB}?sPqACkcGWJ`QWA& zA1O?3!$wNsGLU_3Sc*?p)8o{}-@=ouf2mC^4|#YybvHTC?inEU5&= z#a3gKni#+IF-UOjxSNl#7 zw}idc>xJP0vax*m7NQE#Ea|O>;#UVMFwEBmT2-5BfOtzU#4Cz3$Y(zLFfX6&acPWR zz&)BYQA=y1=}~DRI4Vpf>tN)x) z{D|F&-z7z+b`CE?K+UB(af7LNGOn<*80~e9f_-K2+%rlkoC8fDVs5wBphj6_?$RyP%~JQ7tzhdD)vKP1BZe<158YIzJHEbO23u*6GqG1_6*WTzJ!M z&@vn{jc%(3oo*bc;tFg=-A!m=kBGJ2PACn`>ve11j4NK{w9(`zd@uRBp;6ke6kiK= zycQ~T@=zp>n|M=B4AP6Xr@v}qBR;R||vOTS+n{;`s_qL{!Jt|-U>N$)%}#W zZS=yE*T?mhBP=7Rq$mQLUTUeVWhsJ&K%$N)FSb9ipxX^_a3+*aD39?_eT8duv_~t|f>yf5 z+B?unfgv|j{{n!-33nUupKHaroYT09(ywkEFG=<}c|?T%34{W6gnh zEs#i&?Zn{DU zl0e((O~6eTG)H1}fNsrcOCSc~0^Gm`nC}Cr?w@6Wdux?-k{7ga*2KtCg1*&o9iSfQ zG)%n*ShAu0z$}yw(s&+kYC%BgQ__0Av!6wwS|912lACsDuZz=GuOevAr&y$4ql1Sx z)&)as(R!fwMr|v0=Lw$Bc$Zt2ym3Mwd7~OQOtYIH1pxw25&#PFs*r{=P}G5AoWXWe zml#i7@v$YyINVZEoNrCMdb3;prPjQv?~TuzlfXh_SB~ZQ0J=P{F#CF3b~gjmMrj=6 zM$4Jgy=V_I0O3`*VFv${LkkZ}qPI5gxB)LwT*ia{1qn7W(BiL)BjzbR#mPBT*Ew%OZk`@pg=B}N%NE4{QapQ5}JCmNW%3w zfX4AJ4m>U}U>LTKjLYch}nl7e_oo-QB=N?oe# ze=+baTY=iQ$OSFIQmAU)cGLE&$51k0Ew}snYahM=!FoaLS!^ewBCCyVN z4WeyqhaLJvCXE*=qw^4O~Bt&u{s4JBFgnD&pw!QQ}oz$`23 zC@=fK`uZ0ZNk*%m(fFwfLE7a_pM|1tKdK;n$b=4DuqFUWmAyBrzWY@2;673n2swKL zFx8kRq&9`3Eo#Z1Yv5c^M3n$s{S=c?2hxrqQCB`fWWVij5aVJklmvf$<2OLQP>a!E 
z11Q8=4SGQ`6bwLIo8^m%94HJ8f6=8*@U**G4X9zufi)iS;XOz?#w-(}mCo-ZL-wA( z1=5z`7b7WGbofV=+|d3Q?1L{dD*-TA)l|yUQDYuJ(QY@#`E98Qt`AXPZ~Pc4SP7KN zlHq`;UejO+dX#gk$l&xiMddo`JEPHxo)Gk@D725Bsbu8tpu6EEgx!wcUA0aRQ#u|I zXt2g;Q?mI8&RY&H=?EU^eo4LK=+CuxB6fBL@V%T*2DX=oecTl!Oa?$FU3mXLmm?@N zx_~P7k~WFE$RdC|N#?jQ+5lT24fm48QJWR0(Ngix2Lt;A!YUFFMqCuxuN7BNA6QRP&rUpp{MyapJU$TayYD z7%KQ=eHMt0fE+)T{PN@;e+R>y#zNbnvqvJAD_PO^x7qCg;Myg^bDw&2K^P<)z>pS- zT?=^NYm+-&^LgDGh+x@G_8)vn!9~k*69cqXUi+)7bPC6ULFoNE;{F)?TOR~N6x7VF z;aAcJefAGb5bM;3a|#0Aq=p;S2Cw5SBcx(0s=T$eF3^DQ4$j`Zd&wER$2DQ-uFYCW zsLXfA96p?m-xOFt=t+i?L7V}hTaSh?XRX{h7M}0Wgu z$7B>>GB(2-MXA#QAx+nz(f)CeeSq!&4MBDHr|N;+q9aIj0|yoWauxeRw(-IPM9+Kc1Z!gUsgM@C zpE|)|4w(&hlYf0{C6J%IWYH%~52rEJ1PBB`I=~H2esbpED?IRjzwKc8W%0v*R#kB0 zPRAY4slxnkOzXd66}S#xDj$<)MDz#!KaV^N{r`V1Zk+pf&ao7E8XE%elqRzKJHPgC z5h~eE^iUbuU8wHgZH>>Ec$+F&GO{llhLb5M=DCLeYm%XDjgZ$IYBD_m=-@0$T;L`X zn)>k1I2&9Us(Zs9N0PF1%L)JiN_wM0HP&Zc{DEDa{O}UMtg@YoGYI?cJ~D+%_`d=P zf2Nlk|Gx0hrF@e%C}a5 zElnY5=l+Aqun&hYtvmaOC0c#fZTuLM$N)UHt*CKA86vST0M?fxa)Y1)8(?n2-{|4S z3%CRY2{U38#{`&*tgC!2a}WrTZI=P~G~3SfC)Y3>#Kl>&I(M3p{^Z~1h*-&?SDQvcmPJxt?U2%XYrq{ zEf&V9tmMc5hw1t%Zj|zYN-we`yl4b+`qXW33sm>u)*pmZ%9!8W?JoP`8189qvvle7 z;yK-MS`Gq_0Cs_UQ1BK}htzOj+;MvWs3y!bm;8X6JE80~x*%{P4E5IfGKf>bGb~8l zPy3i9uUM>f`c+=$en3Wu-s2F(H>nC8W;Tqka0TNDocLjpBJe}`OcPJb~hqGSI_T%Y)Z{?muId)W1afv5YB|~q`)8eWxMKrv*Q6B)*#$2K5?{h8R*LM|cfU=i z3)7ow@LUWYYweWyTk&-x71c%rz*=_BiVanhJTf#v6WHx zGeEN$F@bt_KO~0?UOv!G9#o9Y0Jb+VBlp)dyeP*)0BgfDj)gPL0M$2 zSi6g_4+pTA=Wb|#A0UMSj;xdb3R3_Kqp?Ttt>^Ej;EA3~aY;Ffu7Q$;a?fCZ`9o9x{Ujq%-H$fwjS`L(TnI%6eEv>-? 
zw}7`Ңn6a1HL;#5clLUx405)-mRyOzCs{}KOF!7>IDbLcFGW6V*M_<80U(!n-4Cb% zZ1zOiq44aLK;6lS(ewQqe*-sbJxdQ2f~mDy>2^jQ8;kq>hfV-0ip+b9A#e0OepK2r zRyvl`SPDtU^5?H0qTqBX>Z>yMDBm-W|HckEJ8F$rTR78e^@ZB}ZzaR$uZ}IrS)QE^ zx{=MQ=v;o|!f|$0Z=F>(_T0Pgm0~T4O?N%PndwP?Y9`S6tdHUr-9c@`M?izh*B3e3 z-B?_6Sr00boTT&dICxpvu+y@g-H<58IH=j|F}L~>R_3h;Ts%PHV~1PU!Y$*2u^~=g zzdv!+P-8(&8IkL>=K+@M`<0|f*UwZaUa>w4!Eprnu@b@m_X)s+WQ|}-*AuNzZFuwh z$G^HCAk!I{biq;gS`DN0)-S1?R)<|ki~s(WU{~izTvUpoI6Mv)iov;lxQ5Kq0Y?3 z-TOi$d1%29oEnbXTMMFhG^5ToeJwt>$oL@e4sQHm(#&h3()nDH65B$m`0gDUr=w<1 zNy>&I)6y%}f|z7<+ee_J1WP9=z~`$wIDY4MH}Y3FgOR2HAZGUK1VB>elxuat7}7Tn zWSpb;BwJwU@rP7`ataSj>)g6?H3%)q`mjKu3n~&GKp5SL!S5nC2NSmtOvbhjTh&i} z?>E2$M}q{#R3CQ!fVR_MSQYRfUw0UhSMU8zZJClb4>tN=WWz|Ko}5cp6o^*wgBEsdozdp>%4tm zy!$PHM*wJQ9EHhfi>_2oE5Q!RUw;*};iJ=_{BosFrAbsTp)=J${P$4kMuB@uADXXQ zQwEwYaC6tA2MS)a0sR)AbvTG#D*B81g0U*zLv%-F-V$@?Id!&B3R%>*>&kjSLl_9z z%%sRi6}P|_l$f}R1wfE|d66a#=+Y<0$)J;;_k0fpz*mk0iX6b92Ua-&S3n=Ggfm;x zr>Ewia`Nr~mcYFoOpGTc=HzZyA*{^u^2-H4xx}sS!Sd0z_CFjCzR@fP>?~i>b-~Su z>!sPNW+zjbfITL$fUQXntM%b@iyD5T=&s1 zPr4Av?CGf7$D;IoSJQnP=d!VOzxRlCNv?cd+yeO}biW0iLx}V7OvSU~R57&YEpfE~ zzjd{_@e4K(PPh8*fAT^xA)}?*+z0jb=Z?#UD)e)NY^3I7`r8}F5dagsCr`hDtMSb3 zu%DnVlhDOW!SOb#hj<>FEq9?Lab_h)VRCZ(8tr=(^bwvOpKctSu%<-sNEt7DMNGZg+<{(&*6cY1!<#n{kE#!*@|N-_4^Q5To^Q9hhVvI-1;@mtlY z2GJH8buxd-gYA<~J&S)gKjL_G`B-iaN$? 
zT}`T2hkBUa4%m=;siqHAp(5X7(yE0D`%(|uAHU$&FYMjOxyphOP(+E}&bP`>ymkI| z&9$YWcA?0bd~9Y4b@!kycfR?@*}rCB=D5$-O#94Pq_?7YzB0w7%S=KlH`IJ~a!4p& zk-To*Fmh;!38^bEstSb0Ln*9I=Qq?l>-LXPj=*>Mfz#`R<=%v*{y1wBUE>g@Da{%7^Vr?W4A}biYNE zr@!B+COY#k{^o@e+RHbt@OboD_v&`tS$}Oy$tM%cj1a>@To?J>2||SSjmArLqB)7707P14$cnlUYZ|priCdS4Gqkj zv`M=uN9QWt{m}zHRs05zR_G$Eg^Xax&tZYw_}9O*ZKG)jHle0+VAOPI%5#wU*XyOvL$Vr zdQk6Cf66k}K_(_YzN;?cG9)BHP?F8!JFPpzAFSk%St$T7y0{em)HNq zKWy?hI$YWi3u7IA9L=BfR;8(4n4B+oT?q0AuUFLl{#RwO35Z0vd0-BqIo+uAIlGDa zB99-iyr`kIqYkwxXH1CAd>P0!2yZ?NpG6kJdP1R7p(Pd!MWWQ+SR!eCSFO=wT6Sa( z;qdO}wwIjr(ciEQhk#AdvDw5uaej2U6#ZzhJ`1mD#VvN9W37Qs_<$!X!u^`RRqSh{ zX^LFB_1I$KolsG;q*TfADpGgAu~4D<;Y3i@(tA$qu*lAY59og_tn9 zGhV6SyUOuUaeAszoNvUK{T75!6hCF;&AF`SzTSV1DG1CcS>TdC+O<% z)u%E!L7WfkB;?5?uAY1@4G&x1;N$^S?vy8((OB|BXJYVg7>hz@67aodpZtdaN(+QB ziVJ~7WJS+(&^NniL8IH`=N&U?bYLBjiT*?YN3op#fX39o=oLbUs6DXUemsCnu3fz4d zgNZmA0JyM6WK>$bw`_8JH|!n zy(lA{9_DCiQzJ4U$wPAz#<-ci8Adc<=4|q4PwoO^QaZLf)8V#z4K&Ce>Fs;3i;R9M zu9P-^F-wp5Gdi`^36ELn{bRIsa>kmPR$!|pHUHxdxW}J=wXdJ<93V>fq8GaOy)D9C z)Wnw$*)( z{IT4a_di*2sHJN|%zoa~U>&!O9oT*5H+bh>>0vk=!Qko(;bG5mb1Jw4CG(EQ_vS-A zzQ*ea`q7A_#KxQXxvX&;L1fdvx{(3bp_)mhm}YtaWuo#2Grnq!neC;I0IlSw4z^sH z0OON#448T!f=rtNvOuu7%(e}vu;&{_j zgvoHE5c1oC)F$ZmM?F{%+5II4n+!%OK_6Z0qYBz}2&1TJ!GHOly)HoQJBKR)mjgiZ z1CNAIQt38{@Xd_t#CtzY??1A+e?2RP7c?IT5r@CGCb$aS0?T`13=$=nAz4CP>BmeD zSGNX}z?p936?0p39QEgBa-g^`mq)t4=D_V#+T8;dgipAKQ^q>tHY(e}(SyAk;cw+} z@|rvNILaz8uCrGcXAa!Q6@EjqBuy%AS+aMpxKSbL;pH*=Zys?F4*@0OJAFu zaP|7r*n>Fs{H$Q>@7t(<;JoQ}_%*;l;!IomWXezrreC_kks@Jt*2L zD1di=Gge9z1jZR0;O9$9xHYk7_}om~*?geoe(X~?m~GHm?=#v^jTx(fxmnrz`kQA! 
z$^E}KPB0?SQ0aqd_ADKtaQK`(=Xc(E{>N50n-Ok$1zMdt9zCKmy(M?Uj5lBZHfk66 z!~X9`($1If?k%ZxS686<_=|ONz5a)l`{^JAu*&+FJR!ldfvVDeO8C@3wsLzlccM=u zB#RbA3U~wzE6omziX0CyQMjbT&$0A#>H;Frr1q)FXxob+UeHAJbZorXx(n-wrW>;g zoHcLU?OwlN=&!+a=P&N5-UcnXwCi0++Vuqbtm?O{y%GH(U$!2;ZFH(GVy*0NxIT)DkL zdZ3lca)bHw4MYf@zIU3};zOsXZK{z}Vz%EI!RD@1|1N8Ug+#h<2Hvd zeO`VuPD+IK>nXbKFmPf0=x;u(P7HN&-Z8Nq-dIQ=lJ3K0bfg?G>UWW1eJv!3`&TEX-Tro*#LvNhsB_5wRSInFGt1pB2t z*(8;In%M;d(_1zsT9pIW$<48vp;!24MM^6L(B?ctu+($v2aTe z-M#KW-cY}|W_wzlPS;0bLuLYdwRhYBHI9|>)By6np0X-h$msum5iQ7U@FHE=KCcIg z$}NFU&==zcTIXAKXn0D~gXP?>d#4K+Gt(pLU%d)^#^z@~wV#SU5Be4aOhzJjCym2A z${V4w#D3MWmA$}v6g8Xfy_}`~b@%4xmlwvFso0y5-_$+L^z~IvZ${>Ug_pOXhl};) zabiW!%VebwhHOk{rgqCto*v%PByrse`n)-V63sIqyeH>-npzsU32L}~tmI~PA1+yQ zVpLqce*!82$#Uu!WlR2=Phh_iIONfLxL||9yZik4d3SJE$iJJMv3*TMU>JYvc!eMt zY^dN#?7a!f#B{}kS4|KyTdDf~82_34!`+1Jax1t5PhwjYAhtFJNuWLLYky^uYkyhV znW#K?M+DEETLAY(u`r%Qw8aE0bsbX_3SnwDaszS)B`HA^gBDr|s|C1y$N(5Ukksn! zm?4muc^2NrtF!aj^vGeWF5$mi02dtWXW1f+( zD<_D+fT;;FiM3&pC0FfvV`6EObHjh!NDK|c*w$oJWhwWQ^zWo5Auj@${YsNM(E{KjJb8MqNZ;8l78rzMnHzY#w(2#91uv^@)1u0eWc9Q zf&a#*@tw4a_1b3gqWZKEf}m8H+Lpbl>6jNQGM|byCpz8_pzcZ;=@BC^z)Qwo^BzCo ze%|QC#r{l-LFvMi-BY=n<4sp;8wD(VMHzbg@kM@3{VMoqX(Vqm`Q2!y{KrJ}C!H7$ zE^SVwCmE|Uyq_{?!dL4K(rD}W+ORVvof+D;=k_D+s5lhk%Abekz#l>NtO_|&I= z=td=)oTC2uE9rlmtz=2&!BYwqd7a0L5AHHB+jaeU@$T&nPA6N{#hDC6l{tQb`4CB| zd%qUi8k=7KEcxOx$NE0XTyY&G{pHc!mrQ6SuXcV2N?mh7>GP6&qq4h`7Xuvr`S#N= z1ABCkb1a0k)lHb?0VUYPFTuEGacC$tRDSYMFK7DFh1cHZ3Tfnj;6&a{Oy| zu(HN;Je5@Nab>rrHaBxmjFb(<^1Yyu7@D@7q8%-} zz8xWo|Nh*sjUaIEv()JHp3?o5#%kxQ^i($}eYx+f&kuQ5Nev#x)tZM_`z-BM@kz04 zj{nBoA(1s}sDePY_u46#WC@L3ehdC7&2z{(928=fT@Bw!J&^qiZq2C{lf_c~{u>}A z6ZKuga~8aVanTfGr&rbN*-Odod4B$=aEHG~Sh_j{%3JZrD%Qr+W}9b6BXv|O#=B!u z$~`KjiP@n$)y4FM2@R@3)24Y&l$}3+Ge+@mA^EL&m%Dr8JVAw9Kio2+`!mi~UqIa4 zZy7g}U(7wa>xN>;*MT2y5WPx;&^S<5D^Ioh@_blFp@=@Mj}Kb0K+V1fKIkeo!& z_NIk4x0}#;@gT4hz*ny<+m?W79;kF56mp5Ck>1hk=8raC?a_gUIV7V5T(M;Z)2~q+ 
z;Hqlp|0Jd6rZh%K&pe}8cZjC~Prq*(&bGj4nMPhHXWmykqh3`|Dra}BVMcEdw|UztfRoyV5~%hja%dCXj|4I__jABTW@SYVtc6DVQody4kPdvt&;wa z_uJ;&k#&vzburs~xaV9HPqzD9l}t;&$i%S49)8)vmehsJmA@wr-kv@0gmMoa)cW0p zuF@Qv8x%Fzsavfxt_wX`jBv5nNc?llrhqiiHJ?3D5sptZ{&L-gFnIE9UF#@9uw;)7 zK2OvhXb@81)&{s>RM6QmY9m{iZSl$qC%MhmD96TJ>+ik<6R+LIICRqsv-*Elf5Ybb z1?P1cSk)&*Mo(q$IQ*3}Pc19ROe23{D+4>>s)V1g;XLQOE!d$12$2G1faXi&@K z?o-%7zy~&UzIH7bbokb~uwOt08;Rdf(&IqUlX{LTGi#*2of*(DVy5@rDG{~i2FJ~9 zQ$qvWGjZ?Vy?b{0)hs1oum8X`y9y#$xJm+>nSvY8gDBWV1c*V<(vLB^hpe?P@!^X}5MAIvI(be|2jnhCo4m>TiN;;0rF3`HDsa)2b&zkh^NSxnaW5&6ItSN0D?H5kh6|iCj5!Qqve>yZe-l+~ar>Bg z+-lSf?Ro$)$HZVwneSB~|ql@S~!DC{+kbQ&dn90wTRxKtu$@1_%TL0wN&N6M7Xe6csUa z6jV@3=yhnJcPWNmL$68b{oPTYbIynN-rxN~%P>hMv)5khU)GXgX-0|hmHHkk9{=@d zF_`pYN;6DnCc2^lvvSzvhR5mFk)j_ub&TfR{_P{XOr8mQbKmtl__&LZ#XIXc%gOY7 zFE|B}r51IB+-#|YNS$W*4Gl z)dl`67gTxyzEzJx7v(r;T+1YAQ!`d!xnPm5KG&?ZGJ1r&jy;l8U~X*7F~xK`MQt+rgZVb z-OhQbwlH{>9uA))Exm!61TXnMg)&!_S;I@z5zRbsZ&BK_&^3Da&gD@RC6t(Y$$98< z9STapfe{`kXyiU<3?E#V8ZUPis*mG^H^q@ClZoU+W&67q;Wl8>#Fm!+)tkLC%Vl#HqXL!# z23_lB;_+m$A)OvHN}^so5<#|4*pQ9iazHUGGU-2I6;h1KSW;c5>5Rl13Xkb9zGky_0{ld$Dx=j! 
zo(k87fitpAsvABO|8vjk?Y_*TrJq|n^Cx;7DY38hX&b!C;mPZJ#kQr~fQ4=+=A+Nk zdneuXAiD=!^B09WemJ&tKQb%t;YOB;+T8v3L{Tk8bm!9g1O7ymlUDBvN(cfOMOC%s zIJD#k^Ajxe^0XF*M?YOD&VJcRSn*$eEQxp!HF!S(23`-t^6O0#=N}pmvGPX_{`V{w zj|5r9)2nd4syqJf+9+de<8;$nTuCQPM7JNMVS2S@l5W`_U9}SxlyYRb+v9C$MCMEO zKi6bx8rCh4xjCsYR^gZ5i25+NREFJKOx$%nks7R(RfLDct}CRa94R&#RQwYg^}L05 z1Twc(@*6ik#4RnV2N;&_-xG--Uvd;Pgmrzm4^(quX_;mt^TWr6ypX) zq&Gc`V>R2eHM8SJ{Sd9^5q7JE);RMf$C9eL?P#mCisAT2t5z*KbNvQU&k#$VB<-!S z^ZH7rsxzR=w>qMTHIIiD(eq?m5_{wTS85KQBT|E>%W}tFl+h9vy!UD4491_xBTvcv*qlwR3T%2+6`rU!BrJIb9#7f ziXTjjdGr^UehUHdpl}_j?~^f61t}ugtD8m6=I(rEau`v4MvTsAnm9OaS<>#e6r(Vi@+jl9= zG&>33d6}Jcf2u}W3Yj<7CZTHK-e9>yeXnb`>mEXf4LBs!!UGKh1q_IZX;i&IzC^7O8?+Qu8xf$-;l7=K>AFyY( z12WMJN>QRkfrG{_)9P~vaxL9tqw2O?eQHtfBB66{a6K;2b2j}qb(osZ42a3pO^{B~#<00wP%Z1I+OxPSQRZYz@f-zif%mt*K2C zqrPDt!)%0vHP;S;p9jMC=W4VUUy=da5%LcPkWVe7(d!{wp4HL&$iv=xg??Rz4hOir z`sh+he?9q1?*&A5lE^{t;?m0L_b{G?!;E`kKsxIm2KJ~g0msXnDj7VJpv*9e@#RDa z@^{b$tt#!}Y!}{RVL71N^GU#>o`Tz0Sx7Z9C~AX?odl82GzwjTS#(bt!yu?gnhSy= zB@l+=tw5`Vp(KJiLAk^1aJZRpU2aR0-7)ZJ9=8c}gx-?K-XK={_sQOjN4hLFBmeiw zWDp>H%LstY;x6c-d<(4?8br!2nhwE|@c|cb^#)%Iw|*pG%b*J&-J-)^=RB9@Uv$_V zo#p-~SMzzrd@haYI0zU_zq7gP%#8SwZAHy}RG#0eLEWL{VXb{T!#(ti&&PhpDxcHb zG=o`+S3wD}a1;N88Kif9zNeAy&#&Q&|IerN@mu|T0L1Ng?1cIDx*^Z`^yvk``gq^@ z?WqO9+Y0mLw@ZgSeC9*y_qq4-mh-Z3ZQ~D3EEWhp`G%tbo|)VaMHWCZJU& z#|)+>Lej~2^Dt~p8u`?@GcZ|zh$I()M5$=@z5!4y*;ssw zJ2E#I3Qot} zQ@t>^#bEvotbFPz#Gf3Xx9F#%Ula}sP3aG(X=~{aJ+2Vw_Oz|VWO<|8(x=Wc14qw- z$jKAP6sp&ja(9up(g-AVM2jLUZ?NHtGbIrlsS?NyYJsQ5EnQWCR{OwMbu#Sqs;F&> z)O+PK0iC?pePU$kCE-D1N#r-@V(8!pPd9>M&)@xAkTHgj7cOvof$c3Xe5(O9@m_-- zeo&|pGY2Z*L`MW7yYdL4Is>a5f-R7)V9+<|}E&a)Xz>df?G7gQT4K-{Xo+{w&*Ppq|>uQj>DsN4qj*V^4Fr zF2{qS=y-9jU%bL``=?>oBsr2%+b8bBV*DNy6lft~P>v)#|JM#}%tVwN%PD~LSV&ys;(0?oL6m0K{BOy{H*hi1J+K7bu z=y8Zpj!+eesIKsrA*t;%cs zTC%pF;3O)}08Ew!vuqy{RaB?EJZmur8a}r+sgKieLegUO40M`jd|-k{CVa47)mpRG zz!Z`MMLx4$2CF)8k2_Kx40@#kNF-y$x14sLltQrB>dZ|#Btwu9~SIRFld2gy_N#-pM+%?!(YC)J+^Jm9naq3!_(44)3u%6URrw=a@ 
zsKCfyxvgudG5t~_WjQ)s;j692IoY2m#dI2tI$*>y8O8#i?|w4OZ z99{YS{VX9!g$XSDjCM%?bOj4T5Y3fi@Y+OSLRnMy>p%>eVHwS70`5X@k0gv{!n!g;~?NWh-#|t;ORENeAuYbR4WoM*P!J zA+#W+Ad&*5TM*r%EAXa4+f;*%y2-z`?>c%}*LToCu{c)duM-o}9BC@we6? zPV}~(2fL}l<-h;^miS1u8_G;y{sV&izb(&$+q!&?IW*k{0e+6#fA9~Q(bLnRe>TAX z|3&K;L^E8iGxKd8WOebGLj;t@*SIT;Z!`baO!(?gMzNXaKi=7%*kw|@V+1|3BuYd? zVfUnY*(Y(kr+I{sgO{@LeUd_9R$VRMhPaQ6Sf^0r;V*L! z7UQBSdAdNObaZ0DGf?AM6^f|hiROAs1Rw*~2a|4gc&)V5_oBym%uNP3`c5wOJa~|x z*#*`JWvQP@(68i>t34jWO?u~IUM$eHq)UYF)@;1c;Pjwo!b25*fM&JVzY8hpjQ1nw zqb9vNd+tkBh9B|{6!u@=9G2Xv8Fjon2p+gr#2TAfI73=pCt9gGjhGbbq;EU90#vEu3!N`B9c#% z-Cq<2r%q`@jk<;(|N1);#@W}^S|CxO^^<=?#c=DpU6h1TU$h&U^u> z*jrbK^YT1!FSVo42S8SWEvpB1C=kGsCjHxZ;e}WV#ZLhv6?DK;Vf28_eC9fMjbi-* z@OzC6e0NZ2`tsH$HlsjAw~ztHEN&tgAd&=)+-Uwtx^f+RGY%wN_MC_lVc3aQI-GU} zeKoCZVBG2geYcIq{$}0WqbHa5M^)#9t`8dRnJldexJHaN?r(Rl&g4%yJ+coiYLU;} zl!&psu}{T$#f?C&dFuYLd{xG$0n6I`w@V2g-_vPxR%dB?a$6Xbt&3jf^EOL%C`#H# z=yVbKY&r>j@VwQh`G~lGN{XamS!}qQ;V|3-T}XFcms_jIv_JRuW^0Xpr&h*LM#qiK zff@Z%{;Ni@d0*QRU!k2}7h}GetoI(?=G#SPbYW;GZ*tjZ?KfeD46f>Ke+AD(NM(B} z6{w?>w##ayZZ-Gw4oB8H?rX ziCbE&^`1;B7wVQVLpbA&96_rTaH4YUwjd5s9gbDP2du~tqJNei+tj)6mMYlfEp^USmOXv3R5!XvFylIpq) zY{%IMo^JD7m%arpXv;I&fsF5K(#wyTFWG)qP52(XpTxevl+mP>xKmxuw(}#fGYGH) zB;+nz|JwOmgIV!bh4|uUuS&8*jJe&|afJ&86ORs(S>dM_v!g3?m-bM z;{vTua7($&uUffSzngT@j$y@{UD!j49RO|?oI^p8bh#RyXov)62R3j_)FdNNqvDu* zjIX2+MNfTUT%WKc0H){yXkkk97+lh-?Ki4DG5kvA?Q~D3A|6A2W{^c)G0AdS4P|wh z<9i`6llM12z^umv6+e*1^eVV#A?T*vh>8_i9He*kAUes#(uk1wUn(G~l^n+c>M*u% zD2WLG>cqMH&JBV|i6Wr(jcb=c0w6=?bZDDs%YuTphT#s=Lg&otxhi$y9b z+nIA&8XD^MKS`EAn*C0B=rlXM3P_zS9L#_jbbMmh>g`>_QCAvzwh$OcP{!I$V8{kW%@g{l34!z8Av`ri6=q{v_>wOjlodiU zzGAGN>B^wAie$hId`Vjd`5p7co=S{#lQ)fz^|OQh0g@;*KQ|ZHYJ$%@oftb4fJCGD zGVh>RQhV@CY%hAW4C#`HChW67#4Pya~io_orA(N~@9$M4vV>{>W}1fvRY2%HsLUnH@Iq%GNyyww~v9z&TJl6g8Ig53_iC?q7M)!T514W zXK@j}Lz?7wyMt1(zYFT;(37uWM1lakNz9tUzo(9hm~3T5XxP8otE#0iF!VYyX{dok z6C!|Y(NYY&!Uq-g5TftF2*gZ*42x1AWQ-{nvmU^fcS>@Bzn6Ej*(mMJ*v`CD<&z^dZXmbUcxL(o+@)GPT`RqH 
zCV9`LO^N)$U9)QL`me@^DP~lYvBSCvq59ch3%Rpi*{^z3Z>ET)Wg?2L^tQ%y$`HHU zu;3)E91yCC7=Gg_NGq`iNCnUJvOAn-L8q?L^5e+T^r`JO&&|}80MDJ|nV4b4%fd!t z5%F%bGY(y0>zIiG*yY>93OlqG3w^6IUE-~GJy)XIzO}inwRJ?47kwME!Mh9=ug<6& z?YgW=NfXoTJAj1vaqrV6R}|b4%5kzb-8R~r*7}iP>ALilaj0o019f2&|GYnEYyXKk=5b={^nJMTyNkom5>meTV4NhMzK{RTFJ9c@`r)#1Ba+^7i`bmgrop#Tn;_e&xW1EX_iJ_~#Ag4&lk8_R2ebs3 zSPAy9zi~a$63J7l_yBo4Ac5P}%@IpDYW&_XOa$6^@30#p^)<|Lp|g&2Li%y5XywbeWyR{tBMc$KN@D@P<_L`W5bZn6y(QIB zk?ld6vdvt(i61LUd$TslyPrv0rxX6MZlByK7|uX=Zbg}=b|0^R$e2K;|>9HRM=Likg!)+6KybmoiW(Yo&7`8sow@v6tG zf=%L(?n~qTf_N|PaW+8hnnpYaEG4;v5Tg`fE1{r0fl69jn^_l}FnkJ=9Jufoc72ZQ|i z5|4hiN<3lfPWhgkyuX&5kRGqG7gN4}|4H@=8?wc4v407&+|rxGc@Xt~xy~dBWo&6Y zQgZg)-+}K7D6H=+T^eo7`FXzW_gg+9Z_+|MCokAyNS3hdW?Z2MKvSnU0jkaE+0fwY z5uXnC0}*Yg_g zH>yJwg+^}bsgrDSXT5Q49G0@kaZ(y$ZCYYQvLKP zz&Xr8b_YLH)auNxDjJ;L0RSVZssu0yo_?_I;>ij3=ZS^@ThX=xG=ef1qzus1Q}Csm z;qJ^}3i6(XQnvJ4{mhAE!7fs`A_)bG*E?iXFYOt-;PubnTbs7u${oYKAD9t7cOFU+ zYP39I2?p|kS){Pbx^FXzorC=HMBBkT9M|!aP%8cJURD7ZT*0(DM`{sCb6O3XAoK1Z zv+C6`up^ILUUlIQI!yoBCzAYL+7YQbICSH6SZ-LGeFPTv7ToFr%2+7!7%1irBIn47 z8lZzh9cGS%A0n!hbOrJ8;@%hO;xT)5v~V+~!-%0C%~Nn`je}s?D0gQyW^Us!)6TQ@ zlD&72;^^nD$FCiC|L33E0cy(EABMdFxW9Fxnr=9TIq02yKfyp**Jx|p1KBK}w9ql| zU1IbpVvCXpm|xNS6b4HURgR7*H?(a}IY>Dq#YBt{2BoV>4)Y6}CI1m>WPu1H3Ioos ztwOmI{nj^rvjUv@Kozn(8o>9}#M9PgtCgir6Sg;0W+K1DG1fhyGaNbO3g6cq1ihO( zW~L0^m${DH&iIA5sii)~(U&_jZ@{)Cqy|1Z%@0I8PBVxLl_BS@`0^!0PQne`DiV*` zv>HY7dd$xp^4vr%gls;FiIMcA%rAERaPH9Z^l(|tIUXF5NpNya|M$h@3a6o+qKDz0 z=H;JO_g<>&axupQw%W3Hy$+N%v&tx+$XMCvY|}Bxeb)QNp@F^ok&lA_zBn|-rd&!` zH}|UhM!V{xW6IwDlr4UfW88Uk#BWu*F8)KjC;8VQ+!u%L-LNkSv9D%h*$AmF_$}06 z)?a29-;-#KG<+ihk_PAXZ%r9xpNqvYoTc+pQeR*Q4~00FHI*4slGbg(INR%epgD72 z$$F_nLqxg8@b)NqrejEJws6W@HB~O_$gJE#EJi5iLF<-d@p@(5x8b_MXBoE6QG0GH z4d$tG6@hh9Lr?CxZ_X3smT*I^V;85sFu+S?t$PZKDE3Q3>&XjYN;A#5E-l4}gumrl zuE#srYTaIDyC)S3RKtq9Lv=1o3S*H*53O5_G7DMjLHlOcUmwXiB^|FkNbQ0`<>-C! 
zw(^IAy-JiwqsPYG)v}5Y_*Tm(efp2$3I=SARz2L@N(8GH$`3&S>x{EZTMGl2jFcf- zRjZ3S2v_^_%`;rmhVHxRkA}~tp%Ydmb|>A@IQswDM~%PXqvd|m5zoS2W+>ZP6q$F% zYcA>93!d=ad7;%jxc`$Fky1j=FA-{8U)`I`t4nYZ8FU=3itvJV7iqcpf>QcPY-cQR z+AwFM@6HRw$7i~iGiFvNN)*V@ch+|Gj~W`~wT4s0nYUK@Bo?{PK0WTc_@aNV#MQ@j zbjRlTC zmYZr`=UT94Y~0>4!S1XWSy2jfa(<2vJnt>(!3-Y%QthXEK7SIn`@L#L4yy&s0A0kg32ORnpENZ%DXC-L(!>mt_rSG|KE2j93 z$JoGsES@;TmMM^CyBg-R+M-JEyJB_v+`>#625ovg~<bLKUclz^cN||_9yjN-6t9VxK2LrEN-;TEIXvPlu)QyvXHM?#GFR%CT>B(GAC`dTH zn%OP7>atBwSz=$KJvTy^&z-^`Hyb+ty@&lbFpHsII}gpOCDYep#9BXt-AVH(9Kg7w zD&k>2;shY>D8@iWbu$KdXlzH>qcFq@OzAxZMAw5xA=aF)Kxq{ZHU*K^53p`B1MdnO zt<(T%dX^tqYCEj@OT1H_@%|G{y|Q!(?c+d27A8o#fcIYp1M|+z&WB1k+p_p+$?uk| zkce73I9n{oncgswC0!$rmJkaY8~LK;;AshFbGp&%ZMqNYaPX3EtgWfTdiwPJj&o9v z%bxBJFyJ0?CwMH@%xYd;I18o*3m^^4O+pl`;fP|W_*U8@qlfdW>`RqWV|9L1l_yl7 zrb}E$!;GRP4ISNoO-(ZJ`61A|=|6a_rkC$urMH}9C|9Lu187M_OE&@4y(dQ?Wfi4J zAW;ZCq_wL0Elnok3Qs6*RkPwVubgmzMuq)pF?-H`)|gwoTPJ9^cShW0M$Rs%g2R+~ zQ0U}<<5d{^r}2CErckcQaekmz0?55xrFte{;so3n@A_SUXK=xNt^~-!ZSE_hZ9@*r zf@;tb;tb;se1Os}sKYoVL4l9lLSnN`A&Bj~MR-BbZ^0-I;HrI8=W|(X3sn)InlR6Jg z3^wDoj)DDyCuPQuiYa|DE@E=1Q0iwi@`hdLM-g$NHJ{*T8l!lpywG*uMI+{;NwMyn zfQ81VMPzqzT4t?Y?3pOE+6h}P@%Y=Hd3}x~mwMm4ZHxhAs^2L`R;!`0ds7qi_hx;u zQm<(K@wktZdbL3d2Qr9}5K7`w07%%l{8$k*AITIBd!ot!ijR{H(Py5BhBAXlfMj~{ zqUp3(LxGBYg438rb7PL|!{F4# zGwKi6o0cq&_~2{}?+${?F$#Sc8Ej4-XVL82dXa74~oT$1J7MaEY?$ z9$geTyD=@am{PVg8$+A1kwUozb`^wct29>0KPpL0SKORAVrOO2Sl<%=K7{XU_+oE@ zV4lDS9DQx7BDje)XhnOV%9v+?idLnXNxbhBeaC{%$+8=tfCoK*Io>aj;@k+dh5-&I z2IgeFjQ_?!6;<1of;MvP9Pp>ViNvvq<;21JhSe4;fQSEe?xb1 z7bH!=T(uKT7#rNKXq=tVo=C<$Am8{6VSr%YtO6pVhkZ-H%nB_aXZ`+GWuQ!+A>ADIo8s*$`OC;bfxjrkpM*;Uzg1+q-ElZfO}Bvl z?-RAd(|WJm{TC@zxP;v8dhb3nCP>k@JNiH8O{$D&gJL{f`ZF9_mN!e?{6|H>+1mZL zJD_Cn;*=lj<|y&SIEarNb6}h=uNUZ@-^Bv;mDbr+R%IFHr!?IA0!XbY?i5TFLrTp0 zgcmh&)kdtSFy1}bXJ&7|rrr!9cX!E{Kj7u?!*uYCf)OZXQw-+S#ZeVNS!|pf1@!LB8yN8_BXj|zx&wMs>@9KP6Ds>ilTH8d z)(lUS>G?Z*TMy~yL`VfNYjw|80d-m>_KisWS%i zQ4SD01N#KmnIGosb6z6dejRxkEkHnSW=YM?HOI0actO`ES5Ah0^m_*MPFe5Ism8W> 
zx8g&?9dCEY{G{)(GuC`S>H_T`ffzUp2KqC$f3L-awjS6Ih8=o;h#{0r@d?u*EHF?5eN-Te^eB+P*Xw8_RtHW_ z{4!fSx0Wdcdg4*+rY;a@jQji~O!n9MZIWpd5ch&u8^swYCTBj3e#5Tup8JtLu>WzZ zNF&y6F7%SeJ{hm^i~CvY6KWXp!|pfEgJqo6xAGH79vp;b5FwEo`=Jh`+&uQG+UG-3@*kDh-uI? zE!>&0>9;39VnTB)PVt$61#Gl3YjWRZ#sjG#v#=I7(-Gj5ntQLT$GM=dQEJNCGFRs! zT-6}UTGq)1{lY$vQ9QZfaG;8Zi@jK}K+hAY=Ng4Y>(^(&6`Z2`D;(3n-&`Se>NneR z3qsETAAPM9ksKvllQSU;!0f41@*AKhJOJou)BG`ZqCW^?{2&-z=?8)yrwj_k3apKV zFQhB_cR<_7C|EZ-&PDfxUbXH92TnJk%yHk1Z=goo)jeB?xoyjj+wn%iz_gmczTU28z zkMkVA{_lwi07oh}EXt|N2>}A(q%(o+@hx9YBOaO|LW`U%WamP6jlMlNS*B~inzXi) z5%j`1Dl)&0&usang!F>9!Y>5jhU?!Dddi8mEuMWs<{?}6ju6i~JS^%$AV-7~cx~&| ztBwysx*s;asS(V&(;S_WZ~i05+AptW%WAZz-}#&W9L4CuiN-%Rgwd30Cw3=|HK#gb@Vlh8LVuXIR~2qJ zieR-~e&Is(-;`g0nTsy0xNx%QVKu$;F7u@miwA$?>Vy_q+;l;b*J>>(HTED%FH8EonZ~`a z`wy1@ktX{n=MWI~>}LnQ>}n~Om;V0R(-z=N^5)8=ZZ1vU|BUyZlUx(HU_bKUw9(N( zZoA(xoW_x{wyEEX8{u@?r;aM=DtEJMsRU|vYl`vR5S^D-?9&+#22xamgW>Yn%Nva6 zDyV@RdX9zXkeYd{$Nz!5O&A3sDE>O*T^IVk?329#EwWiB52Aq^KweBz;CKw@Nuc&J z2?V`Afk>p94`_PW5bxMnLtys6|HD}+^j;3wgSa2yQ!x?;M4I78-78#YU`I{v!#D11 zoArB^A~+8@SywRiQThF}SAEen>!0wPKP5|UFb`8@0!(`ZBk(;KTjLD$t{`A=Q2#s& z1s-yBf6=);BeX23wcnB_g0dY%vCx8t?ts>78qk({9E?2p%*0>bsIh6Re=MTs>)W>U z=C$Zd<7L&EX?;bRolwbuhPZ&O=$~yuLTU6@LNVt~L#?Ol|G3YtYm%K|ld z#))fjJ*-nQ4mB96B7jMkBnA@h>NYy~WG)$xcv77*`Lg%$_(0zq*wmdtMkG{nrF zVCVl1B(+M7xW(S6c)jd>-(y5mD6qHK&=Uy-!#sc*+1vR8Ow1owEhW|uF7Z~T>ChOJ z@^8g#QB}yVRnb{yPu2IYz!n^h3lP`bffu1zlHiw&omZcd-)WFFwd5)LIId}vZUQu7 zH<8pG`_lP}%ba$n0nhax+W(xu!H}`TC%@30l5J7n*tos@PD`0&?A1T1@~N_k;R;b@ zFkUxfnR1X7TtETS53pTE_|7UwC;eb-$$qdb%Z4BH24^2<^031>c!2yHNYtMCQPnN9 z93YF1|KtNBzX5_Zbu?%^xSY68t|T7DUVu|KXPIy3obi}}-orT1a}Nl<8{!Oc^t`w| z3Gc|052F+-nUQ(qM&!N4odpKh&K9_5_TH~< zSsM{Tx1XBcoE2o3S@fJ3QT2(6vz{^ap8jm9&tzBt#>50<^z%hn!yJ9I`<=7}IrFGL z+GfTzfy}G;Mfv|8^~|+LLaw9uP>E`;9_}+nwlc-KB!w*(>Dr-C(y9J{7Blv|R9jBe zB40fHZ+1)r=E)-lVnjS;JV?Jkia<-Cye+>}}IZ%!a^WMwK4Zz$^uBER=gL6yb#}XZVQqp@ven0GF-848Gp&z{-9!j83%r}!_%xC#P=(UMqCs>uW zs7Y9T0EV-c*JH6>RkwNp1kP2`KXXSW^MFiDp8b#GT((mcXJ}$YR0fB>4S~Dc>fj<7 
z|2o`J9x-ov8#vQIz)pmEHU?1Wfbdt+Z^#rz7g0p8t%%nw)KjQHOY|=vdc9DKX&LMv zr)F2SAt8fVtEj9^1&>=Ov3MX_XMj(J)`7l~q=+s4_IUVyL9-NB2^T+ZfJzy>i z=%bpL?>Zfc)@(Y?U2(vt9W;!sqDf3DWfT}?&2{j@o=9PMPm?9Z3D^Ku@k$0bRabi8 z0DN|hA|ebAVCt}eF$eEC31Owi>l9#J7vh3`nJYXxq0@@(x3R+QKxC}AYg zVr!s@Dcr0C!S)h`be3J+s~q$72!QS7H>Ad!+(AuD8pmO!5RQl;gK?Lubss4B?)L*| zGEpu^f4&m47e>nFe$WVtJA2R^_pi*E^C$pGoQtl2- zpAz6s^S!&j?QJP-)>o9P?0uNyEvq_!+7MeABGcOPwjWb zcj1vesMv5j>2%se;F&`E)>R&7K^*fU8=s|cliE;O4X%vmp`I?t4+x+ug<#X?Av$Hv z({D&)vQSD4pdv&1fV7@|gnNV$F6je=L(y?0bhKAGRxSjl*j!6pk~Ci&U&A2j>z_JL%DlCD6@7qwXKCy3di17;uQU?yAK8d4!xgAnDkUsCyJj8JCpC!@KO~TngyQ!S zqYDz-9a0%R@h{ycI?199mOg@MA>D=PKuaBRMpG1Re}JBjQ?_mH>MZhFN_PzNIRT&d zW1velf5i-)%^9Hyl_@j>Sd@0+5RomsChpW2)eS?Ud1y27mAq$ zP;|{1E>n#2g%H(N;P^nwn`V_+%s3|z0Q9vO)=?FsycP_3={;5|UlMi}*j#fPFfNG@ zXA+WXgJ=Yv8^*4i{f1=lZ&bSzfbl~}#XAiPGlEjqYOX^~{Z2K%eL2$x)kZc@A@77r z$yey5C)#;y%_cv2>f`anVc-?OQBP0)Yth=RS-M1gw8a7%mk`?}>#~;^+Sj1e&&07}w6Qm%UMv3ys5BC*0EdwzOq;ws@$p z(YJM_u7C*JKjkm5xf5HvhqQ)EUaCIHlG%qlAw_v(_dT{6P5V2N7LRp4{Bwa0rlHbXzE0Nl(pB7UY<4E@Q6!HzYR8t}Sl6 z%PvJ;ZT-TlCScogI#JMmdt3YWYyiRyA_4(ORPmUSwA*eWxJW>nd7Y zAP6ww-b3~x%WoFh^}yY1gR5snhWq%E8M*G4>gD>-*=mdvz)8SHEg1qMwga!^P!Ccmu)B`d z;}k9#ws^;=#LcppT;r@{YWcO5VHlh|_zgA!S&K=~fMzRRyVP0##`Zpr71mE%Jmg+J zr4aHaJaro4rsI?orhPB-6B+B}%Ll<}HD`h-W6E(EC~eLU+;`CJ9xGQb5r6TeIWk`+ zz2W9+90w1}Ledt&pPIcp6BD7#%I@il`YedMPe8|t@oTZS(Cryy3hSY)7tkzQA)h0n z{Xssd0BGng(E@8rUJc+x#JChp89ddBWhxHKZ8Z~XAW1~E3nR%Jwo`vxMj8Dsl_ZYG z^dvE5{t0Ejx@p3&>7a-J;3q!yoU>CYK};|*=5gSv%bB&zBvIbo?m2%&ybc)I0ZN0w zn{GAcze|a+SD<$51ZBj{&z{rWKydMbdd)=<+6bo7ED85fpS+nLbpQd{akAVhQ0jZT zhfT1$TQ{&Zv);a5i;1La@Ows50T!o?s164$?7)Y@91sk%Rc|KX(g!HlwwfEjE(5+LArwpFzsH}KU1b$XJ z2{#Uw1g9KF{hDH~u}dx>w5`3gVn@|n6YSf7ySDKT$v(-#C zwiT)g4UJvHgLce!y%>GSi3s>0w-_1COTb}=A+CBj@fJo4}-)qDS z+doBc!Iuu*j00E&6}bqen({f%qL9M^IBn`K6P_(_4{q;M)L{8xuv9+Vy{c{ms<4GKtLmT@MH%;Hvq7Ov# zVDvPUHj>2ZmO zpJ~{Hf!xj0{f|k7UiK*=8{dt# zxedxb6vdVDuM&21u?Y@8u#ZnYrq#FODc$N%9Zmc}9|Y>&IbT+UK|&Km 
z)zQBMUQGqWQxH>HYcfzssz(r`=iveXQGEUfta(3Re!q89i(q!(ymV@4%2|kpQKiek z-!SA`uJ6Kkm$2eT06n@pLx&Qf2NMq_Jom_zH^M2N@tIlePVbwVnw;F;2u@(IRxfk; zOnb8|r*1c`A$oHi)t?M*Ggr|bo_s%X(9iB%-BUSzOP{}HSYh9qSiFyZ&}@2thse~+ zsJ9*E+cULVK`r|mm=@ZMR-R0MgpyXH6s~z+@Y#tC5|OF1SRtl1(Q2s1a9*oAuO-L! zPe+o|xq%+qEph1)OKCE%9@USUJ1=g1=#k&`F52`SUxN4UyE&E)?7Hk}muNqY^^-82 zh?e*jdrHSv_;(irI$=DA3)?SwzGAc8twTD8nFI%ON{ZEY6Eyo`TT!nThbD%L*4JmY zyO-|nZ?{Q)RZ=ujFJ5%sG8NF+$4v+)xhEtpjij|;th3C@WQW_jnrd5Nm>uZN>Ww1Z){Hx z+;mM}iq*}A`2`>P%{kG0J`eQACEl5q6}1{MRPl9<^wGJN(#`6DY#!%7haefLBKJhI z_9?xe$GN2LfKy?>sdixX`Soc^j8_;z5*%w*<+es&J;U8|Atq$`x3k=AsH@3(hkN?6 zIs2DCPhIJ!k|E(Iq_f|%J%_5RXYK3kFeI+KmA-88hUng z;#xAY>7q2ku2vN#cHuWq<^X`Ydv+B{M3rmf6+pIUGir(y1>DiS1ac$&DIa!3m-;^w zpj$}Y68A_5fc}Ib@WPB6gVbt;3a;U08Zkkj1r>GM5m_Ko8_)os75K$Q=BGgZ2_WWk zLssNNP@Z?h2BMjhuf!6*B+m1w~ zpM+Sp+a@LVd3z#=TE?J5kFi(=-o@4iA*`4ECAvTcsqp)KpXEm#vS}5b{O8@ZgUf#W zm}6?ep`=)|Q9-meR9%1LMs^*O)Xtrjm`%0e5h0BDn9JrH&};)=ZEWB# zaWImp=wsNC^pKoV0gwpMzaAesf69y4_AAQRrx*wi+wxnRawLrU=Y3`2 zZ6LRG)Mo-a%Ysk0sQU=A3%av?4eFcn7j`;9J|&DxvF?WC^_Ly*@sT|GYD4|8Op%tZ zH+Jr^|3|TYZ0B@tUbj*JP#=SxAo~z5_fTJwVP670r>bJJtL><_?f@~c*tVwdsO3uo z(t&Qfx=&(bj7{?Iu$)Sr4wfc`1)>eb@aIeivFn{dAl588dwcT;Yg=%)rF zz{@EwwiZYCXZMnus}`IGB0M%zA9T=`7DiqTPRg8QsB9LVDQ$PP?EoLSF zyUauS*{$~O*gL3w{Y}qDVTS~?brWm}GLNDj{=^#XG9IF{cqUI=w;i{+|di%6vTrAAJ^ABnP^X>`vf3O4C_~tcjEQHtJMe^Gw@5#iqkt zX?<$uM=DOkbNwsskDPtMdFT<@4Oh1`zuc0kL$}udzW@c`SvYVw46NAoQu^P-le{Dv)YeZPp+U zAcIe{A}wTOksM+$QfnP6=09nCl14+Mx!& z)_o>d>S@2asn2)2|KeHGZNL3CdK9tpqR5V&vz}y_$TF_ptt=mzp?_(^8Mzu56_St1 zkS_H7yI9_1Y2&1@4D|b+$TK*`aVS~M?d&|ehq=7SjpBsdbT;&7tmoS9r@KS&8XX$73G0YAw3L+N>J0$j~c0s~6}364nTgq!JUAGV5I zWRVcKq*g5L1gCq9qa{R~2D{hp*usIvh*<47mdS~5ThKVcQEGSMarfCt({=%@hl`a8 zGm~|()X}ENI#vX&My^feb&0-2O^UZ4~H9b2onOuCd>2wP_$#i5NrNey)cYnU1esw)?!*%w( z?)-j^i6ovr39__PN!ZJ^TYX>Tj*8Wrx|uAd3Qgt=@4q&oIICVS<+d2B6_gOBG=F>R z8f1>vnZ7E{?~UsD$1Y4+TR7f(A+A#KT!|tN660rZnWLq6s+@3%FdL~9k}#a$t-t?O z)t}r7x525;vjU*0`pVe0mlC)!NVu$hDk<~xqR8a%{&8b1+gi(!0Ip6fXEQaPW%#pu 
zpjK#=E}Mh4@fWUufrpF>w8$3+PisM^GsjpZ`6jD`^1Q@(S1ZK~V-=YCs?6NKPmO(B z0@1s-Tb?u$&A-G12)Kp3YZOuJJtuJX&^QuU*Tf=w%R6`3)b#76_c;s%@y57@AsS41TzxWvGYgNdCwR0}oq z8a=;=p*ktTyOVq3rGJDnJu|Oi2nl^N zcK;wPm>2gtNA*$9ZGMJ6)t478kixef@NZUvwj377ZvkODI9DwT8 zl&GW^l)-c%E*Cld019R0OFnL!B`Ytt%sOa+bKP;84Y|llIE`1pm?3r!8}`MiPjM8$ zf&OT-xu2bAx-`oT-)BDSZ5(Npdhx$uM^1$N+4b(5m?JKUh~_63tfOA)$s`Qz&%O&i zY^Z9g+LwQmCvx4fcJQ!l)&Cj(snz`&-*F`IddpKMCQE9(rODPi`>p=Je|4hHsN*!* zWO1rO`e%|eZ*kY`Mbv7Vciu@9wU*6};?|c(2=iy9Vf5EGrq@X4Cx&2Xnh75*!^#@nAO$!djbdJ&=$kG^RdnvGZ*LC4jcYIo<%=W+yS zySGeejjotaocpt&6o?hOQ28nZosjeHFriRQYiZ;nGiI5IKw7w>PAA@B?HAsOlrf91 zE1H_gcAZ~ulwO?hM_IZKIyWu`c=@Ss#H4*J7L(dX*}DJ((to1MlM3-XTgNR`U3rLp zakjj8Lq)~Ak>j0~sNz?-B=9Vtj3oMQOEOS@seDv;0p<55(x?^|dku{$#rBT|L zGkQ^bi5hIO*k8EVWfW`SVrf=z>!eu$*jI0NHQU!0B>P#f&FApEfy{tNqHn{##&J@E z@A$j+NwiSFoN@jj{?9K5=NbZL5mw4~bCWE;uv9$Z(xr?4*@*Fx+e+kwS*|1ML&KCR zlU1B5RS@ZzEIEO|U(ZLGb+i@M`m$#aaS`3y@pszPkS3m*gu>~CY~T8V(HB^35?*`4 zq>;>_;I-VWa)rzx;@OlbdD&faWgx4OJdF1hJTo~*6JkoU{Z>)?RQ{;WRIqv~uGm{Z zXDmZ`T<&w{+qH>Dm6waY1+R?Fi&}WsN8@$WKAGm_-yCkrLY=uMfsW?V$m7v;wxP>}M9=A`qjkPQhTUp?)H?$T#p$}J7W=U{ zgi%^s;Plg$MlmBzw;ZKFuae89UY4Z{3uxt`@?p)d7Ky8{+Oo4dK#52s7}*-xk5 zGeif@BoGYQZUzj0jj^p6`_FzWTRNQ~q&4hLamLUj5idVk8I{wSNRM{htj??^9xxOnv|ez z@AlAA3^6CA4y3lXb$oU3%wJycZ9EyR^>iYCvbW04?;hiPeF1IdBYv5U$3$q)(%M&* z?$4k3OuF2#(UKN<-@33I9&@|>b8LkDQk^R#&+%-vvTP7$-V~jf9Q!5E!ugZ*@RmKlAq$tEZ6ER zXU3fK(mrkw3j|X!Sw3UME#c{fluJgPbt@wu$8IGpM9q|l85v?3o36RfAi#Z}k{=oX z(_7wy|2E6#6W&K-=h$nwTQP?eOa{K8CBIMdSI}CnVQU@}thj8N)hXyE^QBX;MT^gB zHm8Y*hdZjRzsBxrjgesV7q#HKQC~p&rCd8acSv2FS^NI&+uIX@8okw~euN{4)Pe5y zCO%N@;en8yS2rEj3miBvs$iO{eC z>}hCnO0)uU;UJ!B0fS8?&VUcj_Z~lR=!rj)E;>{uAqem>V8A~p&W))#xl-D0mAwAO z(5pdz&s1h(Mq8gAR^Pi+yb9-yPF&&|EYH>R`kJT_WpdCH4&Zri{1jmOD+DnON(`7s z@KV+*j3H8gNxa~ZHETjVuZEGIc6=V52L#cc>C5-6#*lX;mUeb#4k<*e2uw(r3ZTjH zi+%N7z5>-L0d( ztL%xOt*bLz!&!YfybzsGELYtMoZR2`6AJ;=E!B|N!3tNbXlF1v^7#C@R{{_#`X zpJji-*5q^>eoAydA8Mx>GZ0rvry+izn<(zLo!lS31aE`kOh#ReGYoV_@7E$sKh45N 
zfUB(ck8lYTR?7J`Y&sCEWndnZ$neh{4M7sO>cyPr)E*v7A^kqN4Qdrgs#Uec91|Kz z>Iso0X&Rp@O7X8C10nFgB;WQ>Z2}9LR22@sY6EDElv*fE2uj)dmuuCBtSw#NEawId zKL`w<<~C}$~NnXQY~T6X-}&K%OT5?9=sXtUh(@lLaQ z&Lz(*XOesIl)%CCQLJx8e7^l_{Cc7ytzI)Zpi8{5-qe@8A}RTzZ7I^dFe$%BX}V)h z#sehGc!m0R1k+_hca`p>Onr-C5L#>Z+ilDa3}_3we9-a4ASp{BlkVN3ye8JQ9Vbk) z@B?AFO!VJG}0`Y|p1t7~Npp z8KPHjgBN9V4tBagENDNQIzjaYKAKE{OMzvG8-PK!xeSsklzUO1W2bCv6g|OwMK`_q zi^E1a0SBE?$oPPVD;nwl&!z0bM7+xcXA2UrXiFV~a_JP?3Xq}x32;rh+ALWImQYmBP7xL0B2knlw zTEU@?moKj7xe+N${@%uPq{u9U)q&?44~A!ro_oAx`qc-m-aFv-Q6t+=AsW>$a0qm3 zyel&{Tb635i*DMlce0NCWtvW$-Q2JoOt0QzM_FW#1CHi<04es(CEULEEMk^hHR#k$a^KR}>e7;t@OW$BCJ{}o&6hqg!MmkBO&OAW| zKsvaQaX!jbZclY}mVgq9_5ff5cuWIS)5zDGNEr!E7*g^=M%G%M^x-SDenK~n^X{lX zo1(CV8v_>?KD?LreC`%HbSqZvuKI_9E)OV$va1H)MNeqiQBxE8#am@G;8^j5pLXrr z1pn>0+G1(`(+?T>?qqT4NZK`ssP(*_o{0%~f@I_z3d;@!f@N}kw@OY#~<1V zb{@hP<$1wGnm}w}T8$itGcZNOg_bEgT^Q~w$ED?EJ#cXwdM+F(R$*Z3+|`_rR=BJS zj)M4)hAP*(6o+?yS|(C}F@W8+%r26&Xp?CToeh1c>k_Zq4|gqN^BS#JJC8dWq$Ir@ zjgQSaMnP&B{p4ga#>n_oR3nbk9Z}n#SV54513Vjm*H#2cC6CGW z`jWn1-aJP^g>*}@q6flMGP5%f68$6zdFK1qY?3*ms$)@vB?Z)#g|LN>Nk$1oN3@2I z+pYHxs+gV#6nNmX9Vefn&2#34>e^_Ghum0qXWEoMIAd|dS<70WHqnXhR0X`?;S*QU z9S^5&l$oCfC)0-(gu%gEF_Ho!z!$(1vT%pb1;_#H*Vgy8W<)uQj6XoOfTd}bmu=JR z3Dy`gn-FwaofD8n2x&PGrw-X?sD5|8?rFY^-@gylWmIZHyV_8h z)V-WheNPszilrbraIrEUszSoq8XXA5Nf`g}uX76_JM0_!${DNa-h|(n!2YPsUR;;o z*r&OwG{O}fq?}pP4ci?Fu~_f7E3Lr1&i_=N?>qZA>M3-xqGx;@)>o?*qEx!sfxej{ z7dgRo@>YAr>X+B_V$9pKX?b60C4o=kPS%JIDQ!>S$W?Y`uC*KrL^^c7R64#D77F7j z-Clp9njE>!uGj9>XPm=?tBXk_S-@-GzNevVZlN2h3MRCa_`I?gHqd>Ea|qR)m-_cE zLY_#<9C2f2XDG3hMYMF-pFDUwXLsq#l$4oK^6E=@lei-iAD^GCk%m)O%}{&pU-ZB% zOm@zrrL9cKvvmonym4~~`(0Di2E^7WDPM}Iq+{ofzz5a{6kK#lX`8sR)f6K0R|vb* z&UQpFPUMU1&1HQ5zrIFIq6@P5@8?!ORtzM1ZEVuHt7)mJ>~b}TU2YXj^QyR;f4^se zQl3BrbxqVbM^GJ6yA#l)%tWPy>LJG=$_BC{DS2cSRU2wVK|MRNtTL(-eUy(}d z`AB-u4)+ZPbkbi=(#t!NWZW+$Kx58&D-c7ynoFtrLn%?8Y6<)@*PTeNFQZctIyb(- zeZtX={KL4RF;#WwDR5WsT~~IWD*Pn^)t47HL$wlP7(?|FmrCCNhncAY+F+I*{wWU3 
z-km7solzIzM)^AznysbZx*k@f>6nfQoA!XqkHj(oG0tze!bAIq`KdQ0tJwJbuB9dM zqdxCP(dvVL*lxx-8)1sZ4ky@;UQRAql}p>%lsh}|zKAHF+aaZ(o;+nB2u*%g4DG5sq!59uO6356 zis3vAD`P>0#A?5Ui~zgh_+#F}a=LFbtNiLAp1rxXT7J zJSgjv4R*?{2Sa7;cnLlqJg)qaQ`_OLhy|(U49_g&TN}jIdjWyxxkqpRiF$(j8~h3i z4`x0N1JL;zgJ>I2+T##14e)AEEQRS4s~O_@Ca0<%^kDfp+v#I(nI0ksn!dexg+T>3 zd|>A-AG3n<9&N!9FtX_QSg@nhRdbn6m>{AUap~T;6g&}@XlTrNkU$grOVV(0iMugQ zg2mf+x5{~%J)At<4RFzr-ZW&;V^c%-En4)US5@ zB`%>G%=VW>0;e+%{mIE$SJC2_IzB#Jso(0DTALICOmc$vMbsarp%U%8j@*XU$tC>M z*AJu0KeV?ua?%yLVk9cHv%g&Mt}$kbS#lM270!qyb<~1G4a0`2#kRm{4FHh~b2uc& zzhr7^`}&^ePJ@;v^E>+PP%pt46KN-nQm%K#Azv18W0p93^=d#el4Pr23V*>a(Sh!p zis>-$7l&E)Z8eD%FcJoqJ}k!|P{qjgQr198fB>f+00>7njf!5M&imBy&lia&AUHl> zB+)aLN@3~;D>JA^qgO0yb~*45-7qjrpO%o9bacr+NQVwo?_5y}Hzn@?+6%!eS{2{^|!j6HlLcN;}+w)9v z<8B9Haz?CdNv>?HFTe*IVT?Hl`cd#zho1*@xBG9(51!ZNUiUYfwKn_A!8C}}8D;F# zWVR8)LS`h_s?uT(+~{B%L=4hJnmFwetly7d>Jdz&B|zFQdRX7Om_a3U9!?m|gYoC9 zwk|$yj5x53s86il=oFDPC5(k|oZhWQ$dedOxiT`dut>ZE!uX!y%UP1J?ZfBndr=A( zkc`1XnC+#f+TLFz1&axv0ov1;j4K0J=DxBF!Ge460K{mbk!yyc_hMtGqCf^qiYqb3 zPvtu$JJxdY;PSrGfdfCM{s(9p{s&08C9MT!WGQG#l7{#S|Y9Em1x^so4B z;C5z;HsRa+i86S7gV{1fZjkWX9=^fUKMN{(7Q@DW%(Cep8@|DP)B7TUlO>)7a?wwA z9HyNcqq=PPCkTKn43yWPEI^zPFy=j{Y*(5;J4O;aWXx)2@UH z@QGMB&Wg&~M=<)24)(uaJbxEJAS`76O)C8Ns}kI{J3C)!f72i6>I!sxSSzji2tC{C zrQy1Z%lGA%Wo16xsXdHV7hROu?Xaa_7x(h^6@NCBGXC9OYs!Xkh6Sa3q!O>)PtW=% zyxpV$Ud-(4w1D4kfC{Ed;y1v#tA&R#5FDk}137DQwb9SZT4=ERI0x886NdT^>$s7E zrXM)Q>%ao}ZW%i2uI%#)NID7S63&6a2eYZ&a#{m(}fA9u2` z_(4{d{MN~F6vrReQGpYNBy9?bMUIa4GvH-Q`_We0TQ&uJ&{EJnf>}yrB-!i4l*MrZ zK=F9G3SFhXxt@g!C>XvNEIGk5s26wu<-8rTw$Ca+Fi4l=b06*v-%YI1a}F*u0RRbZ z{x-d}407;PJnXoi&1kix62`_rq$=;R#lAtk-^P+|4Tzv7a2tRJ$J!MLycSjZ#RIq( z+-EfIS&n5(oSUF;@0l3(GzRAdGcs{7{zkLVtV>c`%wL>lQZ<~G;Be=9(ljo<>e8#1 z0gLeo`ZAyZi!cPY5)BO6Wi%lMCU#@GB|#I=3o=CM!c`i2!Ui@=ftj%c-_nr2>CYFCLQ+{CFZewD@SDxyWc{nr5rin;>O4V1m`JWGmHbO(V(+4Ok q{{1uoBY$Dr|F*!8BOm?R;P|SMaIP;)MsOSaG0-#A{d~eIt literal 0 HcmV?d00001 From 1831486638a76fc165a59eb330b6ef0fc2c1709f Mon Sep 17 00:00:00 2001 
From: krishr2d2 Date: Mon, 5 Sep 2022 10:18:02 +0000 Subject: [PATCH 16/34] removes the output from the notebooks --- ...ular_classification_model_evaluation.ipynb | 2796 ++++++++-------- ..._tabular_regression_model_evaluation.ipynb | 2910 +++++++++-------- 2 files changed, 2896 insertions(+), 2810 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index eb9d7f14d..cc5de4927 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1373 +1,1427 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43", - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print (\"Resource name:\",dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", - "- `column_transformations`(Optional): Transformations to apply to the input columns.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"numeric\": {\"column_name\": \"Age\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " ],\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `service_account`: The service account configured to run the training job.\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " service_account=SERVICE_ACCOUNT,\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000\n", - "):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", - " 
feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"classification\",\n", - " 'target_column_name':\"Adopted\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "conda-env-eval_comp-py", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python [conda env:eval_comp]", - "language": "python", - "name": "conda-env-eval_comp-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training`(AutoML Tabular Classification) \n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaulate the AutoML model using the `Classification Evaluation Component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "73020acd076d" + }, + "source": [ + "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human-readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce, for example regression or classification.\n", + "- `column_transformations` (Optional): Transformations to apply to the input columns.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. The available objectives depend on the prediction type. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, see the [documentation for the **AutoMLTabularTrainingJob** class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "531f117e536c" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"numeric\": {\"column_name\": \"Age\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0299c1f24a87" + }, + "source": [ + "Run the training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `service_account`: The service account configured to run the training job.\n", + "- `dataset`: The TabularDataset, within the same Project, whose data is used to train the Model.\n", + "- `target_column`: The name of the column whose values the Model is to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours` (Optional): The training budget for creating this Model, expressed in milli node hours; a value of 1,000 in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run()` method, see the [documentation for the **AutoMLTabularTrainingJob** class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "15463d5d2243" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " service_account=SERVICE_ACCOUNT,\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) Python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1db1b1337f20" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction job. Explanations are enabled when creating the batch prediction job so that it generates feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e6e1c0ecc3b6" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " 
job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipline to the `tabular_classification_pipline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9149549cfd4d" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following paramters below:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6223d67277f3" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"classification\",\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d089ca32516" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3e7703929a21" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d1b840a79c4e" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 6684da910..69380ad72 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1443 +1,1475 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. 
In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook 
environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY", - "tags": [] - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print (\"Resource name:\",dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "An AutoML training job is created with the `AutoMLForecastingTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: (Optional): Transformations to apply to the input columns\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3l691PEMZFdA", - "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": 
\"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - "\n", - " ], \n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `training_fraction_split`: The percentage of the dataset to use for training.\n", - "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", - "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the 
entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT", - "tags": [] - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = 'n1-standard-4',\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = '',\n", - " 
dataflow_subnetwork: str = '',\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " GetVertexModelOp,\n", - " EvaluationDataSamplerOp,\n", - " ModelEvaluationRegressionOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp\n", - " )\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature 
attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", - " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline 
to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: The service account used in this notebook needs the Storage Object Creator and Storage Object Viewer roles on your Cloud Storage bucket. You grant them in a later step.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. 
In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook 
environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ede4687dfd89" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "45c61ad01158" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human-readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the model is to produce. 
Ex: regression,classification\n", + "- `column_transformations`: (Optional): Transformations to apply to the input columns\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + " ],\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40153831d5be" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "587f38260598" + }, + "source": [ + "Next, you start the training job by invoking the `run` method with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human-readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (Optional) Maximum training time, specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = \"n1-standard-4\",\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = \"\",\n", + " dataflow_subnetwork: str = \"\",\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " 
location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for keeping staging files and artifacts. A random subdirectory is created under this directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a912129939ab" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite when the ground truth is 0, and in this dataset Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8737e6e31129" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the cell below to get the feature attributions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "Using the Cloud Storage URI obtained for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "NOvOMTEgCVcW", - "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" - }, - "outputs": [], - "source": [ - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"regression\",\n", - " 'target_column_name':\"Age\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3", - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if i[0]==\"meanAbsolutePercentageError\": #we are not considering MAPE as it is infinite. MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10,5))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m94", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 0 } From ea11f9d1179d7d736b0e1896e3cf446505412174 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Mon, 5 Sep 2022 10:19:42 +0000 Subject: [PATCH 17/34] removes the extra matplotlib import --- ..._tabular_regression_model_evaluation.ipynb | 2963 +++++++++-------- 1 file changed, 1490 insertions(+), 1473 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 69380ad72..302a07443 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1475 +1,1492 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in 
compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you generate a UUID for each session and append it to the names of the resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. 
In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook 
environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model on the created dataset, using `Age` as the target column.\n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ede4687dfd89" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "45c61ad01158" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
For example, `regression` or `classification`.\n", - "- `column_transformations`: (Optional) Transformations to apply to the input columns.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - " ],\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "40153831d5be" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "587f38260598" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model is to predict.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (Optional) Maximum training budget in milli node hours (1,000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = \"n1-standard-4\",\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = \"\",\n", - " dataflow_subnetwork: str = \"\",\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " 
location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "a912129939ab" - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "\n", - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8737e6e31129" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Important**: The Compute Engine service account needs the Storage Object Creator and Storage Object Viewer roles on your Cloud Storage bucket. You grant these roles later in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default: 8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. 
In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook 
environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ede4687dfd89" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "45c61ad01158" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human-readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
For example, `regression` or `classification`.\n", + "- `column_transformations`: (Optional) Transformations to apply to the input columns.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + " ],\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40153831d5be" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "587f38260598" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human-readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (Optional) The maximum training time, specified in milli node hours (1000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = \"n1-standard-4\",\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = \"\",\n", + " dataflow_subnetwork: str = \"\",\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " 
location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. 
Here is a partially expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the following cell to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a912129939ab" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if the ground truth is 0, and in this dataset Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8737e6e31129" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the following cell to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From cf439808d1170e1aeb3bba66ee96878ffa7fa32e Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Mon, 5 Sep 2022 10:20:06 +0000 Subject: [PATCH 18/34] ran linter test --- ..._tabular_regression_model_evaluation.ipynb | 2961 ++++++++--------- 1 file changed, 1471 insertions(+), 1490 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 302a07443..1166c7283 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1492 +1,1473 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# 
distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. 
In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook 
environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ede4687dfd89" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "45c61ad01158" - }, - "source": [ - "An AutoML training job is created with the `AutoMLForecastingTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression,classification\n", - "- `column_transformations`: (Optional): Transformations to apply to the input columns\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - " ],\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "40153831d5be" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "587f38260598" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `training_fraction_split`: The percentage of the dataset to use for training.\n", - "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", - "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = \"n1-standard-4\",\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = \"\",\n", - " dataflow_subnetwork: str = \"\",\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " 
location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "a912129939ab" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8737e6e31129" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML` \n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: The service account that runs the pipeline needs the Storage Object Creator and Storage Object Viewer roles on your Cloud Storage bucket. You grant these roles later in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. 
In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook 
environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ede4687dfd89" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "45c61ad01158" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
For example, `regression` or `classification`.\n", + "- `column_transformations`: (Optional) Transformations to apply to the input columns.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + " ],\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "40153831d5be" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "587f38260598" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (Optional) Maximum training time specified in milli node hours (1000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_machine_type: str = \"n1-standard-4\",\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_disk_size_gb: int = 50,\n", + " dataflow_service_account: str = \"\",\n", + " dataflow_subnetwork: str = \"\",\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " 
location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_machine_type=dataflow_machine_type,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_disk_size=dataflow_disk_size_gb,\n", + " dataflow_service_account=dataflow_service_account,\n", + " dataflow_subnetwork=dataflow_subnetwork,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for staging files and artifacts. A random subdirectory is created under this directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline finishes, run the following cell to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a912129939ab" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite when the ground truth is 0, and Age is 0 for some instances here.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8737e6e31129" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the following cell to get the feature attributions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 2f8b659ae5617a5b9b473f10335d342ca5c14888 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Mon, 5 Sep 2022 12:36:52 +0000 Subject: [PATCH 19/34] addressed soheila's comments --- ..._tabular_regression_model_evaluation.ipynb | 2909 ++++++++--------- ...tabular_regression_evaluation_pipeline.png | Bin 0 -> 37994 bytes 2 files changed, 1440 insertions(+), 1469 deletions(-) create mode 100644 notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 1166c7283..d314dba71 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1473 +1,1444 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", 
- "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation componenet to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a pre-trained Vetex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML` \n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com). {TODO: Update the APIs needed for your tutorial. Edit the API names, and update the link to append the API IDs, separating each one with a comma. For example, container.googleapis.com,cloudbuild.googleapis.com}\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Compute service account storage object creator and viewer permissions!!!**\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. 
In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook 
environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. 
If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for {TODO; e.g., Vertex AI Pipelines}\n", - "\n", - "Run the following commands to grant your service account access to {TODO; i.e., read and write pipeline artifacts} in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ede4687dfd89" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "45c61ad01158" - }, - "source": [ - "An AutoML training job is created with the `AutoMLForecastingTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression,classification\n", - "- `column_transformations`: (Optional): Transformations to apply to the input columns\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - " ],\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "40153831d5be" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "587f38260598" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `training_fraction_split`: The percentage of the dataset to use for training.\n", - "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", - "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_machine_type: str = \"n1-standard-4\",\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_disk_size_gb: int = 50,\n", - " dataflow_service_account: str = \"\",\n", - " dataflow_subnetwork: str = \"\",\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " 
location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " 
dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_machine_type=dataflow_machine_type,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_disk_size=dataflow_disk_size_gb,\n", - " dataflow_service_account=dataflow_service_account,\n", - " dataflow_subnetwork=dataflow_subnetwork,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "![Screen Shot 2022-08-26 at 10.50.03 AM.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "a912129939ab" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8737e6e31129" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you assess your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure an `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting the age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset has been moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY", + "tags": [] + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on resources created, you create a UUID for each instance session, and append it onto the name of resources you create in this tutorial.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
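+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bucket names also have to follow the Cloud Storage naming format. As a minimal sketch, the cell below checks only the basic rules (3 to 63 characters; lowercase letters, digits, dots, hyphens, and underscores; starting and ending with a letter or digit). The helper `looks_like_valid_bucket_name` is hypothetical and not part of the original notebook, and it cannot tell you whether a name is already taken." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "# Hypothetical helper (an assumption, not part of the original notebook).\n", + "# It checks the basic format rules only; uniqueness is enforced by Cloud Storage.\n", + "BUCKET_NAME_PATTERN = re.compile(\"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]\")\n", + "\n", + "\n", + "def looks_like_valid_bucket_name(name: str) -> bool:\n", + "    # fullmatch requires the whole name to satisfy the pattern\n", + "    return bool(BUCKET_NAME_PATTERN.fullmatch(name))\n", + "\n", + "\n", + "print(looks_like_valid_bucket_name(\"my-project-aip-ab12cd34\"))\n", + "print(looks_like_valid_bucket_name(\"My_Bucket\"))"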
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY", + "tags": [] + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + "    display_name=\"petfinder-tabular-dataset\",\n", + "    gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model on the created dataset, using `Age` as the target column.\n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\"+UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } + "base_uri": "https://localhost:8080/" + }, + "id": "3l691PEMZFdA", + "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + "\n", + " ], \n", + " optimization_objective=\"minimize-rmse\"\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\"+UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column, values of which the Model is to predict.\n", + "- `training_fraction_split`: The percentage of the dataset to use for training.\n", + "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", + "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT", + "tags": [] + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = ''):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " GetVertexModelOp,\n", + " EvaluationDataSamplerOp,\n", + " ModelEvaluationRegressionOp, \n", + " ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp\n", + " )\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " )\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format='jsonl',\n", + " 
predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", + " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NOvOMTEgCVcW", + "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" + }, + "outputs": [], + "source": [ + "\n", + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + "    PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + "    PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " 'project': PROJECT_ID,\n", + " 'location': REGION,\n", + " 'root_dir': PIPELINE_ROOT,\n", + " 'model_name': model.resource_name,\n", + " 'prediction_type': \"regression\",\n", + " 'target_column_name': \"Age\",\n", + " 'batch_predict_gcs_source_uris': [DATA_SOURCE],\n", + " 'batch_predict_instances_format': 'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. 
Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs", + "tags": [] + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline has finished, run the cell below to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3", + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if i[0] == \"meanAbsolutePercentageError\": # Skip MAPE because it is infinite. 
MAPE is infinite if the ground truth is 0; in our case, Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\") and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the Cloud Storage URI obtained for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Parse the JSON output and print the results\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m94", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 0 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png b/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png new file mode 100644 index 0000000000000000000000000000000000000000..114dc4e4c07e133f2776a449916d17530a16afe3 GIT binary patch literal 37994 zcmd42byQr+21cJK+*PsdR+PFKxwQ&-h;O_439^4_gLvU$a8#tZs`|W%8 z8E4=7_hAgi8f$fT)he5F)_kA(p(rnfjzWY2007XxNQ)~205B;40IVAl0`!R_9Aggj z0^_7C^%+n(L2?Mad1EdrCkg=6M596s;i31)4$@jq002hc-+!1v`(hISAnx^xxTvbT z-f0G+0Z#wjpuXp4>GwfFLF;ANuwjce@9B+t)-UYA!u@(Rcg(I5HU5kJu=K7r+3!2t zX2d)dUar7uR6;Pt5oQ2%(=u`G||tQ;&`a2B#fX%PIP0mkuINw 
zUyfod9w82-wx;krBJ|bjzpeiGD%J~^-tQP&6BZU#R#rww$&gn2&8Z~6 z9)|)BV8O=to^5*iH}41kpZqk+7R6;tOY~fCi8i*1hctsF22D*(>plAt^y$^Sk8IQhM!9 zGbSpRBk{GiWjB8~z#EJnYWCEh0m>tzBSnUAh>FP0EhVGW{a?hA2IBMx(JR-`J*fyYqBaM+>2>;ZqDgw`XX2TyROM6 zz&2Xmnm8NiaMBzmFN%(kFnf)eVx5!fRWWm9MJ0{=jKIU|7=z8rqJGZ`0O%Vsfv&ka(9-)aEKRTb_9tOdQTa}&e`@wEo3VI769=B68^Cx zbjKg`N}3yhVB%u~HWABB-;Pjh7&5SD-Uc@IK2P27q%4PzXb=DZkRay3vE4XQCZ<<| zDtou~t*o9|I6!w+Hrge*9x07+XAGZ1lW68w$-&Wu%DkF9$q`?GUpLL-;INH$LqZ7p z0gKA$xh5g0&fus)c?8G(ufJ1sitD0e$bH%pwmVTDq$7Q|diTCUvu0*-JChvrB5@1s zFsrD6uUA*Ry^mJk9fmY^rc-yt$SibbDd;$rMpX33xVm!rG;Ja-S3Bm_^n1)#=&)q9 zST8Ue%Q*CSe0psI(=iFIK})+i<{s)yN$CNSz7cu9(>!#)jT`{K8$j50U_!-;nlLr? z(iSuz5F-&i$-sV#BtKsclyci}687oZlrj##@)iW^JdDYuaTDtREP&l*7fgn33=yVM zo2aF5+%M*Pf0Qv<{i9wE-w$lS-J{Eq4UkznNv+&eP%GrSJFx7G5O`et)#uXLMF-N@j1Hd z;EH8$Cb2@NBscfx*RNrgi&HrRT0k5;-09J(PEnmE+q@hxgwe(X294=*Od(Ydpx9cQ zTV95x<@NlS+d-NA-gF2$ak2h}?d;7HEQ zjB?ob@7bAxUPBE}X6xRbuYtVR`}*O3oII}&W%?XB z9u@wA%rI2ul%Lofl_oaXuE>`hrMBM;Fw~18uTv%Io2Ca_3~h=k(>kR|>gjd1Hqtq7 zU&I|m%*y9*ucv>hnsk?$QPR>X2rBWbY`dke(Fg4?eRY2+3{^|jimzb=o)M}UKdEnIAtG{f9pNPhxDkV(O`dy*|3pJzR&ygB>`7h5;(?wZ6qS1m6_1J%C z!=j^;z{^E$ES{&k_Vb0|d82OS+9zqa?% za=YPYxC~f2ANJ#qxNU=PH+^Ogy5VkuKsXd&V0xcb$kAw(Mph z+&jtZHu9u|h)RM(~o?0_2Q=nZh z_hzpfZeS)|3p_Q(t(VYyS<$qSGfYFKu%PeNaKdf5T|>c&i;Jt8pZUwh07m%o+_BbY zI=7$V67}_748{7EG1L2zSIymgM$m0d9z32dnxyX;>&^v9A;QZ@(sz#{^SXy=-5n_5 zG=#V>`5w!uNEpml>d*wUdFhc)_(lu4+f%!orAt>aV1{gQN=i!3)J=9tD+petueNjR zf+`lrESWUOGD12+6x4jZUoJ?fbdlo=gRWbxD|4fT-LneQ-5O7>2K7P1)~!Wd*R@E$ z`0YDH$tOHDkXIb*9cRt3$c4a9b;9`=q)Vf_pOca@1uVFDD?Gt9N`Lq|4zqJr5Jq3pzQPOsa(ISW}bnwPfh6^b*3V5 zbJ+1vA0|Gr*Ly~!6DAVh#T?~DmdAba8YW<4dM4d<{IaZSHX;SA=&V|=>T$Z9Ktne3u%9lm(r91Aq|-E=Qj}FH5V>x|U6dfmD2KV* z)Bo$2w96J*U&v77^WBERwqOW)T(aQL8)7bHM@MAReDP#K!M*rlny>9P4=4|^cvWz7 zH9vm*xXcK}a8V;>AJr&-i^UkTe%UCg}NHgVjL<^eFLTY_bbN{S+y@jf6$#gBU8ktjr<|2sC?DsrSthHs7}922)jvz!btvDX(u}7RQ>bsEQ9Y> z#U)`CwY8s(?22ZpdDtZS`za=5BZln?#gTF1iry#`Wk`h1>H&8lVhGjd=AsB4uN%6p 
zccJ?Smy=*CR>t*~RUVW#45F0mtzqFSs+oL(dO^8TpJ!Jzn&r%1Ua=E~>Z@=G+9qJv zgI2_o6Ur8(#~kJoLplI^f8ylhxsWgv1aFb#K(}-y!t)kcA!X(ADc;JqQwn#X3i8JWfB$*lgy9sqPWWE2 zG&R%1j_>(=b}}U4&!LdIU~t#tBk}cP1oz<@)DTG|>B-NYd5N5~y3JR0y-;`bt@d=} z5!l5D%Q77!jJ=CWqg4}oZQG(g?797=GC4C`lu;m%so!shhzftz6G{%XHm^f&25g2f zUUrY+oX3H@&cnMJv#*8gAJ=EnI5s|G{$|kKZ`u5kpFy3Y)uD@OM*HnfBRynq3U@l>XTB*O^|y6#^lars-7>c#YF+!^h%G zV}mx@y7jug?e>Vls%(wGnhXHGI?7$D=USPgcHb&iIo#nWn!VmBoF)y`TQV^C0NWF} z^UlPx4Ap6&VcJUf3~YyygeI2F7ryv-(P)?V2y8v0cqO{MI|K=u;CG#iioQTiwNO| zlTE|fcZJA^?O3Mc5i&2z3to|+s0bfT=SfaKMiN`x95ryheD6DmTCrohg?(RbSuzA^ zK2&J6T&JuRc!hA&G1nfFDzw844F4WnxXmtQt_y7Yv=Rw|`gF>yp<8!>NN&T)K;kv2 z2m<+6CLPI~4>`^~5BoEbDpee}fb5$-GPF3Nq~9O?Q`*e@uLa;L0<$ZiWt2@iugI$0 z-buI+0(zoY1XDgOr5&td7LBlTL!8=VWkg+4K**lZZS~tlG}-dB-OqQ-PKe-9QaFOM zZM10Wd;cK-fF!u>L!Cyi0}BV)_ocUt>~B-&;h=l(O&&&~RqDQTk+IRPW*G#tZ%^UC zO&>%8vbRCh&7x_teoez;?&cVz950Gr<&SJ^E|_^++GqtS!+-L*VbN2i8@)>X4jcG!EaoP9hY@9{@j1Q?|X4`$N6xdXSrOlFil9KkUQC# z%92S3826XDxbhoIi6hU8o^tL%$FaOxSExK^sRhS9b8P;gK@kK)yglM~Q_?T0* z(X2I}1=?kzyVtTPT=(r-Bw+sYAV4DY(tZYH6^20gfo`K6-`2gzeI+MG&w0zsh)HFP`h=^6kAdP{Ai}c)|dLL0STEDI2Q(? 
z4~U7GqDH2MHm;5B3T;fFb6^D)&O7?_ILYHEGNf?BVr*iz=wO(1{FDQ7y)v`Fa;3^9 zfKRA!HyOw{&OvwODyWd+fkK!$crwywba+EY>howeR(?WK^2qnHc;Wn_MI-Yx*o5kv za3k@`oM3r|h^o!$GH1G zT9v|HOT_K>WJpuev@;s9Bd98P<~o;tp^Wvm}KbBwtu_fBjR%Rqdx(nr_zve z`?J^8stv{ZJl-DCeV(c@7C={D|LW^6o0*+r7^ zezrl!vic5KP~F|dr?s!CG>oU2G6{>ocS&*l5#Vz(pV=2WD8>?b3Hh#r=jDtO`=>CpIS5DKg z-pOY5kLRi5&&!Y4{4V*O9bPkDPD1a%6cZl_UjTPvJR&67W0)HU((OcF-h@r}Q=HqsPYh)zN*`{;SBWzK2ZX&Djz#XxD!*D`^6@<2PT^ z&iVeiF9{6|m2#F|Ws^KCKtybj3*z-|?O?Kp82z>9jqvFe5567Cm&vhs$(J~4Ql`j5 zSqn`9CWsSOhSz|JrH0{8RsRXFQ%Nyx(T6~Up+^7bCxVXF&n9n|x?g+>j%>$@1xvE4 zrF)Z=yr!{sFyfzEp4n?uU|qmg?w_pS!tpWlO_YvPmnBY~N_S*NM%u1bb?B{eZf0(z$?$*D$y1JiCj&`j1OGin91QVsX z=QNtuo{Bk36*4({7GVK^is}hzpJe0%m>{anqIz#b6VP|^q%>5v90ZAqUoQEl+qic z`xPo_sZqiJ6l#Q4w+$DXc!cCb1_y^UQ;2Xi)TbYrbc*$>^l&!1pO64k7ge);XP zV9Y<}9y9BPS!EIxo5uM-gsQ-_-WD+kfpd&<14l{S|C>JJpM>92&rYb3p8zLxbhHJz z0oZg-1V98)3-9ne3Kj67|7!`4Y)grtiV7_gL)^UWG9#Eplo-9*cbApV8RWqfZ_B_Qeu6aTq0b@M`+i3p%keQ^iQ~Q@^ zulXqs8n9jh!ZL=LVT|cbH1Vy+5*2N`4N`=*8L|BWY@XQZMK@OaqYz<;!l2I-@7YX2 zxJ?-@p_roU33R7AJp6^e^2*hDXlN*A=V$y{)EJtEJ^rFjQwO8K4`y%SAoehmhD40m zx$E z^E5~cwbsP1NMoX4Vlq5=!yKzLDvP#^NVt=VzV$LEf(}nfB+^eY#XfJ8WO=4@e14Ge z^JP6!sV$L>=9KI=)j9eJ@|-9OX^vnxJ>&!eAM!@2HU*$5Y~yRX2J_schC!pKalMFT znf}_g-s53d(Xp!yd?hW3Pe3=E0Sy=qZC2e;ZSitg<0#;hn2mRE9!1}8N}b-F1piL&q>dD)^V0K zIVnuG$KJ|k`l|B6l6TvYOaKA`qMqFC$9-%p3~k-|nC5VQCmw3{2%!dGd7uXnT7#eC z57|g;LU;CTA+wxK(>g!Rg;S$V#;7o~JO%s6ln@(NPBX-cq@u1}t`wryi-X$%ys3E$ zSsWtl*btj;EfRdf0SK1o(qoN~Gy;-QW;)m^uH}KioxNL7{i6CEE>eVj&1vu0bGnPZ zIln-I4hw5^-n7qLgB**K0WsKk-3;T==dCLjMs<_CS1}g%oja+4eTQp^?cSw?y}A3+ z?ByuRN#+u1)_WD>j;XzHd%cC8RReQ$n?R_Vvs+|EG=6$`u?O%vZcGa#Jmxz`7b;<{ z3n*pVq*EAU$gdqCYK5OO?G}5_{L*ip8A0f#Km=%}WDWzB-go#F`m<9zoQKoE`4zQ8 z#IQ_p)n+ROd*fi2E) z9Qj}S^*wTQPkIPrPv_}9Rf2n(p?C)vaz4bmMsQv!iC{cv}r(c8uGxqinA~PyVU;vw5@7cSR z+SP|fFPQj47iM4q7VK{cH<4U}TUui71kRDQT`@RZSD;!MA_ZK#ycm3|-Yw@_wX(o6 zTibDDqGLWlciZAE^Q=$0ii!$VBXnFMGBQB&mX?;-Mfv%^?TkxHON-0f(`RO9O$B** zwTu`D2??>&POBiQ;=2i-ocj$=ZBNZyn?e;$Ox{QYg^m&xv3&d}J~LP&$%6Rpg&ivK 
zy$Oeo@i~#U>=;{U@3xV4gRdZyR?E+Ms7hg)@YP0jvDLW9wz^K?AWI<*0l&E+EpV>W z@>pHF^&4)5W?SMV3X0Wi!E7-b3ml*wZeSQtWx!PLlAf6&u`)Yb>~?n+KUG5+92z|A z3W`a|V9K3sQK30XoatXYx->WuAOeVt`VaZx?Z_LTgoRn`Bvx=bcLSi3UA*l}VR?Ca zVwIMdV4G_}6zR4#d_N;6bc(ejmX3;qR}1AO2b(=ja2{o*Vza{bIcPuLok{4poyBve z&f3fK1G@P|O#C!h@fS%nR85fhQR z8iwyK{y=SY$r?7cf%)#h*#XPnho)nr000=eLYM}$6*V+8WRR|mDM2bzhe6GNO_D~6 zzJHgzLFn2wK@7JIeK_b^T7ZhOet3q=-;bG^95UJT zyAKnDDc8=#;Q>?LQ^ACss7^iQNiH`kYA(3Acz7yi@)+pEoQgK%1=M#^rNfYPD|)wK zPwCZ6oK#N?C3h(;d*)G%s+_2c(-PxQ3>-Wn$($T$CRoGF*&N|-yG81%!y+&QT+mR_ zqhp6hhHV>%e&aCalok}dO|au?{7q9on1Hi<7iAWX9xfZ&d>fxF9?~DY60NB?_PGxl zxmXYuGsEfSsk=9qmcvIgpLD1`w^Yd(X!tnPf(UmK7ME!;URmwUR+#06w7T*!`4D6}A$FeAxq245HF(@)w%#d>b$Ng^{ z=7Q2sDyAEWQz!MX03vwk0>DgK8fR=OYZ536qK^KIjiJ7M*=*~gDc$>lIY;UDN&1&h zRB0wlDpno17HCjuJr)K=;zpf{J!}dl`pZdFO!7KTFDU@$lTKD#Gw*Lty<#%BZY&## zfmz=vbp!inS2C$>j7C#4Pa%0p0qos=MR&wQiFtyYFL56$>Fza*?0XLF51u?IU!mhk zS}~5HfbfP)?v$I-%Ic@|E7n zRJ@sJD)#J`P`1V6x>hr`p50^~bF4YdN9AzSpI=MnHFM@Rb2dcCHie4%QB&no_`5wp z_2eJa%oyKC_C&tOs+7(H0532UaIj%9nHpboJGMFzydu@?o0F`*fv~Hj%%B$O&BwQd zZxzqo3C=pl;MmIWqLchn)HMxCxz}k%mF4Dp>7ZYl_}ep5i{CMI=ghuZKu5Luz``H- z)QLZsKyVBI@JAMhwrqrBOX&q2Yv$tp!m262&u`#AmhN`m-rhT1wy`qeoIslE^J77c0zE$|-A^9F|;pdsck z3vGV70OsJZ=H}tc+C*)+`1hhjOJ-;xe)fJter7o!E^PxIcO8>-sJZpC7jYp(()`nJ zT6mk>>Y8#KDF}T_4Cq@jZ;>7~4IXUn*``1hWw<%hY^{T=$V&R1O>vhL+csI$6=MR` zB$!R&&0#;kkbSD1r1&{hCL98QAGjKqP~k~b$G_udg_xKoE@3yrk^Kd*V*(vqFITAR zw$8Rf1{lEGvj0=0UG!GBEn6$FPzxM)@xdYvrknfkAaw^jX?e8N;3d-{vn6f@!))UG zwcc1*SgIxMsUpGftjYyg+pvChe{D_dXPswMOw_!gbB7TYT2@AXxS}6r*SUFly2RNW z03IlU;Z9l^VpQ|x>xuX=mu+EfZFdQmF*s~CK6x|MAa2&OlFN8=4X`N)br}rOhPH1y z);odxVd0SlyVmoHX0DAkcQhwGjKG?o%l+a?pS^Vj+1QFQ{@UHt=>G=;K0L!qPO*YLEHkdPed>h z{WCL@nGGb&UxOSEfdVc3uzy!UJrZ+}i)eY$_y z)Jr-FM%B;mn$jJL$yqc#?Wb2%1=rKL*iEiZZ~3!?1>!0C`u=ddOK5f=;u~lYAYh`` zRT+Rw4^d^vb->4aLPd!q#voxk$B>97z&N`igU`M zA~+fs^YVK&m8<>Evp&$1gwiRYB0C~SWZ>roLm3Bvh0wW)-@+6;N2sD&Oeif&VI{x~ 
z_h`>QdhfFkqiO~B+s~g>EqZ+$_IIwX8a=bLXjne$akDu2Pdd80H`lB?Anm?guJ>^hE|Ql55;?OL;R7~(po3(65TN~LfVR7Ul^eyw zsb>6Z|oo%V*^#8++~|l_H&gJft|ZJDp%Erh^-p$E*d3 zs8C48;C=ft(N=VGDh0lV&ZO) zKZ&vUY`lnhiXovRa?HDI@_Q=-_s#-#dILk;Z$hzCZ`ERJ)U&KHB6CWl28pZzCsDV0 z<#T4ml2=So;Smql>W(Fqm4)pAu=I3Hai)N=bPkU6c~fErJmu}r2Z3}6IAmdnxQbF1 z7PS51Qc9uCJX&TazYsp}@RQ)CguJ2V;7@BjmCZ{|wg`)~lrl6ltR)4EizZ0S~CO<*h9Ch>i*NmB;c_jMT$7F zXRo_l_nTiZexy>u31TvhY6R9dhqJs9zU?)2mnISD-E6LSrbIHL;-8e36VZ;+V?;`Z zhi}B8Z=^%gB~byZqN>XX%VD!Af`n!fgNhNsmgs6e0(1JFwOjD0DRzeA2YYQ@!5CYT zs3+k)f|Q-g75v;mcv@oY7s1v-WmVODLzwQ&cXv@aY1H*a2j5$Z#=o=XG7R+3EIqa0 zzXt?Oy**mw)f*uJzF1t>dJsDxNDU4@q{D{31UQES$jS1U}`xR1N|dJL0FljiW0`1mmUHUMP6q}ksOaw$B)MOXyEQuqHwC?CoJ)fx z1y!MlCU|RGDoL!nh-d!1iltDG%O1K5)Hl+=NJ;W{a#u4UYBgxi1^__Qnh$}s!@MLF z%(M9F`Xaj006$^YB+)%~WS(e+WfKAyEk{EbfUjr}d}Ob0kGe_ozYL+V>se^!#$osO z=FLYae}{}OR?7HQL7_@RNXydl0TDXEC;x6Bk=`igZR@o2$xCL18IziL%c>>(5}>u} zZ@nR)*^zKPU6z0k4u*>BDHN2%nWZ$sU4ECBSH(_7<|)t1(3E-YevP2gQbtB&Fz6ls z?^${N8{QuSvMv7Ix2OEHi~VPRwgWoOd%P+Z&mrgocz0tD2gXZo)7$+6iH~CsQ>L$1Rb7!x6R`;y zOj$R-W+w@CU;~{BA>?ZgvlX?F(WHB`;RLnFDLtQUSXEfTykYte7`KviD?9g*zq zGIr!oZ8Mt^Owd^rDL)-Q&EVoTudYiY z2h;fN)#%8G#qqrD(?$=l0R?KBRfh zR?IYZqY?3lWIQo{{c`vj%AAF-yE{<}x^Hmuo%=oa+exkEwKlqlr~9 zwumRfiD#WxTwzgRxpfm66h^(a0*vhT9TzAra&iSLXYD$q`=zia@i8K9>%ORwL@iAa zp{%qQit)^=*hSBqlP^$sQpNH}p2NJoQOCESprFj!Ntj8$M)`K%+G%B+d*o9J6o9CW zB;=_U?zo*n9ne0XU0ybEZbMr(@fK6`WIxG>08xaJWoi5_?q=0-D*IJ-p@F+xD3^8! 
zahfQMQ{WTL^d$GXBvgCrnww_dERS-L<@*wx$fQ##-W+Dvc_Y8v?D$TO_s*sj${_qX z%#aAqTJ?Cyd?HhYcg2$91(WcB(>dnexdSz1ZyB2fKj`4rYm~vh^rBAW%~{XSyi*W1 zKAGlp-Q62DCwDuU-ewcLi6~q9GY&=K?Du?RLPss)P(J)>p=O2X_>W97iwo@HiG;D@d&Kz+q8o z!0WfK^721zPkU?_csFmJH)>7meFic3pdesrwoo*6mOCua;l)GQ!MAG@ z(Zh3}zB+G!Jf|^BWkv=# z-?zT9wv9fnA8|b0o{C*{Ie7>q0l#69a%|B)ZhX?1_BFnZh1IF+!Y-6y=~VPd;;_`; z-HlI53g^DuQ4n5`(T4({j&G(jn$|uK$_XT%sjZV=9ou)wjGe0-F|Y_$iSK}FU!DWV zOLP`0f9mf<^Q2SL=Zuc8Nib+ssApY-pY%P+EAJcI|BP-5EMrc3X_74f>9;B(LxIvL z6$%j74R}N2hlLjrn>XfCv|wGwU8#Nt0xW7o+(m_8u|UfG!P}BYE948d6YK?Pm$WJ* z^a`gz{$GQmwPzG0hN75Z7Wt#`JVbBNq;i$ zVo>&HH~I`3{;C<5&8oCh*JB}<$^P%9abV8V?I9CexQ@6xPsimEv2*Wm(QL}Tyb{^3 zeIuv4>&ervvK#A_P7Z2Yd6;88BT$Z{JET%a&zR|`t;-)*)9eXdVoKU3e!D{*zgt#r7A5Z?4(mtjFeh+i`Fd zDR6&2o&DkCLbzdf>BthaM4sHvI+Y4mBu5SZIm(&zv=G<)qH@-4P30sx)DhS{^h7??gk+RW5K#&- z%Wwbu*$VC*beuCWp7YAaz)SQ3xkp7MJCgn7=bRt#-98ccE?*Zp=HG2{-+m+9jY>=e zRWCFlM$#DDDHzr~oY%hbJgUE^_TXZacKZD?8x>uKi*L8!;jP!#k>Wj`4FmVb613h9 z&Zoh{&vn}KW^9c0I{otebQfo8=e{2TvP;`)Bth5EVzXYT$lL0>iy5$5r{U)hHOh4( z_dbl$$A4iy6d7@zraEtxoZf#5T*O~a*i-9$zg5A!VF+ZK1F=f|4_7lyS~7q#tWe=hneCSKkY z48C(fNhQR582jtdY6=3z0n5{&-Jou;rjj8Gfj2YWg}yIPns=mZC(~vX%fEvUPwqB= z;4A_0j0OcO_ik~3b1p|hY}bRf84>~~ze1X7nycR*W||6U z)-?iYx(B8ms*1x|+C^G1vYd}noKN!O4ko+sr@zq1{%l08B;!+Ja=MwkfSyV*ju!p_ zT?8g1ax?P1=&QtBHo0-~yn#_dXX2LkFIV8MTZHw(VWTjL$;>zLLBhbB6chl5<2W*QzAP!>=9FkRUzic>!N?&?OO0gamr(rbhq2)jZT=Uv4l%pwFx%4y9 z$Ph-I9Vf?|0%qY(aplc+S66NwH5UqRdC4LvwYi?=TZMSV4pXaXIyG%~B$($eXz5Bu zuF4plW%4UCIjN~+t#=%JRQvvyR&Q8iXxU0>qjZx$5#Jdg{cKLFv}M-{A%tkH&UxIE z_5Hdc$QB^S%b?rO&{>GeiQ4tycXnui*D`!cNYbZw70UFgA@yD4sk!iIVU?3UC>Aou z0Xi;GDMN$pSt&jwhq1YD!1yLs&p*1JTG0K{p(e!U4XMr%vl8SBu?;2r z_Lhn^n)(w}a8#N*8qlM^;HU7a%<1<&j0ErFxw0f@j7{*+sX;5#Sh~LV+CVYHmV804 zeZpx%AVXfzwDx3B55w(w&wX;MZ&#oqxHqhozIX52;GM$EM1U+70bY_=ynFrtm5Q#a zP2dOmu`4S_hPK*Ex-ZW!_hg$JNQhVLh?0ZdZEwbTQYuw|=v8KmDjq&xrLa)Zy=WNO zTeZt`E;x&#q6&-T!3O32B2v+D50D*RWpo5@F`2>PYJ8`&P!&^;N5@$UlTv$lf*lix z%#>=Ojf6mMCe3Bm^ 
zZS5Ek2u5pa(`7Z*s2iB|>;ALqG?DKG?hG_CULn*Gu?~dagEG=TH{+)a(j)9oMNfw{ z*-*ng$T+6>2x%m_I$~K>7*=PF)7y-7E&MR zW|dnaw`B)#Mse>k^Ivp~S`oi&q(TV#C6=RYjocZbD$CY$OIYNJBLa0RUnnPH{qTJf z^w1LYIODss+OcW}2O7VA{EQE?B;0D~SQeP70(o#_P9`uOXFa|W|~B(@>2cN{7wBcnxCUC*(2&nkDR@OxPW zqlXU~>$ydRU6NP$5JpUom%Zz)!PZt*w~AxKfy~IZZ1|8Z)67w{H{_xm1aX7E&P7Hc z%6Hsc(MJoEPsC}KDTwy#p0aD496chL^xp2v5GZsM$%KCOu&{yMH3~WxS#ah=kBv={ z>+p0g;&Jhv;?ccl(dkT;JYid0oZ$icp40tB*|ZzXI?&EYgnob8oE@2EX%{s$-E|)F zf0_~*T@FULc9Oepu^i>{JS*U~ysz%ztOk)k zLdxvv=-I;_ufL6d-K1M5h$z`Atu{uC%A1YfJc$*E8Ms}90-wvl4NGb9&CNn-Wh)bF zqc4JDk^Vi&FH{!dAyRkx&a7@g^c-7WwML^^oY*aoG54QxnONQUuzgqVK2Lot^+q>* z4_L1)2OG=1eyF$lzOQMC%=&bd-bO`H@fmgDzK=JUH=H`ssIR)RVhAmJ3q~^>b+qgh z0YCX@Mjn5@s?V^XzUGeWG^yeE_0V$1{JqRA!}S;Fl64e9c7}aj6|&%N}Yt<}Rs!rWfz}9%L=TEPQkIki@JbK~+%j`x6vS(J!6IO3095P-}hFYge7v`qz(B`a~`?HN9OPo-@eY5zJS-F7a9>y&p*+=={<#r+o_wH#iMC*0Dpr0Se zQfTZyZOb-2;Sd2(*@Pd!414S zB6ykjs((~zCb)kVflWI|%e$P!@xY%Q7?t4b<2mxHIPG`CR=ZAR`v<%U1AlYNy)T?8 zi6{}I&}5d5mty0vJNvG1I4bHp@aQ|M$Db0Vep^NJ)>ZeysRD-->gh>4OWe*yO zJ9;qBklj!B#U}mMO`x5JLN_k=+9sQ+yw;=8R+PaLE1&L;1N7bEjnDt}r;S_}p14n8 z%J|Om#k!J3=oO7`H^|s>hE0mrzK81L$&!Q1T6~>+&MY;#6ddApUT~?RyhgyVQJ?>Y z{p%cr{Gd56Iy;V9E}MacR{>=ndJJhcx(kvcUc~`q-<7SA6WStploc$nmOTBWyW9W6 zIrh=U_f+(?v*QrLp-_0GN_L?lkR|4{^qRr6@jMbf`Oa~**#q_YWnWlTNlSD=mn%3* zcuzW^sYu#zlEue31cOW@N!KCK#{6RUI*~3K%s^FLl*t-PEzhGXAj}-Eg|kLl_nNGv zY~co;Prx8)!^FkG`NBUEyMvLdA}^pmQ(i_|)qJnZ=l)u#a5;47CDN{)N$wF(cQ9p7HT?h(wFca9&Yt`?2A84=O9$?_0jjkEn25BE~ue<)$-eWWl{1> zf5JjC2WR0wG?^jA> zohEQTXF}AvAkcpJ&*#e!YpB9c&!k(yovjZ9XNx{RDO+l#2-tG0`sRO;l`i%Qu@y9?=&C9mVJM`JEhDTMq^*jU9>j2b~9Rp+dR_g#28zt6Z#3h&b`zn zsH8SNiK^)h7{Ww8xmm4hueu(S#TwgtTR1bZ9z&>y$=#7&$<{ob6?~aG%A5^7EcaCp z;qEZ$3eDP}Z+2qU@c=gp1#SJG3G2zOejWe1N6#tPFN_-7^*tgipPdT9)nN{;%RxpZ zKeO?{vYGCn)#GY(w0WdLto^uOd)2Hvxb*=&Y29x{P_wyE&*w*wPF2^>2|?ppciq3J z_ZvJcJNwPsx9DE*>FV;_8g-VI$m>*D{?YtYK+^zhuwBIFNa$m z@M~(-d}BIo8GLJbz2ErZwjGZ}%S;oF{EPsAS$H!=FD_|}hoj%n#`$X#<~hG6)BgOM 
z<%Xt_<~_XWbN)LKJ3BJm;($NH`}zf7-^cOl`5s%MRZY7PR8!_RUH6at9#n&{U+7)fCTT*?!e$nY1S!~p6{FLwk?nx$)4@B*zyorH&%ssDW zXc>ZbL<6j7sR$5EYdZsdz9r@|dH?)!C%krk@^4#n=6TYqE7*ku)lLvs1iwn40?e;^H)c6~aV{)7-Or4EC+R(Hs+} zgPEM1OyVrxs86wLJ^pa zn*GD1e#DMrWMKyF_n{rHLA;BFITdhcYpI25FOOVQGdOm01dG1VCI?Yd1q91 zUf5aYcR_mVfQZHKw1q@p%)uwG6TFe?<=T{kt5uh^DpI}+5uVbHPH1(5Q=-aLF5tdN z>4uFQ#M2_7JM>n`4C6pgRvzfNhuRGELb&+8Tty*LPczkJSQE6V|uk+7mR;$ zN|;}|wnoNwr`#-^brhGt8T*3(@;=MR&hb;V6Rh~=tB_jedC?{0w* zIPQu^1T>MNkMM`9`0Y3G+f^Fsz{}pSX>19Fc2df)90!`~eOGG?b}oVo=kE^*%sEm| zKWgH$vKn?yr}l6eR#{(A`T4nGK_c6ct}lLM$CQN1IkbDIkW+>sCGiRR`XPGa&1K#W zi*afzj}^>Af|;PvuyuVFF0_UxFpTe60+0WoIgI#MI|?JKw0 z-WwX7aoTy-7t05~5!p6qUKo(ymor3a5VP4S%J##z!aET%h4C~E1P8K^rQBxUQFOFS zbm06b2uCV|zy|NXlq0~>kFyjyS}@k&D>%l#Tqu`X7Drg#Fa>SlWUG25gHYdG@Pwc9DBzyP6{fzuC#MdzVauK(zzl9zBHnI;QLPQGx=D2@hs>|5@^-B@rl@MxCxNN! z6iqkzRBO#)zLGUU-Kp+=NiRq5oyulcvJypSl00t!YjH{dTXIL+?V2A;+3bEA;br_} zz6P@^7DpJt3-qqOV;3Hu=;(8chfz_syQ6&bt^*{{SD$+h{puPLIdrg#=}6Bb@}Xmg zHkDV&*4>=>xZQ4@_K-o#aL6Pb>0-BQzry>=rTZZsg`(bXz!9-)g7iGXGrcT4)6=we>qR=Ebbfl?*xIi6S+wPGDGx4# zZo2H{B^*jKe$Oda&Wj>X{UYPeZ?Rjir_LMb`|ef>bo=dmHr6p7p1I7u(;<`I?4*#_ zV>o}-0lrZ5S>7^bNesylc6^7wV^Vs}$-*OTwG|r?@$SRp{2vGO=42ER(XN+oECeHZ zUb~d3udY6(W?|I`Oq^3Vn zrBZbpX(6#YEYUk_9Xb&SoEcZb{}Rb!2+L12ckS(ZqX`LnSGaqI5-%z22s6G++uD~c z6m_OJR@CK~EeS)Q;@KC|YO+Yc?}MgNfBj&ipSQ=;{zPi`AhwAYvF1k55eFnqDsuSZ z5++JOQG>uD>nl}N)oLcb)%MZ|_(jjIUk9CowJEr(vn6w@7yNQv@8}D!u!e)%YfFlm zaYgs-rjkvK!=HBZp3c_p$ktfywcXRhfvP-fibTbm;l^-zO$*sDhl}Y=gNgIiaSQz% zg(p-xG9el=rAmGe$dYuX7M~SQge^Fy2YArCZ@2im=hRF>$@kppU)&G*$jOC8C@3l_ zikVFep*X0Fp&KdMrXAXMy$9b##8rzA@_Y{Rq_}{7`)k(=x4UT3LAFwEI^Fxr#pYeK zn*)!|s(^9Jh{nCIC$e9tJ$@59fvWHH^$2@Tatq-bCs+Rz%gabW^!Pen(K17nT=K<)&;0Mu^o7a1IFAYMwlH}7`w`+<-s#mK*R>=bK^(jm#xyH^GeAhhdbc*u))pYaR>uDvQ1!h+tlOs=$r>IEx*QJ|^LP6G1ay#>A+WX`ivPI3@xAZu^Xm9vWFgB|(ce zP8~eZ@c6yy)0<)v3{s1u#!2ZuC`r%^&YNPgS|53a#VZAIEXKDJ(tX5uB-G>cK?7Jq zY+Mwbtx@6~m84jV5i9l=kSn?kvz0?aFLogtF-eP0sdTN{e3lT7Hh&FT!ktJw$`HD) zFjTGcVx?be6Iug~gKXH1&~DeLN6$gv%{P8qw-fxCs{UUSh)kB~ 
zJaj3-H;fUf%^j52E?^=fJYE;cgC~IjJ#QVBqtT!pBfEQ!5gr3sS6p(hP8lwdU}|2=L*)e;g&1EQVIK9B?Rk}OMFU=SRrKs3MPw)5B! z7$rr{p-Dc+jK_Q4lD>EBEjyV(5LDbVZIw!hhQ;$A%N=O!?ot$*JM?vuKz_6a2zSr6 z&hI5oaG$^?ygqi&0o0O;h-eyGaLyZw>YCi${(^J@FGym@`~+xd@MzwOa**?F zrRdHgGl3(wQ;%R;*osG5P11OWCv&;JlqtFzBigtz0TvD)#Q|0io!s(BkR&EODVnmj z@$lK2Ie;*x_w$zYEA>A@!@!G(&N1V((p2SFc?$Ss04y!~fQFg7gNo#>qy>uL;^8L> z%#L4|X6i>)l9K50$1nGO9r4|n-?U~+qT4W+?2*dcJJ$5HP?2!`M*_IScZ1rnYy!7C z>>L9!zy0|ucQor5dY|1ATJxcUhIJGH5Ze3@&a@_Pd|gJl_8|x=1VFMryJL18v&x=c%5Y`xXx9guyO6*)EW%R=2j3Z*k8;i5PmidlHUG5AU?4<%yyMW+crM z^S<=HY{Bdq+hvXBM?>Mhfv6oV4;7q8@cg||w|C_n>j3TOEU>7Q2QvVdehn93LNwb+ zIyi9zaXir|m)Jk)><57`QTT5l#C~;^(dXPaqd-jF&{9D6b_VT;iG)g`R$t6W^^G=_ zb1LSw{G{j53!a|t|9Qi;{=d%^cPPVL2o8c4Mb3lERuVNke034UPY;P^%`ScdXC3{O zb9xGktK|TvPi+y5JXIy=+X`IPE+Ep@}h&3-+A;q!TJ zngU72aVLlE6JocI378vQfIPN&aoZgUL%q%nSYO*_54``&`k@^J4Yc(7kpd6W>9B;r z!~UgckG|; zYLW3Ha7a5%+4u6uj~_kN0X|k+oBt4ER9hZ>zC|vj;F7)TYhy$Ct0^jP6c6voqem}b z($X#TWgJ$+LtQ`9!H85O8;CP*aBy&EAF7q?SW`H-Lp@&+zU%8N_dr%o?wxeF*A58< z*uix}^Z7cB`G`p8(j?qaT`vZOU_6-X@IJw~1>s^m`(Lr(^!-e_|G;cZdwN2x4VE_^ zk#Y8FcyuBfUhh+}Of>Y3Rj;r^S=i?iN#McP7u)QxDv(%7&avfPTA-W*^h#Gk-)HJwv8vijjyzuQyW%^p_slW=|iLU{-3`Phh7vy)&ac|2}P;7Ha z2@rfNdZH2hv$zVD=&7mD+3wjq0)kL@mmnj#H|YXgbNSKL%>HU*hLy5%ew+g8?s zVp*iZ>T(_)oqW8AQ<0cO>tmkq;Yx>{6-d!J(_)f*QdNg0xuolL6}(rWGN%YC1OXE@ zFh1vD90n2fPft&4y?g(PFemr{bee*kLO~dFg%MN|3^Og)rnSLl=-a0@4As@r=rCBi z622eK`OOWyh(`4IIw|-iZOeq903#RJi-WDzu{YBg(zf^HKs*kJ`daoNXh@4sVy8-{ zDy9xOhqrtF^^`a>G%|3$;DA2olXCkV;wB^of=^2qH$ci@@2+Qzjnf`!SP43GG{D=*aYBC}KFd-6=B2OSx)qd=+R4 zHu3=0yn8O9uMo$K+V;g7DhF|pbWa<$DpkKY$nk--Z~}PLXcMvgM&C|RKz({LA%_3( z(c3E=j0^z@Rfu1`^Khrtu-R>L)t`1>5LN1ohTX&DchfmH8gFkNJrO`~n%g4*2rJh; zLtr9ksk7*X2p?aAz*Zf;YK#fjW*QN~Z|$juUTZ6#!?@&J4TF~mJawZ2`p)m>J=L$Vsh?0Cy{O<5l;DM4M`@qEra9f?nvkW~WR zs2oQQH{lLvI-$67(88wvN{6QRYwv9pEgsAj;P^rlp^WLZi}Q^pV7LA|t-vYSLj_?~ zF>QqUnf{p@-ok2>AAW=Mqr5>%jAfd4q6$F zF?H;X<0rz)+}>^L zCp!aIpzCvz-~qwtKT(I7%%SFQ`$G&vUrLxU(z2bGlTmnARd?_Q``L?YNA)Ywe&bU= 
ziAJh!8mJtM83U$(+!z0z+I?Eg&dy$17Q|C4t)kep%TA(DLhq3R4JaC2k?SUX1uZiv ztNO}pgKh6>M-EWe+sTdq+4FtD&@T&Yl{wlC86D&Fu;*YK%>*f{ASfEIbQ({!<{&Mi zKN0wRacjQ^N@;K{&+IZsC1yj0fI(Kujz^;C>sI#3tEG3P%qXZ}{@M7q`NXD7FuOEf zxEH&Cw>D?Pvy~G912llY)da@avom*7uS^Uzw9>U0gJ$)^r;D9DR=u#J)X+v6kovu` z7_+EoheroSoQbo~@TXBu*Sl@MwI0sJ!QVs3K|v~}WfvX*j(nXb6Fz_ZRM@KWYExy{ z8?p?-)#n8!tLOn3XBy250@x>2b8oxyRPZm+ym*ZL%GVF_rtKFlmg0@YF`t2E3_(6; zgZM?ow4@9zy|Jf+3>ynFc`pW1N9uK(4g03W$$p`j-;=|87v!wrU8%`H&X0CdK zL3~gR=}VU4f4Ac)M}@xIXTBq)t~xZD2BY|vu4z1AiNfoWmm4IVY zFN4`}-I#*KkR{6=&E?;M=oGOCV{u4AK2-G~She`b6WBq4BOs z_eiKl~|31^(cl6HQivBtBmTXg8OTtL@6~<%91ZDq|oOc;3uY zNFpDc!fDY|unF%4Vrmg#vnsG=O<=7nF*Eh`x#TArGmyiQw7>89Y}Ijks5~wNCS8ae z$_zyKv9d7dIf_i#tRIP4bW4WN+=)OMONfq-&rV`6%||4_A4Yj`x`oEmgMkyqwL2RM zI+b6)7^;)dUcLkppC9s#ScBapDjMF%%W243T1tw?rz9n50+Y{QrS_~?ic7#DTmJc@WSVorsJi6f( zX}B#+ORE$15G8nDyi(I7W` zw!@)oBCBoA-FVwlH~UWv3d$Ycfx;N%Wchn#JjvWIla`@hIBJmi?AF`O$IyZ8SH&ld zI4vkP!)C>(i0jvanAwa})T#`DtOz}M8;Xj*0M$^B?HA_Z~gUV3pa-m0o5&njx}`6bx3FVsQ^ z$A(04Lcw;b-L1KQu+id>E4QZ?%}Ym1gWZXVPrh7aLcoG>kvFD0?CwM*qCm(51~38Gpe@#)iNY;Ycw}H^xNp0eW0uGq4Xag zE1hC5F4V1C*Ri;P|+^^NhR2fJ(A{uGDDOZYInyAI`Io#LuDT63}4V;jG3=tT|H z!M&y9Ls&A^22}Gq3!i~iac})kJ?jy4;q(4y0(YZ?b1|kPB1o_*e)O#1A)76B51nII zTl9opPo43MTI}_<25!K2H%Is_C8T4)uo6^t7KSk8AaJWaDB$MwCEdLcz_sscJ*mDE zryYI|%?LU3-8~!Em#P~B#Zn=;g*`BCjYh6=kv128_IKb|u{7=r%kfFy9Z@QgO1SoB zh|qgvOo;ol7;mZuM__N-CA24M1c}49b{L_%hRFn&CPk4^S(SkBRFZ;V`KD zOfK2(=^1o(WHdD;s|S@;<*i}x{^M|9c~8UNx8MB4)p%=`W6N(!>d-zm>t{MK?e20F zoHlBBz?a8%`Zfk&J4G^(QqpL^EYlf1p4+U;rE17SY7g~l+k%Mtf)&189av419@6?X z(SPBms%8d-0C%_#c5%GP;~~}Rfk)A(Z?Z3Q_CC#3r}&5R*9`x?Z#99#V>X_b2cx;} z!~%v!I>c+c6}0h|vF>+RgJ2>oViGD+L?svePqZ-z+ze4Q?XvZr^W!xDq(dev9BP6y zy&b^CabKEtQEuFr#kQJSMCf9hcRRmR^y)_j9#5;uw(xX)Bc1!345S=q25>L5VMz8 zU2brpc5CDFX>Em@td%XM(+}a^qeS00|zLh*My#gbQ)b$AW{XNO6ylb62LJ=fv z-nbY5ejd{U`gUeD2}6+$&pP5`Q7cw-162Ty!O}9@XQCvs^0ST?Y^ilHoFI|sJSsmI z2{(x`nAu|~*83J0)5qVdW37$;p%kXUXM(D2&OG7VAyw|rW({85B?V5;7FQ7?VH-py04 
zA(4=JL?Z03#b+duN9P{>tDd2@e>B0i%oBKh{}}^KFR~VWyuHNO*;SI;-Rf#ECeHtk zVpTm$ZTmFMg%j@2LMP!s#5Ey7?#%>YIJvsl`{m+i9`|#4?-~~M#ck`38~plSs-T&Z zQyL3|@9ebg5LW)N zvljr=Iw*j~$9o@BqNW3(vTt`i-ZFCg6GU%Lh(_^fW#tL9ORR{INgYUnd{^=1M6mFx zRg3{w=OeUDhj&FLrj;9_P&XB$c!|A-)d-Bs&xQ zWEhwZLKdd40JT*;Og?{hA8J(U=dd=Gu3>2)8eUh$u@*4eR#Q(`D`LMHutVE0TRZOj0AGe5 z7p+J4Pm&k{*MuWOlo>wGNcwyWzmyZ)*zl*#xB7dIBnJeVn zXW*{V=w<-K2NI0|6HgD*WyR3R>qz?73_@R|Wu%K3Wp$IhH7s$|7=y|aobuP63ysNG4?P?buWz9v6 zaks%CR-4@>$A-WCDp9^I;m-~;va%cxZ4^kPLtxx7b`C=u#6stE~G461;8A@#jwAXg{GI93JFp0%R({VjkQEaZqJV zur~?=cC5gPEYkHPt*JHS!|eg4@3!YA3=p01`Ihy_immxwFyBlBpiNZRovA6wr|&Sp z;iKcoS?F|6sQdf$^MY(F=1&JS=r!LAd1_5I)}Po8-|98+mkiGKNhBsGYbYp`u~}PL zO;^mYZ~!MZz)4Wi2tG_P(Q#;|>M;xe+EwP}gdp<=Psqs!kAXG`2>IAspxYxldRQNv zIQ8Dp(D2Z2V`JlcOhq3bHa7N27WMVQ_wPJHs(t+iVyIhhLMkjmHXRHkTWad@o_1d$}kSMxR}rD(ACy{~IDb z=#l#}3z>KdSyz`|KvPn3^7I|iCAPnx_?@wzpRsYagEe)V6Ut!zyCv(!#wLwH5NB`1 zJz6kl2oV9?)Q@*tP=%~vVq(IW=J+0V{qa$9D2qpbS%LZ1^>H0fNKK8d)D%TExO4o~DzMDx9_AF+4=9}tsw!?gNJ`b4jJ1-=@e)^imb@!m{E)e>* z>)7B8xOHDu|NE pM%{4k*b`V^U6keZy7LB_3kch0I>}_VN-m@;MH5n`cg7e#x7_ znB&7p=eLaRcHika)TPtX^A|C1usfsHJ1noB@}D|*?9y=%mSMZfO}Ygbcqw2p%?qmd zf%m|p9ka;F$xb%-KFcLz)&0UQE*@KK?8zP@vc9}1egU+#6b0THNyCS4qyIJyga)D% zKudl!S-$6KZWH$^d4Nh_vf<`wB1V#e;ts&VyU;K&dgM=y39B{PT6LU19=SRqytQm$yLq)sN+o&KA;jJsmc9xAmn0Vpz|XCFM;lOZ*ab{>%F<(JYE{_J9Mp7(3olf7bljqJY7?z*|dZQF@d+Ae(=_x01kqF%ivyrwH#9^7?Cbhf=1<zxrLjz@wT8o$V9#eAc)pV`$0Bc3XH&4AH&gWgR+_%Pq1FjtN9=wpyzVEc*KGWVj zteW&Wyd{q8G4$%aas%-4!Gs&2Wk+W>^iTgS@_Eh;4uc@9$x%1oo8QRex9lpqI8eHf zI9o=#7&vD1go=up9x}vSyz~AbTVGF476kWUa$IQnQk>~40Q3itt$HuGT!_(oTqi)Q z4dmr%#bY@%wl^?ti*!xEe|BMwtZ|G_efiuKREu$+254a++!&Ovg_YY6(T_#-zP>lr z(A50d1mN3=)jvlDAu1W`!RB#hGBh1dnKLQc*MLDlig+V|8pZNMe^VUO zv(Mq_um4=zJmyPOT2-s|q;a3NO_x*6mxu=?FmPv@s;*k4yc#7ZY*grmK=^DOaHcR& z47MHoI++F`Uu73(Ys66p7DgWCs%Aa5IY=>4LPy;aOK3#<_;iBjv)oWFq(vmW$llDj zS6%1jtba>(N|ZRSCG_sMH%%C3oXrM8Q;FF8*f=V5s`3Dj|0HvQS|Wz~!VBC99Borq zbpKL)#3zrlbpD3RoWP=Y;;34g4L_){KUwGLk9roB{>g}yKHKpgbS*q}(Em{O 
z-;xh0P^NpbAD4i7wA4|tO}2U88nKOb=d+i-ZxDXmQR{1Q;%`AJow2JTKPy~s-!8U! zB$c-v%hPgrCuOuNi6&r6pkCd;b47)^J|qX$P2^j?1Ic0k;dCeoD?XZapa{>B>U`9% zU|5vhL*?`{RwopX!c_UQ&me-OE`RmIu~*ADq8!9wK^5>2TKm8e)+LiXCTRUIM#Ac` z5zw*p-N~_g#V)1Q#9xypnaxNM)(i#;a$ftfg@X}+Ofa6Lfm-M<6lw^3T5)2N`jSXO zc@>57Yr78)-);_HJb7LjKN{y;19KSWR z`^>h!zW4)l3}&Anw%$t(E;dFLOWZsexxdC2USG!CFg$O;f~p3!sAp8<)6>)2K^S#-{qjJ+M@s%o)cN;q97{H>5#6g+&?hC;S4gaJ_esNjzq zJ{@trA&?hO6|E~LH~9nd$ld;Sr}JWxXQa{q_+CIf?Q+AKXLgxscuW44KoeVtqRUkr z^?v5w==+;;zzO6CtYw&mz}E{$Ax{se zxJ-kd900zQyt~`$-^YghjZU{X42s7Zlurc?NVaXhmWi?$Tg|er+BK98ytbv~WMRv9 z-!+Um@R;&)i-=3{I_|!YkYv|b3SrbO%9tAaFr+ObqWz*))89nSmuF?gOzZu-*Ef1p z7JlL0$gYGifAU=+S3_^1k2PWpPRx4>z?2DKkun|#JU~VcaU^40*D+-mD==l#PwV0bpuPyuh&57wqM70B zUZ>yI@x7kgoTy7s{g$o0*O*AX=n*IUL)#LvbxwIsA{Vz{5G!)sLc0AqK`o*Dm6h(n z0q-MP=b#Gql_rs5h3T&cmi=yrGe$oxuTGNO{?sLG?IG`Knue+mc7`wpro-jwd^?^j zTRkQv-Co7#Bykbm{=>T=g1Wk(7vBD?&~QI{8{Mgovw8m)w-3DtO);!KxwIfNkqSf| zHy6hHUKf?aZ{>6R7@=D~lxl1mG^J#q`wx=F_PE6_eki$xwQ($Pe-c`)b$RUCA6*7o zD$gFADx%v?h8^Y?-Loga$E!RS*oj$!(R?{K0|b*i$Snr1gMKc#IUo9O&jyFL5juEA zLI#=aufjNn(J@SL=qLkNHcMFOK+X8Pp!wY;v%m3i+l18}cLhWpH$_LC(!3#`A^+dS z=-U>t@Ufy_p}+cnSC2gNKPe-4EV^0AYw16CLI%_V1}d3@;o|O{ZRi- ztQ+nxe!&W?ra6N8pStq7^KMX3RKf$gB}Jq9`sozAIZ#IPbS(y~efuu~{fE9N8aZ14 z&%shvM&`xsoNK9Pjd7!K1RbkRbs=EMQSq?^j$lC$I-dwHzI0D`jx6^Cw_83-3;zOa zZ)$8)G5rd)I6ZZDLdLo+$?zIAH#25*)E8HNbQ8C_`oXrF`=(q}@0HkWOWT(UT#36= zUR~dx%fa7p_r&P@|9D-Aiw0PqXA(0Uj*8y$ZJ%`bjOYs=U#`7=(K|hQd^lNj3Ilf! 
zc`n$spva+&N26G{^ zZ19jyL8m~W)2|`+uz4?7NlxI4Y^n>6_WL~fqzERU*3DY+!jXt=5>Y31P@U1_Us8P~ zx6#<3nLm0}Dpm}?zYGEJS7qx=kI2)b8>KHSo>5B;-U+8cYfj@TWh`#tpf)a~kgDIs zM;0v?z_K6)^sP{rzJXA9Rp9f!VU#=y9kAJhbk~+3&rW#S-g&dP=|`^|^X+xkQl`1+ zTS&q}Dj-fB8v??}2F;|mMBlQ5MFqmzmErC;RS@Ie5YQXMv}`>?LtJzbQ0byAo#603 z6L$?T3Qw$=40n#SbRy1>A~W`$6Kl4W7??q%wYs|M#!}6TLMk|U-+PWIDXJqKDx9^c zT=#Ni{8_k`uz;a>Wry3=q%Jye0-5S9oQ)Hd{0*>7aH<5x8v7r6x9um>Y^O0jOUVAH zVCQQPTJbxzSS!A6^RJ~ufhkxLgMJDEJU3sBA`G8V8S5_Cw}+6L?<-@pga(c;5XEbh zZAp)|jdWhP=|eVn^?%&{LHkx3xMVXEFsftfxpc7a0&pqIGQM03mfqIA2bz1SSMXV6 zkt>?7gz7^T^NoZ?zmUIjL-fJW0@b9{c1|SPaM}d@lB!9BebAhaGlEp2@4LyDzrV6bSY6C1q_>@g273YT5!&Q>;T!W2-VO$y;RSyDt>18~KOR^|Smfz(W0|ThayP1)QWRF> zuhKEMb(sBM-WiH`L`EYvS?a<*rA>uD+pX*IB8^aH z4e#%+g-(n=;ZZ5_C{Ke4Ylc(8QEt>M4U2PwgN9)6pXgb0E-{b${YEuY0^jh1*!#7^ zK%4QNfIoF1_SLn;rk(^MMW_HuLJ1X^KO=aP`D}mLExA2SzbHRngAw}-%QI(u z7djH$iBZuK$bXN^z*BAON+hz5)`l#uu+-~wjX&mZ=;I)!QG(B((b&#KPP@VWux=k1 z+?50n%h()%@Coew-@hrz^nWOJF)Q}C(iX|Y_JoZxF1kBjTzn0?+00^xYwdwO;6Y+n8#fA0_gL8V&zABTF=CyHP1)8db)cT*K@EvCaYuv@R6GB9ut2G40M^fvb|k!{LeT1uCJOU01wm? 
zzfW%4gA#C?18_1QKv zCvvib_lA_(bNu~I`f9nf!$XAmlFy;cwPMyy!dQ16=O%mZy_U~hd+ObwfkscYetqmY zU1LhbQ;MKC(X($Xr(G#_uLCqKpjyz)kJs7$wmAM(v2*ds9NJ?2(VipiXWI+jryas@ zpt$C3Mzd3i*aaoaG`Zg}`c2G|%JV7S?^`QOCSOGj0LW8u`&(gv+9S8V{fnv6&hxKx z(R1xBE9ba2f0p`e7sEzQ*O4`fZC&@rpfVoC{cp-ZscD1PZyw?60h#I6F6|~0+no4n zJU)2#S)F0VN*_*C$x-5c(5u*D{wDw^?WMlGgG!wC_Pa6=&k0za8Y^Nc+3q-AXad+bL(RR8 z@yvkO#aGwAV6*<|Sm4L|+7h9=+7Js7W-xTn-`j;y~6=HpT=_aJ_ zjJ2gUrnAMc8lGhPdmm6kZvp{4ujrTM6Ma&}eRJ4va8jU&u$7y3nz1LYXqo_49KxFO zX956fr(W`-i;~vYbWkAn=bkBv-I5Y7s3yco=o{x48dd$&A)Be`IuoZWdl=31tRvz4 zDR=aeSN~bDOh7mu1wpf6pwPaMU#WvRu&QgHQ>)E;6B%Na_OvDkn3Tlb zb-P*PW+>EZ@~hPC2C(N@1g4i^y%4V!JKLGdB4^ICd@rh@`As&r6)5-`$J{OpKhU7( zk)H7^DPHn5_loZpLH?#d+FXuHodTPeyih1-wCHmz-PTqAA=wRoijm|R(UM8EZDX7G z@DBjU+R=JZ>-8d(D zj(c_!21jRYJHWbBo!faq%lPS$b%$KginOMhpqd&-*y@B@j!2K(&wEPFNZy)K+6VlBhJf7DG_-}|X(-Rgrgi%O<<>v)8H(tTSb zXAMAigH%GS*f&PC^yi_Nf2sd+Nma58Iuw?u!x5UdwAI*A;Tq$>=}nqmbxOkRbwB;Q z=V|t2&z#m?EEQjF&WWOMBxt%q-qRAU)WLrKyhUA{lhY*oyh6LJzk2Y#l-vLL4%cjWc98VIA7UBL+7A8uDOdobuKA~Gqfno zL{31COxes#<%W5)?2|$_ZPY|g?=-Vvs*C#&y1Cp?Q}KAB`Fp<~%KK72X}kBWECUNV zPiSaZ_a80H{NOaGf5|GM`%pC?ee-MS&s=$8D2)_9N~;?#{*qxUrEvp+_MwBN`VYS;gd@H1u&ZKb*VQ@Gnb7oL!4c}kS`rAZxiqBvO(I@JG<(XrtLgeC z1J=m*&6@mp+3}|QuztNeiKQ}E-Ip-saNueD6)gY}V1_k*3vYJOeknn4)H=Ji29L^u zl14ah1~oHEkaInahMbQ!bP@ObJ?-i|6HkMf)_L!owq5%=`I-^;r)Zn&v`~kGFJsEb z&4**)3mmmd-so+()8|>XlP?G6TlQi(et%h&+A4to6=j17j&V-Rli~@9f6AIv3Yqyf zDG8tTg;IKEShid{SnIdB)mg60rX{Q2bdG>4_r5LY&3`SAX4!|EushCYhY%P1foS(ELysUqeB0YB=J(ryf3WWF}+LPF3OA>#V(a9b*v9uNxw!>zG#2Mjavs zDH_t9lu$~6`UJ93?@{fhe09-&1+M>*y&y&`VYDmcLPHj%lDO?o7#z!Kj5+?#_XVZ< z4Q&~RuAhm|EpE6cz`t;l+9Zzi4>$yt2RI=t9M7n2 zo^BbD=N_pz60vU^kw-gc;B^b6;WGwIB1-fRd~AM|+Ua{IFK(IW4%^;_$%0S^E;KtfDnT25Y8 zsgDOF>e@Sab}!16JZ9@jP!iIKg{;UZI-0_KrS0Q$vGc+h6I34lIV=LyXDe~_*}O>X zOU1q1L1TCaTQtm(NWvn}wty=4p>)7^YvNFtdyBr~UTXX#`7d}vH}Y?Dol4(uHp^kl zHM5exnv=cI+0QSH(_w56*d`6Ne|B?|<2em2v;3?*;u3cHu`=29^Qlh5^?@+w9F1x3 zY?7{|!k4VAp1Oi*QU{7T2)2kWuS>yQ-CBQZ?BR^IIIpaC0qPTftdLrZ3>tCb2J)jC 
z=sZ?YacC<-hPbk-Ywhq?Ka9m-im7Lw2)}2-UO|N+D~AF;H7l#P zCw+r}h=FN3f}mSJ$hWJm1qlubO-5$+ZK=QfnP6W%a9QA)20GR`+p!}-PU zX(C~#gNS?6je@c6Vl3?-=e&Ba-yglpvpjb1X=#3sVv|rfvEir`3H1jxg@EKyH<-$- zQ^2XtpYV2U0MEjo?{E)Sp8ojb=W*k-5E>RH2^9sf>~QT&Qfb}-Cu$Gg{T~*3 z^yuo{KWEts0+>Ne4nt-6MKG4v>@~Ko?db31Zv`mmd>_7H9OtfPRh!h+dfX8c(bHVG zp*;}_KFGg)I-HNEwlN++Y%zIOk7RCt@BuSjPQ}uFaXoiriUP2ASKek;^&C`Y-Q<>( zv|r~{Y(Ds64v^~{WvEXWdbmAwfMmdJ!od4c)!v}><0tq{{bwiC^Bk7)@w_K| zpM1!f*E_>9eGO{KjIof+ljemo2z+m1$yB!5ZfFZ^qC~1B`T4qXb}T!{)!#9MshqZ4Gz|J61mS8xq-i&2=o10HEIUt)blhLFZ*|9v;==HbtvdptEc^ z(~#vg%eP~m>&|Vo)Yii&>GV69__`($;;bj~n+I|5m?vrCAY05`2Y`qVu6b;NC!}yJPDFMHQ#LKe1bQO1LG-DQTPR7 zRIa9ZE4r&EBKzKp9o;}CjmO5Ia_p7xO%_*qt@m9-9KFwBCdqKifl~qHXyx z`MJ)H+=W4icVOoN3Gns#!A36p=VA29$d|i ze!hvbr(EM<>W^0<34=|K$LR~q@xmwypmW8T1#4jdQAyHBpZB5Df&8(nDHzh|e_;+Z z%-Q%JP2by_6LxmDgmoEcO7#=OroDWdPH50U8t%xy!wONTwJc7 z5FLONE9uRM8b9={=6{y0$&jIIDeRGT0o)W_pah)-=r5oovc7$M>v?`YR=8N|y!((i zHL&FGAS-v_qN>uEb4=l}x#uU;LM+_R%*=fD<_$jnvoV(xj{mQ_bB}7`%H#NiP^gBN zVi1Udx=N@B7(zfGqEXPQ1zMg08U;Z}gn$u=B8Wj=Mi7>20gGZ31%(78gn&HKplu>A z1;L2KAQpL)heo@IRnh&W?K!*M({uLhzx(H$$z+nroqKQQe!jmCZ9nsEFu~oO_S3Ux zkz4_tVi~VHEs)>+o8zleuh7KfDoNVe^^~x~iqe9DPG&(uux45GvP7^|q z@Wt7{jYsV&$|#yqbSni8$ag=z+HcN^CJ2)6e7J5RVv6X+5eYp5g0L*2|zV1FA3_ye2;hUt~bu0=$;~O&YhWOdUxbA+P+rdy!`U$^jZ-ms^ji zEh+iQh0QS*UK(D&g0h@Z-AIQ-t&aUtwRW|=@`WV!?Tg(f_0+kAr*dRsjN@E_;@E{5 zj_NpDqF$F*ere7XQn2FeVjUIZw~*}4n$}gLITfRtThifu`VEXZ1*d!Z!s&F`UD?Kn z=n-CR9jj;Hkqhu<%gCVa`)|8L3IpfZ&Hw9B|8~y4zjv4Ln)En9dOM0v-&|W#sLGxs z4OG4Bkd zO_C5Tx9fZxZs{kZP|`FXGH+mY9i5joKrSy_IE0s@%Wb52oTAJZ|{ z0{zuRmNgNQbRx$U=_H}>>J3A%ND;9q+d`6D18rDef-j2Je5im-)aM!?Q84koj38b0E8=|bVb!KMW3(WNFm!;<;WZEXC+9(1Zd@O~<}Yv6q2 z3t8E%Z2A`*eySRtLdehedk+Ih=vSm#MkO@N6=rLtpQFnOND^pm*yY>Z# zt9P5}ne^+g{SDDyF|0|+#Go1-M1v-#dSY^t&Lw{XNz(#m+LlSVF~%9yu>xt)GLEqD|}X$^%-)oz(THZKH`vxf(^A9Ms!`Q75UD|v=AZ;_wIsWQyRkLr(`Kqb|5laFC{kJ4x zv4KZF*nb>Qe?4hz9L{(0N!^HoF`29L`e%@d%g~-2 ztgkwWxwnb;W2VQ^qfOJeJHOtOKF5Jz7*QLKjZ<2?YON2xl!;Ro6%RQ)CG(G6kuP_) 
z&G^u@`s!&btjr9TeWv4j|0<{Bh6o?J3B+>6roGy-U4cQgKyRElu58GgZ{M82bhcY# zTfvy8^jffdWs35VpI=V5^Hq&p9VoYfcJ%PrRhnzzScWc(=n-pLH;-pI*Yf7E+@eEx zg2ZTe>2_q@C*BB5)DT{)dHfFB1BrvQse|+gR$tX^L`{HSh9f_+Q*YxGXqTv~Z}m>X zIDzui;RI!DsFV*s;ABBNdp^#8=cYsf&AIkTX<;Fu4G)1%nO~S6=; zR8Tzqv`O-C4ADmivAo@I-XJ%(z+ua~gq8w%`#&az_$-ngEnd&vr~G(LDcDMOQ?BMF z_HTen4IKFJ|8jJKs)@S#@?o!x7$4z*hXU1r8{(6`#~cxy#5BqK+|i3;Dqh6L;*kny zD@H&3kaaWeskha)_hY*HIJ={y4E1_5Tv+x6*8RH;G#G&+RfEm35Z)zUJN`Xk*1n{^ zjuTaZHIL~Y&kS!ryUZ+#e=|F|Xujbf)+%wD3N^>;W@uS+?b5q+jT-bsF}J;zTSUA)O*1OHHWIi&_Dg1 z{QZwB8aF4w!vD_~XI9y*S*y^>7vA^lxO0c7VC%j(lctIVe-jmH+iSeY!k%^xT|lx* zeM~dsl%Ie_kT<5yED~u;dDE%Ns>)S0JP!Lh5$r&}c}K@~fct#K`yVx7cYHy^jJZ7ixoLE%(RXU zU}wZ~)`Eo?uak>$s)=~j4ogGicPg0Ik`GT=ZaB8ZeFaz{))XJS^*5pOwr$($H~9-% zaIJ>l*JO^sWX?4o>z;MhifytW^~vcFPtn6$>^!Jf9vTzYJi4mBKmvu1m%Ym&Za(RblA=8Hsas%#WlJUuJ|C`&aL0O>Ton))w-*}V$D=y-qZpw z@A`?XM{#+r+>p}C%P*R{AI2KE!7 zmA>~993dO+Y+&%G?AmOMqL%m8s+=K1g6KnsfJjqtuY|MW{waxshFfe54yuK4jF$E< zf{D0C7YDc)bQ($@(nV>Cdm0g$qoDV0LzzYNWqEIKue?rW@XJd!`9YlbMHA!P^t20% zO8iJ9^66*OE!qh;lw(y0H35u~2+Hd-RVfsd0S*>sNy|=5d_-A}nAlstiZ_I9WZPyR K&+1KKC;kcZz6#v{ literal 0 HcmV?d00001 From 951eda9d15ac4e11a8da2bd22d0d4e59521cc284 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Mon, 5 Sep 2022 12:52:01 +0000 Subject: [PATCH 20/34] ran linter --- ..._tabular_regression_model_evaluation.ipynb | 2909 +++++++++-------- 1 file changed, 1469 insertions(+), 1440 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index d314dba71..e1d623c3e 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1444 +1,1473 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# 
Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure an `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaluate the AutoML model using the `regression evaluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset, considered for the problem of predicting the age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY", - "tags": [] - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each instance session and append it to the names of the resources you create in this tutorial.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a UUID of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY", - "tags": [] - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model on the created dataset, using `Age` as the target column.\n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\"+UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3l691PEMZFdA", - "outputId": "f0757841-1901-408f-a587-28b6f3a4181d" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - "\n", - " ], \n", - " optimization_objective=\"minimize-rmse\"\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\"+UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model is to predict.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human-readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training budget specified in milli node hours (1,000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT", - "tags": [] - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = ''):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " GetVertexModelOp,\n", - " EvaluationDataSamplerOp,\n", - " ModelEvaluationRegressionOp, \n", - " ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp\n", - " )\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " 
predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs['evaluation_metrics'],\n", - " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure a `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob` which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human-readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the model is to produce, for example `regression` or `classification`.\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of `column_transformations` or `column_specs` should be passed. Consider using `column_specs`, as `column_transformations` will be deprecated eventually. If neither `column_transformations` nor `column_specs` is passed, the local credentials being used will try setting `column_transformations` to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. For regression, the options are:\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more, see the [`AutoMLTabularTrainingJob` reference](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_transformations=[\n", + " {\"categorical\": {\"column_name\": \"Type\"}},\n", + " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", + " {\"categorical\": {\"column_name\": \"Gender\"}},\n", + " {\"categorical\": {\"column_name\": \"Color1\"}},\n", + " {\"categorical\": {\"column_name\": \"Color2\"}},\n", + " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", + " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", + " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", + " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", + " {\"categorical\": {\"column_name\": \"Health\"}},\n", + " {\"numeric\": {\"column_name\": \"Fee\"}},\n", + " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", + " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", + " ],\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the Model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human-readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite when the ground truth is 0, and Age is 0 for some instances here.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "NOvOMTEgCVcW", - "outputId": "e9918c90-562c-4c1a-ce37-ffce17cb0af7" - }, - "outputs": [], - "source": [ - "\n", - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"regression\",\n", - " 'target_column_name':\"Age\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs", - "tags": [] - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3", - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if i[0]==\"meanAbsolutePercentageError\": #we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10,5))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m94", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 0 } From 67f0f0ee1ef3b216503a9912ac500bb8597a0c9f Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Tue, 6 Sep 2022 06:38:17 +0000 Subject: [PATCH 21/34] addresses the review comments --- ...ular_classification_model_evaluation.ipynb | 2804 ++++++++--------- 1 file changed, 1379 insertions(+), 1425 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 
cc5de4927..d4df32c4f 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1427 +1,1381 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2011a473ce65" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6da01c2f1d4f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0614e3fb19da" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "73020acd076d" - }, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", - "- `column_transformations`(Optional): Transformations to apply to the input columns.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "531f117e536c" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"numeric\": {\"column_name\": \"Age\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " ],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "21b5a27e8171" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0299c1f24a87" - }, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `service_account`: The service account configured to run the training job.\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "15463d5d2243" - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " service_account=SERVICE_ACCOUNT,\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bfa52eb3f22f" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d56e2b3cf57d" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bd2e1da7a64e" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "19c434d8b035" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1db1b1337f20" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e6e1c0ecc3b6" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " 
job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": 
"1abb012ce04b" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e526b588cae9" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "26eef4b83c88" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "63b84f5490d2" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e0a18b803bb7" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9149549cfd4d" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6223d67277f3" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"classification\",\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d089ca32516" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3e7703929a21" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d1b840a79c4e" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification) \n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset has been copied to a public Cloud Storage bucket, from which this notebook accesses it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43", + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human-readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " }\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000\n", + "):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format\n", + " )\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format='jsonl',\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", + " 
feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'prediction_type':\"classification\",\n", + " 'target_column_name':\"Adopted\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the below cell to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=metrics,height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "conda-env-eval_comp-py", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python [conda env:eval_comp]", + "language": "python", + "name": "conda-env-eval_comp-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 4c782e20a62cb04a1d4b793202b3071a49032dc9 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Tue, 6 Sep 2022 06:38:49 +0000 Subject: [PATCH 22/34] ran linter test --- ...ular_classification_model_evaluation.ipynb | 2812 +++++++++-------- 1 file changed, 1433 insertions(+), 1379 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index d4df32c4f..0c86643ab 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1381 +1,1435 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by 
applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43", - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce.
Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " }\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000\n", - "):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", - " 
feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts.
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"classification\",\n", - " 'target_column_name':\"Adopted\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to their values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the output of the previous step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0] #['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "conda-env-eval_comp-py", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python [conda env:eval_comp]", - "language": "python", - "name": "conda-env-eval_comp-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training`(AutoML Tabular Classification) \n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaulate the AutoML model using the `Classification Evaluation Component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce9c9f279674" + }, + "source": [ + "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d33629c2aae6" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "93ebafd3f347" + }, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is 
to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours` (Optional): The training budget for creating this Model, expressed in milli node hours; a value of 1,000 means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run()` method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9ce44a2ab942" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1db1b1337f20" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e6e1c0ecc3b6" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " 
ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " 
feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9149549cfd4d" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6223d67277f3" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"classification\",\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0409b0f330c2" + }, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, which shows up in the Google Cloud console.\n", + "- `template_path`: The path of the PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI, or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating the pipeline job, run it using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "894afe1ba396" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d1b840a79c4e" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 5f0debb0a36ad1c0114a29f0cdc4ca2f29567f74 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Tue, 6 Sep 2022 06:55:25 +0000 Subject: [PATCH 23/34] removes the artifacts comment --- ...ular_classification_model_evaluation.ipynb | 2812 ++++++++--------- 1 file changed, 1379 insertions(+), 1433 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 0c86643ab..a6a44472b 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1435 +1,1381 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2011a473ce65" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6da01c2f1d4f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0614e3fb19da" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce9c9f279674" - }, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d33629c2aae6" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "21b5a27e8171" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "93ebafd3f347" - }, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is 
to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9ce44a2ab942" - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bfa52eb3f22f" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d56e2b3cf57d" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bd2e1da7a64e" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "19c434d8b035" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1db1b1337f20" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e6e1c0ecc3b6" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " 
ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " 
feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1abb012ce04b" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e526b588cae9" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "26eef4b83c88" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "63b84f5490d2" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e0a18b803bb7" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9149549cfd4d" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (in this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6223d67277f3" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"classification\",\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0409b0f330c2" - }, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "894afe1ba396" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d1b840a79c4e" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training`(AutoML Tabular Classification) \n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43", + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human-readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " }\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000\n", + "):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format\n", + " )\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format='jsonl',\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", + " 
feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'prediction_type':\"classification\",\n", + " 'target_column_name':\"Adopted\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the below cell to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=metrics,height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "conda-env-eval_comp-py", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python [conda env:eval_comp]", + "language": "python", + "name": "conda-env-eval_comp-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 86581cbc6239d95f0d8826868bff02449ec960e0 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Tue, 6 Sep 2022 06:55:54 +0000 Subject: [PATCH 24/34] ran linter test --- ...ular_classification_model_evaluation.ipynb | 2810 +++++++++-------- 1 file changed, 1431 insertions(+), 1379 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index a6a44472b..6c4a483a0 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1381 +1,1433 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by 
applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43", - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print (\"Resource name:\",dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " }\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports mutliclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-feature-attribution-pipeline')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000\n", - "):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format='jsonl',\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", - " 
feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'prediction_type':\"classification\",\n", - " 'target_column_name':\"Adopted\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "conda-env-eval_comp-py", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python [conda env:eval_comp]", - "language": "python", - "name": "conda-env-eval_comp-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification) \n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click *Create*. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column.\n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce9c9f279674" + }, + "source": [ + "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d33629c2aae6" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "93ebafd3f347" + }, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is 
to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9ce44a2ab942" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1db1b1337f20" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e6e1c0ecc3b6" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " 
ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=prediction_type,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " 
feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9149549cfd4d" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, \"classification\").\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6223d67277f3" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"classification\",\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0409b0f330c2" + }, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, which appears in the Google Cloud console.\n", + "- `template_path`: The path of the PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI, or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating the job, run it using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "894afe1ba396" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ec4ec00ab350" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 6e2d5b78584052dd29b8cb97481c921dc5d431d1 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Tue, 6 Sep 2022 10:54:42 +0000 Subject: [PATCH 25/34] reviewed comments --- ..._tabular_regression_model_evaluation.ipynb | 2974 +++++++++-------- ...tabular_regression_evaluation_pipeline.png | Bin 37994 -> 33828 bytes 2 files changed, 1503 insertions(+), 1471 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index e1d623c3e..0ad4b7ec6 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1473 +1,1505 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure an `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaluate the AutoML model using the `regression evaluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset used in this notebook is part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset, selected for the problem of predicting the age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_transformations=[\n", - " {\"categorical\": {\"column_name\": \"Type\"}},\n", - " {\"categorical\": {\"column_name\": \"Breed1\"}},\n", - " {\"categorical\": {\"column_name\": \"Gender\"}},\n", - " {\"categorical\": {\"column_name\": \"Color1\"}},\n", - " {\"categorical\": {\"column_name\": \"Color2\"}},\n", - " {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", - " {\"categorical\": {\"column_name\": \"FurLength\"}},\n", - " {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", - " {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", - " {\"categorical\": {\"column_name\": \"Health\"}},\n", - " {\"numeric\": {\"column_name\": \"Fee\"}},\n", - " {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", - " {\"categorical\": {\"column_name\": \"Adopted\"}},\n", - " ],\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model predicts.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for testing (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training budget, specified in milli node hours (1,000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. 
Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure a `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob` which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet\n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is hosted in a public Cloud Storage bucket and is read from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a UUID of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if IS_COLAB:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. The name must be globally unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "IS_COLAB = \"google.colab\" in sys.modules\n", + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + "    display_name=\"petfinder-tabular-dataset\",\n", + "    gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model on the created dataset, with `Age` as the target column.\n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\":\"categorical\",\n", + " \"Breed1\":\"categorical\",\n", + " \"Gender\":\"categorical\",\n", + " \"Color1\":\"categorical\",\n", + " \"Color2\":\"categorical\",\n", + " \"MaturitySize\":\"categorical\",\n", + " \"FurLength\":\"categorical\",\n", + " \"Vaccinated\":\"categorical\",\n", + " \"Sterilized\":\"categorical\",\n", + " \"Health\":\"categorical\",\n", + " \"Fee\":\"numeric\",\n", + " \"PhotoAmt\":\"numeric\",\n", + " \"Adopted\":\"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + "    MODEL_DISPLAY_NAME == \"\"\n", + "    or MODEL_DISPLAY_NAME is None\n", + "    or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + "    MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training budget, specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + "    dataset=dataset,\n", + "    target_column=\"Age\",\n", + "    training_fraction_split=0.8,\n", + "    validation_fraction_split=0.1,\n", + "    test_fraction_split=0.1,\n", + "    model_display_name=MODEL_DISPLAY_NAME,\n", + "    disable_early_stopping=False,\n", + "    budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + "    evaluation = evaluation.to_dict()\n", + "    print(\"Model's evaluation metrics from Training:\\n\")\n", + "    metrics = evaluation[\"metrics\"]\n", + "    for metric in metrics.keys():\n", + "        print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\"  # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + "    PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + "    or PIPELINE_DISPLAY_NAME == \"\"\n", + "    or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + "    PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, it is \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you create the pipeline job, with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this Pipeline.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + "    display_name=PIPELINE_DISPLAY_NAME,\n", + "    template_path=\"tabular_regression_pipeline.json\",\n", + "    parameter_values=parameters,\n", + "    enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + "    # Obtain the artifacts from the evaluation task\n", + "    if (\n", + "        (\"model-evaluation\" in task.task_name)\n", + "        and (\"model-evaluation-import\" not in task.task_name)\n", + "        and (\n", + "            task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + "            or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + "        )\n", + "    ):\n", + "        evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + "            0\n", + "        ]  # ['artifacts']\n", + "        evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + "    if (\n", + "        i[0] == \"meanAbsolutePercentageError\"\n", + "    ):  # Skip MAPE: it is infinite when the ground truth is 0, and Age is 0 for some instances.\n", + "        continue\n", + "    metrics.append(i[0])\n", + "    values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + "    # Obtain the artifacts from the feature-attribution task\n", + "    if (task.task_name == \"feature-attribution\") and (\n", + "        task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + "        or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + "    ):\n", + "        feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + "        feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + "    features.append(key)\n", + "    attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m94", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png b/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png index 114dc4e4c07e133f2776a449916d17530a16afe3..9055dc2b731a2108a904a5c7bfdb2eff0e73eb83 100644 GIT binary patch literal 33828 zcmbTd1yEe;);5?N0vt$y;0_5c!67(AkTe<`8h3XZmjridpm7N9Zovue1a}BD614G# zV6!>*-tWKj)znnYbQKMo-u>=Z*Lv2oo)z|4K^hB_4D-p8Cs?3Q63S1W{AKjy$E&A8w7k}g8-)88*Ykr)QHeHdQN{|Wmms3D=o zL;F5ZNu8Ns$m(m&Ux9J&@SxDxcSNKni20LMm2QMjG=xwqu9Z%EA9=CkOSdB3+zgHO zrpj>P3I$$8{k;MlVUK@=G~rbTpyQPE68Djsr#tGC0B?OhhOLJJ+C7hV`Ue{S z`!}1B1n_#_B@qeyo9R<>C@%0q#lYvHp`ponWk@3hd`du9Au+P%!TIl2fTl>1^owvC 
zTw35JOuRALf8F$-Yli>VCLQUG?#BXt2*4wGj-CT=W8XO@sLUfsfX6DuI#B)lx&OZA z{P!dNpPLM0#XWPQdhy>w47>cF5uW{fDE}F8*uNf8Wep8w>b$Z~^o0H}aL~(j=mCb# z^yS>IIP(E~T?zZESM=l*lF@|@OhL|*iH8g!3T#MQLp#;mH2HE`Oy1hT$euv=#4o!b zlKARlxJ=KvAiI;3({w&#A1N^~CM-C*+6KL2Q(1AjB*;S$ijR+LT`ho+oanRqy-K%i zRi9f?KqXZ^k|NKrxxHO-kvXWzY9 z1m)rT!v@at8I8>`<$b|=%$A&KqoHA8mNROg?3l-8_xH}iNQlEZ$+u3$e9{h zwQyL$&2%n{=_3d1kJ~(rFE#V^m~lmgpeARI5CXBs?SUG4BS4vxNw=jiL>)x^xH3<_ zF&dP;U?w?t>Zkr|Jp`EOxUBObMRqM*WzO$OqOc#w8!|m6na_PHp4=S#=#bWJKC2f46A}^%R#J|woesMr`xb!hM<4mx|r=y*;aC@BPYZbA%YgYs}d1$EwfG@j1_U@e)r# zR_KiNY>x7hMwq1e>td;99{pHY|y8ip?O!1<_2th@$aw>!Z z)!Cdhb(eKzN6;3ldJ&{XaU;6Lw~`(G91((!=ulkhK-U>R`{m=;wS;I5SzGgAk}_-?y5BjEVUWQS}ml|0q(D{dGD24{+#6+4%Y7#F5b z-wof`zJ$ayDl~EzL+hh4ZKM=F-Ud8lqT8oB8#k##=j+|pDyyZ|!|9|DMJc_BE~wA- zXfY}^m8|s~xf2J4M!MVT6S4n6yTPIOe6qYMnC!7PDB!xsMZ#ea7PV;I z@!S8*aRXz@0da@3F%d%bd7q>0RwhBX<)!`iBk--wWp7b=d3VIPc)-R6-Y4rXE?h&0 z@n-3v4?Zmv!mbCZ$hNKRv?{w5Pm{}`&gB?!wWZh{L9-zONESRsT)nORv#NP%NNz?$ zbj$VwA@zOT6M5HCXZZ*+Hs%|v{zmWzSkQV(K)SV0m1Ux!k-QeJZ8es)+25!M~_%kxrDa+DCmUqxu{?|!P+OumXOKU}tLuOW1XZrP6sgdGs>tp|h&t2W@27NS3>=K<06j1`B0`Tz zdd-EFXgRaE`Bgj|3})X{wj1{3Itx>7z4^FOrBdG_=b(`u&RIlAVk4vr9)r*;;Nv4Va|E*v`_n? 
zPuf|$ujBq22eRAlqkrn;9G8$BsxmI{97pW*!Qn{IyjW7VgL~nN?V!VzNO3n0s*-*U0(wYd&`)-(k&fl2ze~8J+9R4H=sLl#IJoL|$+$b-L@!+}%Af}X{6t*i3SjJK@>hPKZO22gLbfSy& zhQ822_bQbKNTN(L!*-NwedI1^zF|_=-NJD=xgDb^_t>9@>nzx~r;Gg2A^30R=R=ZD zEvjX@*ad~C5Vt>_p5TaXa?D4K%JZHFf zy7WjG-~r8-iT&WVaa_iT`=8$`{C3U%D%ejA#`}(PB zy0DnS%28xMQ18{%S>-WOWAy{r3bP%2E*>dj!~|o1(9_j#vZAc3gU2{EujAq${ze)&2$P6mElDhu7j@Q5eSFHoG zE-k%ru@jhTvd7iF{kcHI<>vHkOICJ|vmj<+hNfZt3n5NsS{Yt#1ysT&8}?b(;d1xN z<_8TOBSVh&zl4I7soIWm1t`0ClT7@c8+1VqIb zHM$F8xDbjKhOp4vUy5@kZgCZZrmPypvsGu~V)FEclae0W#XJJOrRb0X>q+W^)ED*@ z5^`B^8b(HGT~DKQM*t75N;vD>2W{`{luDFoRO-Es3-y4)SV2sAwaP-%^%4bnhv^KL06ifo4eFd{SQFMs=O9KftQ2n@`y|i4ddP;wt>W*Ar7Y_P9KJNAV`=Dl=`9^SgQDvpHDTJ@Y%1X!a z^srKP^7t_GgYv2nq_?YsW*v5(tTjONYzK-avMlC*k@P zr9D~1W*=5bjb9u$o*fD_KV&1Br;#0n8_K>eP!rGwJzG_Ne*T`Po?a#^?i7O7|N8D_ zWFw_H;0xoZ&xaK3zsAaZw$F&|HL5P^J!`m`RF{K&icuu}8OhJi-hKWf>gs?Pje@`K z!sl+qH?Kz&!8n&+wyUE4+w?O>3<1?V9|*v?cWXsDGS8Do7#Ye9|JaI3wGS@wn)_n; zx_LsKv95C%wI<0JB33aA7>)hIG_MREk>Ha*9m-86c(Wd-M0Z;eR=4Ys0M4|6r4EhaN?H=;ed>-j%-ohE)c(YIaNJoGrP zC0pi+(2YNIRBZ6pEj<45xE+A_p#E%=%}YYI=e|dxY%2Ln5dS06C2%Zs@_LuU4J z84>8}Olb^=5dYC3UgaDE0-Bqye@8X8{em(7C)x=0SBOr%^~-Xu5|gBE01&j&&p~%* z^d(d#%*TWI7()I3jc=5Go)AbAl+XV=OncmyOuzFaN0@4x@gzzA)njPDf}nD6&ddML zA>I3aj~*s)wdB7dL*N+%M#~-0OnpUPUFO&qRA;VDYnPD_{)SMcIwRo259gJ+I(rem zX+Eqf!R2F{nm2{*HZZ7_RBk@_48zUe?KPprPe!#{EBa z@Y$17zvvBe>EyVB!g(w_pygh~0a*AU^)6rSm(5ikH~EJSx|~kqIXb1-UBXON-x|$w zb)6e;Ht0sw)lPW|(3`rFTE1^J)r%gbUb?qclvC1a$028M!?$VL_2Jh`q6>OzwCf=| zYU``D%IVhL`(YKzIhkEVTsbdu8YNz$M%>__`0Xw^>7P?mtP&k!>&>-r{CRtRh#Mpf z>@U<5z3llz*Lq$)zV%>Ya$w5IxM zrz%QkLgH@TEH8(~OE-uzaI??)ALz9iyiL)$=_TUEGzWH;zg%Sz9Y*bfO^G!$jAYkS z2v^w96^hs@|P0TmLoKDis_V^uYQJ)b zhpM^O9#m-CkUv6_-@*y%MgAo9JDinwD3yP%(X-R{o#A6~o%J9txe#iDe6p|pGcJs1 zSR7l+^uea~xeZ&4_SHx3Kp{ts^uhTdJM8s1H42xzLB#r9r-ZD> zNgZOVdrcRcil3m{?GL4MWr7D619unwdl5T6rjGU)XWo>5-kSKhmD-i`ww6bnoR+xF z)D#%R^Gw#B6S07xLet7KRp`;WI6tZ{6a+3sQRZ3FzcxVJZzNtv23&*z%Ufj<_v4S# z^k&q9#lcU5J0DZmL~jty_9)gAp2N)uU0#oy@2gtfr7ZE=6TQTrp~Wf%23>)Udj|fx 
zEy!OrS))=?QfR(DONaBdZq(Z%m&9-G)*6#tZyhU!BO_f}FS6VmkQt5esd2IXsd16Q z2IsMb+A5pjOeug%ikjgqwQgoHHx zR%a#Mtrj{|xwq}yx8L1b{wd$95&dJ}b5{f+EklSUX-faz(h@&5DZJE+rsSI>6;03m z1xfruS~C{>+U`{npo2XL=z^Wo-x_PxBURmY>LYrsb+>yrANeKV~{N zm6%zeWOqwTN;|KXNsrRA{}opLn2?t5s@Yn5EzBz0i)cdn-X0b+I;;wsmoBx7+#NXY zDL2qB*+kGu1O-WsX@`93qPscS*Y?}haUQ@aRfkh^wE55J`)D2J3bzr;`5mn-dqymd z4TAlV+t!sO68qCUV_nevhyCMy%ZME>Gw#&%xTCYnKZ++mH+HaPyLx6PSH5q>-+cb; zbLF+Dd%lKrTG+PtvBa2ya>C~yx&-#JLP@ zH$#V-4qgnI=^6f;-@CP0{}hlvhSKg^sdelS6*6NZhdxUlHDs1cIS6S`IU<)wS79QZqK&Jk^tv&V zqf9q~-~7ZQMNdCvRraUd`g=px(5bKe1d!>q(?ZC(><;s{Ef2aqwsFt)i?<$}R)v<^ zrtE(Nwkp}K77Dx2y?09RL-}~BNiwPEQekOGA%;E*E-Po3VK5|M2A~ds$tmH1%$!M< z@N>t(i4*r<>9BaZj=DM`4z^;IIfMWk2qG)RcWmZ z?)IkF0;s@Yiac;nUcJqanabA-8!eLDpLOGpoHsyoo8KpL(N4=>>E6s81=l!%C*Cn6w;vCMu?CAbg(~8 z_KRGGweQD{S)lnUy+dljU3oW{K71aTFRnswkkfQ!+_R~d&8AVZM7fT*2S7XLKkPK7 zacy++S323@uUOE?710N-O3-}S`qUByHIeEz?+gnAyvsz4D|s^gY0g0Zk3SOcXJ>x= zQK4d1Vly~0dDreDFUHjYq4z#fI&t5dB-knMEG;VQ%Ap^2yk963oTI4mLiipm*5dFK z$U`aw^UC|LCx8A79VdFzl~OqNPDiHEg1P^fqq%Gr5AwV|qsB=k_}1?k!=O zia4`w)oS0tJ%Kw)39AkOp2URTk?lYL1X_3&EgKjxQ4cNh<%54{^i!%Lq}h|DK*IdM{&Z5b(}SZDd!?3a^5UdU3u-i_-v{Sjyn2^B?V7x@7lUD8 z6?X@6HJ=D%00*&R<{e9IL8GOic-|HuihLc;_g4R{M@;kfc%x8L*j*%&_VykNZ1u zHADT2vj|lu%eSejPV_NJv+PX?cVuENFULe|(cW6z@VSosec@SBVy!C2;^mWd@V3Qs zHLarU#ggk$Uv)g=YkGREHk^|a$HhyTm&TDwpuB^CScwAzC9c)1n2bMr-Q%;;Evq(WT)|eSG2`s}5$g~r6h0XX!G2LK zI2LLz*TbY-MVZ7jGPoN-CBRR*c2}Kb^;Fn3R8rpX^;syYOeXAeQ#^a;bV7owwliII zoXH3|NXn@`Pok&4x7ZeR3WdQ+X0P!I=>se+nTv{wx1e=+iEF>?b zcd8jbYu60jOC^JdE9q&4^#nenLl>6Oa{WB1q*|XgeO%n(op_%0IP1-u zrC4Y6p_^a#dNMVPjkIc@sBVw)n=1OqE>eIB0U|4Y1#`50tt%p8xRhvl&SMbTnB3g1JnvRRA2ow&6 zGJvA#KmaLIkUB6lqIy<+4W=kiC4sm0pVN!-T~uvGV2}jy0D(S%(qm-O=e_fyak`vk zKkZ};i8Zfo(JRl$uyJa&&I6}>GGX#`mubL$a`kMb}RoUIwuOGxfee<1>KYwJ?Xt9oPJ9=XFOa&801`pJFd!s|`T? zusF}|_9K};VtwDag~3>nN8+Lj3kmk!w^31_WI2EB!577y1N7S5_5Pe5UNZ15q;bJR zyYaa)GF?3M@E)Vif^o;Vc=#QK{v$PqDz|F!acx#!hL3qs<+uSe%T~R5(Vw;*eKi-0?7Fnb0lw? 
zruz*J$2oeSlxV8_g$CP9&x1`G#Y(E%PC2%;>yUTzYDrDqMaDtAE)%2a&yAwBk+c?-3{IZNY>>9_<@ z!$UFhOW2~rm^*_Jf#{=y{`;=9h*8g}SbuI#tqPwx+~xg*8V03-k78=&Z`yZSCsoZ$u8^utIRSiEND9d&H zKZNs93i#kD<^A$wrU@4qbOiSMF@J$VpTB7RH?#}+(W~AtjMpiF7`acYYG23aya4D; z;IR;>5Fk;IOyAT5ct~lt zsqm4)%iA0wKI(Z6b2jt0EA4R|n9f|L#MZiczaVz{_2Ai@~|G2&)O$!Z>LE(!YpOA~KeRaQ6@3qCuYCD-8keN5cX0dM4eFsSPgMuyG+}6^Pv$8z3cvAC%AEe7<@4#X=qGC z|4dOr21P)Cn(I#az_)@tO`Q;ufTtblaiJ5hg}AIW28-FZo+v8p$vh$)T*Wd$NSn{I`$^L%r2L7$j>#q3hdP!3e)lCyiVSRh;EB7&3VNm=tbE&G}-xmi9SZ z`*8GAU7gs~H=Yks~UN|!9D7S)f)>z^_=8ex3OUYHHs!!eT+HX zj)0aa+`{j!6|FHn;Y;59&%2eT%H?QDalVq=*M%`v-|Rh2VoO}|>x)t z)I{ub9{D88Ipx}ISc-^6B_;g4{RaO1RkpOR+cb#9`7Tu(IYx-X6%trRq zzJr`!>6tnbTs7n0Pt-4L{}jUveOpf@S9iZ)oW8Ru!V5}2$Hu_pX=?K7wHOr_7uO2( zpX|W@7G{M=>9jZ0e8OWGZpaXhOGybEspCx^HG@!Fac$yAT77OkTH`N(_pFQZ2<|P{ zV_j{fC5~w??j(@~s3FY+?NL{MG(+v=+p!k)jcsZ16+B}%uR8dZUZ2?OD9dCAUNmnLq$sswEsfX=Fb4?qr3OLuUgjF^= zLK-E-_!>sj*wEWlk=|{C6)$Uh1TlV8$>^+(}UGw)OVFnWk-JSndhMoM3a)YIt z`~iq{gX?I9oz!E7!B{#;fcAH&*!2aHzdb&Fe`q&0z**gs8vdk%w`?7NB^-UUgp#h zyd-@{6xa=5=#7v^D`Vvimyg=7n@`YM%PGaZb|-DBn{hT6~y;*x>RY!eI>s$9OsvH<{Bg zJ+Ut4zwX{?WsF!Jy*G97g|qlQ!Bn(3c6^MOOudXx@nM>X`WpY; z^_cLE&4p%DWjn`dde+ztMMu*`lKVNwLm?gu7i5UwQH>ReuPAcR&S<@)TYc)a{HS>9 zVbzx%xxe3#-4h-7q^J>tV5Urwmcy%P*%2>*aw*mZk9XSk>qVK!E^U1)F2S(G099$e z0fcN7O+5lJBuYDgm1M&FBs(9B!p4|+DFyC(J%}eHZ4DAbTe<`xvB%KlAUOWSW3S0$f zRq@9+0L@)sbs`gOy)yZeeTV&Iih)a&~(5fM{Opw#3u-3-}Ussh0RozS+FlMYzwJ8 z9YnxU>cnECE4rvr=nDr;GGL_8*-gBQ!gY2vmhx|CW_Vr_;c)NgyW2XGN@v(e>4jx? zy(u6acUuDmLJ(a=y5T%6xcxHmK~~Sc{>ko9ds4G%2Q&V`(?0Vhc;A zWg2Yp-!p=Ebam#|)9n*Mbln*YY4l|mLyOD-a1lJ_&3^BBHnE+yAomy8JW z$c-}reKzi)JAs;vXyCbSbPkSLQ^M`9Au&r7kG}p0b}#~OGfY4E99;|eW6$Z3Ebp-J zA(ox;k0u#CB8RIj)DbXtw^dACYro|pB7Hp%xbBF@B|R@MMd}BM*%hb$wmoG=M0XpG zc=Y%-hjQbV@)7)czT*4&K<#vXW~7H~7oAobr02yB8_k{_Ap8YZKv+}1&sX=%O0sm8 zgEW#O;Phr$>$lF-huLA=ggfrYF+NHaUEP5G%=?xG`_93sm~KFPbbJs|X*eBfNWxo1 z-L7}b#x8_>5lJl!rq||CNDjvlyZ@_nwxSq$WuwDg0z_k&*hGd? 
z+qRz+6?*~EQdp2I%y2d7xo*rHqEsAFuc4EJM{y@ORR2u+@?&3ojEE6Q``ryfIENUqzIVOmVRt#eobj*S zy+lQo{QjpVqRO`4PL6~hEU(j|;zenmDTLzaSk1Ej2Vf=TGDIdWGV}5a46cV}LgzMW zI#gmZAL(@En1xnI%VPUIxNUbp#K>**ffaUezv@v6ND8UaJQ}S;t)D+~Nzo_1QD;K9 zK^Vn0hpsJP6W!G^bI#L%jO;sed>y3QpngFWTDRRVUo_jm3ewO+*Z}%7`L>vvmAh+nwdV<= zX8jxhpSo757t^(2bkPirkFT54+77q9wPcNk*lFY8FZ7rCy1p){dv-A}Azk#WRKpCd z;cz($;N6RF0Jc}X@*`ByLvd3Xug=%UY2xR6p57TiORAOWuj$yXHP8w&;|ebN{!uoz z^=n-#{ur&11n8-=2UnG;dIFb~+^-UrxC)RCfT;K&%C#qjjgu^@;+dTOExkAe16Ez9 z5g`AV_zHyhO7GCN*SL5v00x#=f!v45FkZ0N#EvV)SPqZJ4-}!KbDQnVA#+O4W^^d; z1kR(kZ?dH$`2ZycS-1eKVo~$K2w2&9KT2aApj)wWi}UFfMJr+R-y^2*-uQ^q8BXsm zB_`9EW;(4Lv9dY4EuV1``PRMwft5F@ra%#TK0X1>Yhx>3GY#}VPV;7OK4q?eWEaxS z@3S*&+mO|{6Yn3~uVY>>l@uQVsbn7gL8A4fDl#|r znH(N3X!>@7pM&_p*R5i&q*lw8aJ=6OFl0B8erY@X?H-U+XC99> zvK}9yXO-+988>u&rSlqHThUkT|Ir3 z#9dr4y6SZF$?uS*8cq!eFA@D{JE;Q~mqVVn)>@Q%{;Iov%@yz4=DcK3&K556%%q)5 zXDi>;1@2PljR8Js@E^yt$;ukB%4b;*R?0OiLE7&HQTDo~-b(ECvKw9z)0PA0^3JIh!iO?Vpdx`ytfFRVS&L?OUzOc;ePD38z-nC z3Lfz(7M%}A=Y9=$pPj2pNO>~L+*Yklh6t;Zh2|e5igKep%BT`ALhdi_?|GkB2o+do zr3sh**ls0quW6iRU@c*_o{(iV|8vFZ6G;LDWEGFw&g@)54$5hX<=l}FC^sfv>{q_@ z$8+gmRMW3Bx>*YSl$By$ohU-`hC5x!z{Q?CQ>bOpgVjkO6-xkdp*pRNqjvt#T7-Ew z92m$(O zrwNbCCpscModU88$eK~%KVtR=6Y2DzE+9NuR#s(&xtUoBEz1f%XOkX~(X8laFg4OB zMrjE@oYyr{DSVsyb1kxG8#9mbiM`!k)VOc>`;I!ZTf1NsTCYYMX+Lju^iI8HX;0^%LceB1*i`%WTqa%dppVO~=yr-y8~&o`0%&jHMFHKqFkg z)$W#M)y3Zt{Uc3-jH_a8eQDC(i+!C)+&PkzpVGKnEk0E>$-tZ%(d6=$eGs9oL3%Q3 zANl>v1V5U}po>ps^pkvK_o--S_{yk7w4lqwnBAK5uGO+nhA%aC&tP~bbMBil zzU*~_e9HaHrMmI)aCmT!Q7fztH9pf|H!KN|M{;~s=*djhj~FiK%XQ;k!eA2mM{;pA zn5=X3#;$8Nz}2>zMsJuc*g98=axQaSzktKdqh4tmq$gZu>LX3^l$UdloqWn1Kom(y+3g#j`bP zaI{XVFWBE%-U8e*7hYK(XydK_TtTEO2P_cqLv(kN_|Em!qhRkbh8#(l7C2V z{+VA_?TG-^9 zxtEsexSl7#7wdicipOZ-Zz2bEZ62HGenP8zz`LNyCqx^wLl52f?7bBVXRcrsTMEEMR)1ShDTVkf2af;l;Mt;~# z?ltp$qlOn|WYu2e<2Fc%pgg5vBaF%MC} zlSBckBsU93QQtmlESGs{3ve1=iNBjFk58`}+pLZ--fD zXi5YIqO>(Od{zbll~+qRSwSa&RDpp_<-BOW+`_3 zRc{L6B+XXn%Sa;9EoUYW93vx+fZU{3)g}iZ!~CNF=Ymeb)Cey}>~R5ZnI|UFxq=>3 
z@x%-Tusq#!X&!nqIR?)EsRoTB<0z?tC-(M|OqHVbzqni_+#SC}g-0S?bbM6@;=)QV zHCxWz-5v!(bC22{tALr+$R=xYE{`uybDagIWa;88r8fr#EnE*?K36~3)a1%B*Z_(F zZIhA4C@;{egt`BcToN&AZw0jG(bQIbRWAikktZD1j=j|%05)z@MEO}${Q&Ja)KJsF0aM(eOWx>Qp`ByqjqcAT$00mq$skf4M{^! z#8g@D5jD7R*!C)?^Ou27IOY-IClAy7234Ru;fKws*t>xU=gNtop27td94eC><(RjC7Nc>hok3knMEP2$5NBOD@SaXFZnJ|`-B zf1sfl#WpcD6Ux1>UO1d@d3Z7c#$q6Z0Hp+TL(BdOqz(7%A)ttJI=TyBLzGp0b6-3w z63+~XgkzEQIW5=P`33|d> zlu3LDbkukEW>G2H$4>7W*Uys(Nu8q*b_8Sft%FbZ_0g!Zdw+j7ve61^a^(8`qsLYJY=py7f|DIP=xx zcP8%Y?3)$Za7JgWCv-&Sni_oG>`QzFEd4-6`CrlUbF>Ud7Lv)r(pfy|ANykl{Z;;$ zj<~m^!@dx+wM_`W#zWC6x`#N0HbNDFgL^^o-)if13JyR{FYA?Q5#{==)}Uzth9veUAr@Hz59ND@3v!pAp7dFLl~~t)Mn>Fy?OT;5pj9%0_(Wx z7l2Pt0JY(9X~{qQf_F`<_qTeCIuo{A(SwA2`xF-U6Xy=E^#*P}N{EI4Hd7*ggUy*` z@K#y9l<+!8Qo!ik>&Ut=!pU9T^mznk%W`eq~Td;@!UQaJG~j?!TQ2NM!!Il=Z&KDl{MDN}^S*>ANx@RIR!oV`7{oZqJ2w#LJj$7u z@r_;E4&`!&vHP3aSYI_FyUE*e*zleJc10?@z%u<6^HgkLQm3OOKTWwWf^b}Nq$p|T zCfjZ4lD8#*XJ5qyurr$`pM;el!vp!TLzoRf4a7~o6)C4lMkx!6{p>-#p%G*~4kB1g{0q)hwszMVGRwTGGt2Rb56Gr=t7s-?~ zZ5pd7d8nu!cHYX%yF0ySdvq@2zBca+6IV#{Kd+CQQ&?{L#G+n!&DeTyY+jUB$XXKC zsTCc`(Cq0w?zaR03~dEVQIwsE(M`tE5h~QGlX9phu{A!0_f$@#KXA;*)2;}VD76F$KL2mzkls_&%iv)%~y{FGoNJSyzr;u31}%+Y&+c(%#iz@r#uliO1DDJHI-< z*p`}h(iT}8AesuEBohWe!iynp!&q1u))2SVO;0uF{oZc_A3upn;0=N_@X0(+%ZVzL zENk-sDyu|vHcp2(LRQp8n!cgQHt-o$!H#u!3y93I%Rl&Qt`^THB0BN z!mI6{{sqn5hy#Men45BUtM_br*ZP#1+NvQ4KbF7e_mK=_8@8ixqQ7y0%Ec$wEq?!` zFI?r7dx|V=`f^62lFv+|tb&B>KTuQ%i{aw`vF=#TpckEIBUxbP5-_VmO;U*Nryi5* z0?``9Wg2ugw!CmZ@$wOuaXa?Vz zVB2bUp08JvKy4+;!Z3pQ*Uw4a7(J-Zksnq=!iQv%*@zykJpz|R+wI_;o$Fhm1n{pw zHY0+D2Eqlk=aV<3dlRh8XgC>3fm8iGfN;dkQ!<;y#ug)QFBMGJ(_gxOulfmt2t{fu zh11%G!WZI1{WZCYY%&4gp7Y@q^G0%*53ZZgOzjBr+!I$ zNdJ2)jzo4e@I2SI+e^oH6@Vb;%V5AL$=W!wlY@Dc`-_%6o28A-2T2IupY1K0StG~Q z(*1B*StWo<)g#>vsek#bCI{MVFI`Q!6j27K=tX#!0$*!My%>AjMbevSl;_=d4@JHw zLeBo!sNv?*6n|N61Pwe~=%6kR1BFIJK-!4Eidn2LR9ZWpn2ePvK1w=*qUvwTu6!E_ z%?=-GBWuaxtevF1r&ukKLm;)!`KA^YHmWBTvEkAz-=-5x@~j9mRAp*@OHU0>W?+_5 
z%qy8p;qj2Okf+9#tARa@V6d0EgqnH2eY&p^0rz^p5g+qm3F_)uCW3)3wDNhCYc>s+G#?dUbCm|>=H|H%(-6k%DeXrr&;DP9y>(QS zYukst5ycG>n{H9MhLkQPgrU1Tm6UE!329{L5CjJ4?v`$lZjcy|mTvg2aX;_#zTaBk zTAzQhmJANJb6@v$o#*j8&Na$gvR{Nz`m%Ji;p#G1W0Zty-pt!6<*f@*lwFWe9a;Sd z+1}tYUO~dq}(qRTdufr)8hI1(gksq$>ga=9N9#Qs&$8jItaSE5;_+B`_}izGV1xc_`{=SkOLg; zjt|*#)?Z&=&y!OEjFlI4i=y0jO@6)aQRH*5_m`EvOsjynDxmHk9(L0h-Y1L|eXMb? z)I6EPFj8lTyF;k1q7u_EI56PcwfN}h_OM&4sh~xt8s>Xz#w5FgNg~Ghh{j2tqy%zw zxBao&U(kQ;=(68x>+*6t!_i92XyuNS7{n=ejxN3AM8&smW2OsR?L|tH+rxUX!JsI z-L4YX#Wa~EjbdiE$5>q>GvvNTr|0eSJKMjq#9p)SczI4QJ5wLwfGy51a0vdrh|N-T zW=SEo@R7#L-3L?K>X(!O+hVDEkRi%t1E#AJ!MCP^)F-yvTGdRy)g8O5=yz=+ zbwQgpIP2b{iw7N*Pryfg6uEH%h7^nx5EAlfY=%af3Hl{HGP`n7dazN_80O8dK5RGZ z^dq=Sk4!`B8?EV>E0aMIY3Pa;=@GXmEyePasDBFrUixn%QSZm`$_%>P73@AusKQ*gFm8;=3xg4gKC%06AlH5-Q z^9n4EB?P!l`jR^E<|S|m@NLYr$fxIfgKtL`9=tjMp6QcKXVT$KuJM50*+9i3<}D}u ziSsQ=s%&@Eu9@aypCc|m9oi$De`^bQdVTz5U3w`Kr})ujl3fq8M~(F|Y0Wilf50ke zu2VErd0(VvKrW+0Nu&;oZt6|^FNPg5c$=C zx~iJf6A^s@tiofwVaQ4pfJ4N6)3Le(s7h@)j4fVq!PZRQ4PVr0qLxUQ+&^4fp)BRI zFzFhmC(sh&uqe}^v!GT#i$2_s1~;_$+xsSyG{@0N{#5QfiL(UMgOa!1pKZB^Q-27e z(_j#gvbb%nduVjKCC(h~w(IhwgE+A&poQn?xN}Dg#@_>aDu0V9#H)WvRVcyMMwR=5 zN^I~Wt^lilOcb&N4dR-1uveEx_HVvx^mKEhXFr`58xI+X4()N2fM{3M;~tdSzl37t z?nIk+lb~9s&WOH5K6EZ=H$(M7sYFX&;iXu5QR>Gx@G}}D5V(C4;BTb=$wEk1kya_D zXaCIxQxszXDC?EF5zKQU%Ri!m*v)F%@{BKJ&ew;y@g^?HYg-Dfc1Uo8M5fAINCY55 z3^wHN+Tb21E0-utoxQdk8J|C~jgX&wveOlAq+jjY30|=n}A09YkHcm#488 zevTrx&)PA6|D^eLob%sMH4&$^1)xFX$)}JP*uv2gmF+1`+!R@v#}mEo5XL6SU18I- zr^Ux-3A2Z@(UBKjCl5EtwiyF5y`zpy1@`%qZFajHw%>Kfl#?u_E^Zr)8X)Sp({;2z z$*n)jG_S!|b_uDI9K1z%w7k%?Zs`-aGyK6^4i4)jY5kH#ko5TY>%Es6=Bzyi$7&xa z9ZKGO_gsVUdp#S`noUS&>@jFVH(;8RRHy#dN>C5hecnrYhTkn0UDgEZHS&*s|E>?M zyGhb^{Q`CXEXcpF5z~%p+;4ByGMP+I$Yyj%B8v7Nj?kL!tKDbrG3Ww_l|m=2>3)2C z$I43;C^}h6iZSOvsi|M){87IN_{qM6$TxHgVdSu$a!^Hg=yAb*7Y+Qgp z`DvM^BkHmWFf@30I8p~+6_Rvxt=ge2+?F2)kKfzd1Pd%!8Eg+J|ZVAIEK5{azl84{H z6({>S+9?97cc@>LFi9*v$x0yMp?rzex7@zww89lgZYDe@=qA*St^_-A&@ 
z;Sp6Y))5HGDK7dsmSwP+*ya)v(u+6SI$Lo5`BxMPmt=8nR4lgyZI6_qjjU32NNE3! zEavX?^yDO4I3qawCW;*1FY1MBKdU`JuI9#FG+9&e=JHCt@CVr?p>W!wTmi{eKdq zUgx(DMvtNSy&+yH>5^O1jbVtSk4(ylq!2;2tgEbx=zxz>?A{+316X?qY*_;&MHo!G zL$<^bRK4C*J|SS2peEyvph ze%ups`}yG$m6U|Uj=hP|j!-rW;E}=jWs65XMuI72S6Qzh1a^ zJT>W?{!gTt=efBS1^?%0IrdgOAnEe>-6FOUntK;4;(eY!@4j$mGdOCmrKxH9^4<@a zu9!TwK<~ALftL0soOJvU1hS)ME7{w9afBnPCJ6MMPvh(E+oUJXaut(50&N$uqoZR% z*5#>Pt@q+xd5?Ts#erttwH4r@a*suOu%-8e_b zmU#VIa0vv(=g!O~%}#no_zxv_7i)2p#E#Rv20b^^B2nh9i zu_@vc3%41*nd^X+g5%lhoil8dJ$pxOK+a}Nn#zB*X;Qp@)`-gY@6zTwqJ&cI%qi%l zBRjb&k>7Qy(TaQPy+mO}y(Ej7`?67Q73*~HbA7_r@YBb&nH!|Bjr#Y6b~a6xrn(B`pD?wmM&$^*OZKJXogBnY7f|e= zo_S+D-U0Q=>X5#Fy&QoO`WZXlM97cr3Ql(bR_I}e;tfL9--LA|?(dWj^v4B!tNhK6 zlrBcjZ(bFu@0FNK!?HU&B{ec-x{#73C_cl_xLBq3TmQRQ*>+({-wE2Lo7E@Ujc3xE zWY1ck8PVyDoz{c+I5Ll|PMzy;zsa&xyYWWtrAPUn=MS^n$D>6qg@ z;2s=;m9z}qbP9XqoXQ->l{`)n#N|Jal_q_+x3?#p``jk>u1cD%+A6AD_)2{4-E|Sw zz43ve4ps2o#EhlfDR)V*ex)KtmXG!nG<>;{UOeW3CEf4qpSlMbxFi{;P z%L&Af_;!af7^iCPgE;x5U1+iPWpklzFLgPkkb4dDb-aeg_7#YS7gk))tSK2I`Xvm4 z3FmO~@V`-hvKVMpn{QZOeWDk?SHZQs_-ix6JBs;JR$1BKVy!N2xdZ#P1Br#Vuyr+= z-H*p3UEhXMC+(h>LYJe2OWY3C6w*d@~1Drb;L^uc?-*+@hCiv4%`GxS9D7c263 z?D~BIFEVXn0O03ZTYU! 
z3MqN^_g7O!1%V1MvU-TY#?hD0lB^(lzgU}XalK`whleGCZW~@&A!KsnQB5@#zomEv zh4F<#+oy>}rk&5fxaZv_YVOo7zlB~jbQtJX(1|fOktPTzUz?O?b%-$ostvT=lMg4S zi%O{}R-3)g-Ik6Nbg}m{^de`ayUY+GI@>3r`>)?Z=e+rH*8q|-;p^bth|u!UiWG5* zxXNX3t+ME2XH;%2LxQY6r>Ei*Q&Z2GIB|g_fn#iZT)j596FxbxSihMTgv_#AlOrkD zZ5RP5DQR9wiE4BV+do^pTg#;xbHns|iH)D_6)!FJq3IG&8Rp~n^d zT;>>iJPLu6EBi-uTF7htYXitjFm^~IJ| zM(tdbxA^9z*M8Fujx`m+Z;Xr4Ab)J`>?+kEf0{XwZ~2OR+P%IIe(I)24xK>;(i<2v zbkEo~=Z25}`?2GDEe~Va>W`TU6tbkxmy~Z{$jO3n@+4AFmhNXq*Cz{ht`-b3HdXPx z-rZ2Ti{N>b`Z+c(D1fJHxEjQv0I!;@J8+^Id|>aRWZ;Eg#A)igZ@YytIZ*x?RS9mp z%mKlTKgH%jQ*B_>i9(fbU>F}C*Kn_jP;#a^o}!#1r#V}%&_d$DpW@9#(7k2D2kJNv zuY#h@>S~sV0iUz&sp35Wk0^~H?r4)7^~`3^K1lD%e`{tgh4_**0Opuj_8`` zOfnT=?4;uth;B6DclH&s?Z}!#e+{h(fwt2>A6$6){|VSvY!@`O*xKc)-o+5ipIy`|8%M0v(=#5zlezVDw^&>mc>{dD`!sa9)5ISw4w+PE@-+%KuD`*jh z3~N1_-EY1w2imgD;wkgy&gQo}jx8SE@>(@CqBrYuJh!j)PHqo+eVaWIk?xKlCR-rq z?&LjO$)1_n)@om?BR!}bANzTI=jQ(AsEn4i5B7dpvANqb+$Uy4OVr9?j@iskWSIDJ zLF{5ZTrl9|08t*=XZDTsM^cw}T{RZ)~ODH1;ngIy%J^IrS-2ya*W|JV|>jGAVs zU3m9Bn7$Sj-Qs(t%MRab_Hs><@{|P`l zQyUR`^HQwZAfJG?#>%LrW>;%xlU=_cYX8Re&UD&_Ut2q3Fv5lP#xl8Kq$k5aDB|SB z&1v7}q$pD4H-F&!5%lq76csx^zXHwUJ}%+Y8Xch=U zYvzZu%hPwaPoqe9AGS8#VN@&C3(U@4X=`})ojw^BHNc{`m&GbHsoFIs=3pT}ewFUu>GVty`^*&Y!+tv{XCp*a}ERs7}WmK@eISA7W9i z-5M!DCKm*0LC5CZ?O3F>VpBn|#3H{_Szn8jccBxj8P!2j@=Q*s!RgWeks@q!$Bo!# zYNDMwv1SbIi$F4BF zoHU4DdB28bFcv8s_AZpp<{(?R)Ntu_G)a{2V~mNKz?NmzZk7Vw79-VRgy*{7l?|v4 zUm88XX1r#*ra8a;xyG)pqcmlByx{pnU0}F>P}yS1Hva?AV-9@d&+3=|KUD|T?He^RqzM%Z3HXl{y4|m*vWi;T>%Udce>04%tfG3lmtR;U zVJ%xB_(PJtr=pj_Q{?pC;0{NL?9GQRmpl#De_O5rr0#U4P7EtMQQ#vjDbLJVZ_AF0 z-(yWU9LbZX7POfmBm%I(p+zDHIj zeHHO*SSF`^;ey(NxEun%VlnI-FpkNaw`>hTx zuW?a&zv;U>Q?I4az{GiJr4L`6&Z3il6eOja6Ccpx>*Tocmb=QXMDsc(SHm1=4y!Pn zo+07ct1GlL|FgHfKbE`;6XH(B4Ufy!=ri;3)aevJgAv@jIR!=L_ZAiNMsKd!2**fs z6}8T1t2`Ue`ccBWFeM^h(Rrp;-zLiC_tN2dT;D3G2Qd@ot6R5JC5D6!2EIrWv;HRA z59fKs%tiXkXZ!Qo#Lvw$7{~KaZBIwi)#`_1TGUg>k8Uk@=*%xs%fTtUQtjhRDxyEv z_j$m%#g;4{{q^(Wm#R-`P4XSyytK6~y28xS?+eV^O!F*Y?!il}U>q1>7f|xu_j01i 
zsE)%`s_<(~_$f~!q#J+fY&A5|GWb0YSHiLVP_4{*&ny0B%ffO#%Ai0$gL_B)rpiJH zn}~4bXqGsSiukDGYf;eyBcEn%xsQG?4eUhhTlSqax)F>MczZ(h%c*U;0?iTKRCLtb@DCS)1tC87ajSo9giElO_1m;=ZC?#eBwJ&Ngp1bdA% z2LbZ;PGw~eIlD>XCHqDHW~zJ~9(H)QFrq=Nr>*`Gea$DMEt4iRDQ@{ml~Yl-2=7!u zHbYLeRs$qQ>U<)zN9bUd*C-T-)`+B(Zv4*^LLNC?11(?O-$BhDJ*JHZRDpO6 zRjx)9a6@cYK%D&g_=}y8($)RCO_Rx#__(+{ypT_KE(`$fztJpTkNeZ%B|N2aoWqNuQVfTQRoAY8cVvN0e6TlyBl}Iposab1v!}8=xKdf>qU$U6gsLp2*seQ->i`=a~%Nfg;{M zQf)C%?yYl5+b2`P9h1PtI1d}CZ9e|GjcCrw%4ur8Kir%XOtxB;P-WB%&3Qfj`=*ynLb|-`~K#7w_tR<*uCPJ8szX6~%Yb8Hb$f`(||ef9(}Vi|Bzm z9acX9I<1bl$j)f}s_P=;lu~kbq34HG%`YDg^PNkQB>$t9VFVrKBn=5;Vd6v)vs)_Z zE-x~yeR^B$QmG$4DdCT$s?z6cZnQ1}o2H4Uto}^IerYl~hLsE_f{xuh zhfu?`WdssNRPZn_cTWx#BO@VZRqM{P{$F1jB#wapLc}ptKEgd- z<<6}=aAGxB+_$h?NK)D6^ztNkHy^0W`_c%fJO3f9myXcx-vbZ(g4@KM(^JSb#6nls zUaBaew@-BB+VpurwNdH5g$U>K%p3}vWlAXX;yDyg$bBr5eR1EKc|2q&&u@#zI=yq25oJO zrNtVPVuWY{dvLt$QNC3*U#877M{%vJ^d89wY9dg|e+UpLYP|_}1w7=K0HjM3=`jX8 z5Z6c$YYRASZ!|o|KodZ5`Ev$PAdtUHFhaVL-P8UrBaX|dZA6s(ml~SD06a6u!h(dC z<76bCNEWDuO+n2VbK9alDE6a2_1eo{`E6J9pKx6fJTgjhYF3z^UkzUcCQ%}aDX93h zLt;7vv|G7Bp8v2d1r6Rs{_fNfeYkq1I2Ub)9E+1tOu1yk&6&OD3SJ@-6mW{H(uOD9 zjkM<(OifM~?o703=;FDZaa$lk2fy14*&D=hg3kMVUsX>H8)%aJ7|R0{ldbvBm%n5O z1}0stjikPyZi?!GV0#wYxsP^as-c5Fm!=>*{~PoRp~Q*#suaN<^*1SbvE5~!pdfW) zP6)ng@Hxw_A3GcOPa_s=#JX?Qeb<434af}{p_!d)(vNv*Xz6{5h$C>VYxV1Gy*9b0 zL=;kJX=-&8h$!jXnpdg1<(AfvygRTeUOo=0ox^o-Tl!mV(I6^kkmtpVXKi?e^rfe) zJ+Qb_bJK1j&}hMp2|@S8rq8Fy(UWH8s%Hx?a`#U*xr3qZk4K^j^WXnS)9S&o@5bK@ z_NOL#vmYO$pC~KaU~e8HE7#zB!Uk=Au)~;Oc5f&A<~z^5fO^#RHxw%U@w58Z*(z28 z<$kT{fUV3@m|-v#H~h%DAD^UdeB~LvQUsFtukbdoO(GocO(~j@SY-zqwp!`N<(uONQze#rVx-{t|tiO46Y&vvU)0nmA!*fKPk{9MM*}(on&3}Ky zAE|KrG-e4Ugmr~9*VukNBs4_x#<2vxmm1o3o znH@)D~<8=u*HO<6yud7pUjGT)F9>9d!b^a3lg!UPXTy8%-!dOV~F#8 zPv(7AtdxWGG;#f&or(T6k8Ruzzg##&e|zMh-eW&w z6Hmc0yU1Cw^(SQCE_AA{Ww^QNQ5MCuM3jO|)x*vD(2vil8|WA7TPa2oiA>meex_eG zVCP0u7@M;-($BBzysBTuR!4L>Ex3;H%z|DLcXy}Y^k+U1m8iy#$c_^=8}4{{#kO$% z_u-6x!gu2&q|;S;T(r8tD?F)+?7#nB0?L36DuOSBgAe7wLI4^RE(%^Se@6YflZi2d 
zCJ|y{w#DjyR7S#QfIAl$XeR((^k1zl0T&={CU_lfX{B)f7ZUSW?yIo;y3o9DEig`b-703s^7!8J)u8$epy@o0bFlC0c zYna5NdJpUl>q$yx0_?QDGa2hwd?)p*m11{U%qy!h4L<#31x}E%QwgeO!^I84EF5Zc zR!~(3ZBGzpA)p0o8geGe|2P>|s5XU7|5xUF*9F9%Yzn+jfj{ryWj)|TY=`v->}}^- zp8?wVO}l4jZxDdbXn~D+CGros`~Nq*Bt|awi?o93g^+SvHhn0gPaPg7Qeq^0vli*f zBRaxE`tLRQ5OEIWHhtr^Lqjig?BPIiP+vD>!f$&OHY~gZ(hfL?Kch|<&67-1JOOyf z4D9XtARH_8V3>wz3Hb$xeM_PD)uL7dV{GlwuMu%c0$ce`>ZIDRo|ShO`YFvt$NDFB zzvgMS(Y-v!iAS8aTB_pAu1$6piAVO3%X=d|P264iHlczE6urR4`X8H`x+(8D)i=_u zCnU5U5hENVy7qc|yq&tNy*rEKp!-yEdPS%5)iYrYa7A%in(d}sjo<~E<`Q1wgC>vD ze<3DH=uCr2P@O6&)IAeUuf*GWkjNx>Q=ji*wp(v-2q+{K4{SBIE(R=X%?>`G{z^ zzcVTFr%Eh=#(!||r@D^K4+`@O)%w()K~=R1?X+r2wM@Y|9%a#ADMH#xU#Z*eU=}b1 z(BbM2*>(o;CQPjr7qYz#4flfJnFq%1%uo}VoP@~{>>!`W#r|5>qdpA>@rRz)I=yAE%CJ4ZL+PO4hPTL zS_)v}f)797g0uriS=dEz2iRF94~mL9m^q;B`wRN3O^A6OU)6V-Zdu1u&&l~5R6~d+$QC>;wzq~6bV2-${!j&TmPw6T!xzWs{@s$@dRW6pOdKv7Si{MVX6Nxqc(o^ z7QKF$=aWrgksNhG>K6oa&r$YA1b-beov}hv{&$X+d4%;aGIaVBheW=frw3>R4G`l_Di6Q>~Rb zeO}UFBkj(k=oC0;?U|i#vCIisk5%3@j6k}x={tG=-}bW`A1?-PWaL8@s=u2gXMP@0 z;$&gi=3)R3Xszw+@ZdEnNYX?QCD z#TN6k4Prqd$NZP)#W>l`{)uE~%7uWki!pNj3c?J?B>je@;E|Rj6&ce3^^=%O4+uSK zL{TzhSY22nGBE*ek^D|`aAf;p^I)J7_ODUR(ZR7fEj}S3tER=w`J-!ahX7Z=aY)eX zrnb2V?M)PteZeaaFp)|)ZtZ3fcSRM3r=p>@(?m#OK}aYwRa3~|Ex8Eo*VZq3q@e*} z$i#{3{zkv|Nmh>3QU@Cgi}%^^Ah=o}?<-Am$!`(!YM4vgD4`-uwv!}MMoGaH;C(<+ zJ#X79uN2T_>WKAj!`K&twwYq*ar%q5P_^o_fULGTc>LOnq$;k>=(#IFb(>HFP0L{B z9xm++1pfsRA{a_|tnJBmzz%zw6*u zA|T1ku(csCD=!JcaRqfJ{c>Vs5riaQQmsJJlv~_JI}K~7wiAUIDl!^D zt*%F?ZEL%05+B*?QV*4I@i$zH8F|rVs;enth@+k9;Xn7NY<0OT)pvwh*nDH` zo@km*d^s_fXYbjIN6>#F?L+_M$NwwbTVQIS0%-o9Jko+HaC715KvY~D<30X7TdP*R zazEz<$6Y2HG3XwyI{0UC5{pJpSsd&44c~4hepwU) zeBnN%B#|e1Q?9=({j+C+S@X@*i>_xpqY)P~66lFf-P!LedTUN3BYGQGcmX`iX9{kQ*ddV zlV7U2fJr~wFp(U$8wb5uLFhqf#}G;Xs~Q6Hoim;C=HBk{wZNF1x%%jT*B9jPn7qQ%z6xtx$FqmV3$oudq!we=qvIttBIE3VBF^I)+dh;`yTYYW7@znobjCCA2X*91JMt<&v;hm z_cU+TQgw@F2RO|qxWlY5t-}Z%pd>+2#X-DHUvC{k_{f16(IY+(S4s#FcC|B_<~=DQ 
z3Nv~e-#bw(DRLl8hc~%$ixLg>yHS>DgO?rRs*csD%eHSNp^>Nzkr5rgmnhm_!oi%U z#e%Z_Of;3!d55m|nE2i2e3Vmm(Dl+*_L8n-v$Y>$0fat`G3c3zdvDsKOv%4t*ZiEx zOw0P%3u0Q^xSHPFycA_C(vatcMGoCDzE$v|4AL#(HulRc;~dn3(vltsH{KT;#u)&I z9(L^0Pd?y%`(6j)Xjv12?}@}F_5Fe*_{Vt7iiX#FT5jp+$=y!tIlGWF;g+JSu`*2tcnu$@85~x~t8U z%lOa@_3rWuN`}Lm&SBGZ$FwT=OSzfzgt^IAR0JMQ@U>4iAnlyUqg^-jS0gIFMckv> z5&2GmwKWc|=^_}%znrdZZ|3j$Fx%|~m&Yu7$V9No9J|w{D;ooK?yn<6_v@sp3|N{F zD+fIFF!w>@RTPhQ>SWpbrYe$2L{eLJYG?!Sd|}7|Ddslub(E=)+Z#e@%?jLrN>dKn zfj~bBEuuqU$~;{EsC%;!CrR7GiGC2)&fG{A%e3juTI`s25Q$kaq7d1}@svYi!VK)e zW@7Q03Om0++^S`AxcCP`UNzGSaND$Q=%`4il=@o-BJ;Yz#U_+F(aj4bx9nxBD#fBiF3d%=ORSA1+)n?H(G9S^3195gO+TRb7Km@oC)#XeS(Za<} zHL!r20+v!5))YIA(N|BI)8(NwyAn#=NFFMV3TjUtQo2D^wG-clz3XcSu}O`DpN)sy z&2hafOyh6#_qYVALQ@3_@{Rnw{u0Zo!~h*P8-xL!;TG4hoLrNt>F4B5Cq;rV@Ec9U zyt;>d-eZ|gP!*z>)B0)@45hPF^*r6^{Q6uDE$-gRLd4HimD)7>CQjf{JXebx9(yYD zz-=|xAQ5{Oa^mA3c9fKO5tqH#u6xcdh+q5B#EaZKwQ3nT@#Pbe_^+pv%(X{j;;u$U z^T=FWV%v;AF66M(E!S8wic}iQR>VRoZ`a7Z8%O;?{Lt2n+FVh*hS1epi>l}Z7lbup zd`3Ggu5RP=lK-EvNZx5;$&A|;^kf{m*M4cjV9n}$xZtGvAZm))>(d#&#pM~jE5?`U z*2hG)D4>E?kUg}p71>Y6Gvsmb$*X8)7Ct3Ffz5`Z9adOnSMA})ROVEAHlV{|)q~BslCXsi5JvvA7h_qL)rqtb zA?^4e>(~cW-7Y2u){)Y(?4}Z(q4Eiap}+-o)RbLp{X?f5UG{weZeXkmSDVh? 
zG1p&y#IBxCbJsb<(7n1pjz4+&jz0JAhF=GhAo{|@exC8=eSUy~?vaH?X?;n)jso5D z#fY(0y77v?1KNIAk51#NT!2Ty#G_NqpD&X&=-xbv=WRNdKy6txaen=w zPG~sX7`KZhP^A#F@#T;n7!uwc?1T@zrhDJ@5xDNq`oZQnDZ0a-!s#nm^=Cc1}}am09Vg;0U%?03=T8| z98hA>k`50`#m{{qCF}u1qjyYoOZrn0xh=vcrRPbRXu{r7Jus%&j=%N3VCf{2%69`_ zA5h@B#-s<%+VbkZKmlQ1IW2fv;8k=bY|`~;#GcHO0T3|^KHJ-Mpb;e z2;;{ZK(QC8UxUU$y0Hs8Hpt|K$EiQ^i}s42A<;vF-*7eC2xvn?6iYogVoJ%{>ETNZTJH865jyQiag*7dpO?t zF>F(dpp3IfWTnFsh3@`&hT#AP*iAr>;$x&5g3Z7BffKmA<&!@HIA4txRGM`*6x3&u zmUDpF9#83G(LrG`5(msc18)gTw_$sCe~UNVYOx`F`9jn0J^&m?Tc+G6&M1X`fA5IS z0)71YZ@wa@Niw)h7M~gCkkYx2?yaRgQtWYS;aHKwJfl!NMkt&%lT6V3@BQm>DqP~} zLvB151gD+jG8J`JIx;G+2rG?<7y)mOo`spkq^8p?&&CPh#e>O4gVA{+a5tDgaR~Hq_4Y;p_PYRNpV`Qgul3@+_r)upn?7My zqjK@Z^y3+$vJ?Mh)49VQ6TmX&y8>XcqzC@(o9R??ZgxqHqTNoq_d(B0xfvL!)|9=^ zGU5n^C$2#D&bo-xqVN5FKi_XTb`w5t4<`}xqcFUkwkfWmi6;NiO6BK0UwXLJG~&4C zI=u?;yMP^0oTYOH;lqh#{cD!%+=E0~kQrGx?GdZKzCu!n%Xb@E#WnX6X+0e0Yf6Ws zJsCec@xQI4(#cKUA&IrmF-!IP8&PfwntYS1m0)Ph(0=^3TLm3117=zT5QP=a@5Pir z(lMekexljVhVsYC>YW1#FCCugVMyT|@ugp6qP;Gm1o z9pV9D;q}#N9czTNEv7em#Lr936 zE+@xhZOFgymQ2JNE}OURN&EcF=eAI@bo7h?t4C1;4M*%9m8SKzaitzEIr-+?T|)8v z#7=yVzR$`dMt%?FJ*A8$;ZrZ6>|sqDX`xoT+JyaXje~ndI+I6}UDmT|j(^Ej%~zyx zo$z;(L597rjmPgiY0!wmXXD=kd{@Q86pT*7`D|{}fl_#!;7)#IH^}+g8>=ga1q4(Lo&D%%LR=lYO*OR5l(SId}HMjff$_9aYqV<4v z_KZDclg}CQWJy=o0U!!?TP(JC&0TtQ%pBZW;ryd6#E}+B-od|j3STC-m*-0 z;wq`Xqfhg0>q-*7WTfD8Es*W~HpDOG+JiuBPy=!e2=5Yf&zBxk2~zp?&d~JKJ*5e+ zs!~YdM`7w?`Zb>$U3jgOZZ%(Hm*-$hDx8s?USdB`gQy4PJ&@(gIg_%d0^Ie8?|lGH zVYX6N(r7mJ-tt1UYJ*$6$gB@5CS2CnbdRdp+L2uhXS7LGJD-xnn18d;Psk^J`CBbI-a%D37DV)24M} zX?JVPN>#w1=MW_PB+PqkZ*}r6-K;Y#EKHr)q9j4x8((&r3Yi^~v`GW=nFR9-Lz85H z47XDb22s&ijz~{_ zt*{*igVm-fScJjl-2r&>FjjQtS2S6_z#z|%hL_!;TV`EzsnV=GqG;i~eAU*Eb`nk0 z1J0LkGczZfOu6cJfT@sLsQ$P$hSn1~ETl~A{uv(r>fwF5hoONI@Z5`JS{XTn~ zk#`FCu5K`9MnjmPF>y}XvzV(y_6FWeJZ->9K0D`Tv5au--H}(4Q*o>>wLNcOu>$%| zleyd4!t>$8vkhGGq$%l0!ubvP7Hi?CuW_`yV0vxMGIck&h}396-of@uo|+ySPeI9~ zru}C&jAp18y(w2v&|nF>oEEHXPsp8yn+9uhP>_(E*G@z26;fMg-9FiTeQXlJx!Rv$ 
zkhjkgnnX+e9|}e`82*5rUptUSIvP3jrXfR7snc9_l1VzU2lTN`cA^YIA|o}T?;Xd9 zCh@9ot|u8GEIt1jKXM4XWS?Pa>t5Jz4Ju}LFS@D|O@U^jpn%w`Rf%l-ooLSA=QWeA8mRm4|Hm{(-txHk#kUxGGi&FR> zP++T*^=_uiL4X>Uc6^d7jP86KpK~_jbndus4>rbis*4bPo6j@{+iq}x>Lkw@N=ec6 zS4&M@j>VuGC{gf}*7xbm=vL@qO-=X3z|Ix#3F#4$ZY6KQStKiD{`gTw9^C4>HA%|c_%f*kj-l4Z|%a3yM^PzNc zu)qsEDym$|=INnp)6DG(ZX4(j@FfNzAq{%Sb3)YX^Dg+IG?2EP+{#SXWHb0wj8qtS z?)QgTR}X(_D0QP!QBeueD@;;M*MoHn?R%04xgThZtnIn>d3pMHe|svT zBC;%3+8o*Bw$u}q!sCM`dUg5a0x7|}tbF?2ov$VLTOr9lg~XK3f7d=Fa{3C)399mH zc5+Q=xShz!w=TfYIly__cL}6-$jQAk?$3;>HKEkhp<`2fLX*Dmfjz(|7M7gv#D*h> zja&e(;`h+K=Cj4*A|GD=7&3f%gD)BE+C{qk!cf!L)OlE3@0x1>+KEOe%+?*)D2vl+ zxSJ6?-mr)7R`}*x{BIrsbnbl{>Q%n4|5?}ck5;9fam+{|r?`d%Fw&U0P?y23T-US) zz#Tq$&5w^;^z){mg~?5po_qW2JpD85o^vkRk76^MdqC?fdrj_V)=6pxlU@e+BxD>yM$m=HjCj6;7LES^I=$A+OXV z@FRp}VT{!8t}y=WVMT+2j`DN0m;Md2e&pfiLm!jcX)bWADI10 z@V6I?t}oI1!8ZnzkH8z`fjsF+U1>?PB?qdw$4U+~pN4}HviZCJmv`n#w_eX_Ad8ro zkwIczEiJ41xlxaYN2jy1Gw4cr;}J0<{UJ*nOKxe$Az3FpB>)f*)MmbA4#8_#pV6|1a+21cJK+*PsdR+PFKxwQ&-h;O_439^4_gLvU$a8#tZs`|W%8 z8E4=7_hAgi8f$fT)he5F)_kA(p(rnfjzWY2007XxNQ)~205B;40IVAl0`!R_9Aggj z0^_7C^%+n(L2?Mad1EdrCkg=6M596s;i31)4$@jq002hc-+!1v`(hISAnx^xxTvbT z-f0G+0Z#wjpuXp4>GwfFLF;ANuwjce@9B+t)-UYA!u@(Rcg(I5HU5kJu=K7r+3!2t zX2d)dUar7uR6;Pt5oQ2%(=u`G||tQ;&`a2B#fX%PIP0mkuINw zUyfod9w82-wx;krBJ|bjzpeiGD%J~^-tQP&6BZU#R#rww$&gn2&8Z~6 z9)|)BV8O=to^5*iH}41kpZqk+7R6;tOY~fCi8i*1hctsF22D*(>plAt^y$^Sk8IQhM!9 zGbSpRBk{GiWjB8~z#EJnYWCEh0m>tzBSnUAh>FP0EhVGW{a?hA2IBMx(JR-`J*fyYqBaM+>2>;ZqDgw`XX2TyROM6 zz&2Xmnm8NiaMBzmFN%(kFnf)eVx5!fRWWm9MJ0{=jKIU|7=z8rqJGZ`0O%Vsfv&ka(9-)aEKRTb_9tOdQTa}&e`@wEo3VI769=B68^Cx zbjKg`N}3yhVB%u~HWABB-;Pjh7&5SD-Uc@IK2P27q%4PzXb=DZkRay3vE4XQCZ<<| zDtou~t*o9|I6!w+Hrge*9x07+XAGZ1lW68w$-&Wu%DkF9$q`?GUpLL-;INH$LqZ7p z0gKA$xh5g0&fus)c?8G(ufJ1sitD0e$bH%pwmVTDq$7Q|diTCUvu0*-JChvrB5@1s zFsrD6uUA*Ry^mJk9fmY^rc-yt$SibbDd;$rMpX33xVm!rG;Ja-S3Bm_^n1)#=&)q9 zST8Ue%Q*CSe0psI(=iFIK})+i<{s)yN$CNSz7cu9(>!#)jT`{K8$j50U_!-;nlLr? 
z(iSuz5F-&i$-sV#BtKsclyci}687oZlrj##@)iW^JdDYuaTDtREP&l*7fgn33=yVM zo2aF5+%M*Pf0Qv<{i9wE-w$lS-J{Eq4UkznNv+&eP%GrSJFx7G5O`et)#uXLMF-N@j1Hd z;EH8$Cb2@NBscfx*RNrgi&HrRT0k5;-09J(PEnmE+q@hxgwe(X294=*Od(Ydpx9cQ zTV95x<@NlS+d-NA-gF2$ak2h}?d;7HEQ zjB?ob@7bAxUPBE}X6xRbuYtVR`}*O3oII}&W%?XB z9u@wA%rI2ul%Lofl_oaXuE>`hrMBM;Fw~18uTv%Io2Ca_3~h=k(>kR|>gjd1Hqtq7 zU&I|m%*y9*ucv>hnsk?$QPR>X2rBWbY`dke(Fg4?eRY2+3{^|jimzb=o)M}UKdEnIAtG{f9pNPhxDkV(O`dy*|3pJzR&ygB>`7h5;(?wZ6qS1m6_1J%C z!=j^;z{^E$ES{&k_Vb0|d82OS+9zqa?% za=YPYxC~f2ANJ#qxNU=PH+^Ogy5VkuKsXd&V0xcb$kAw(Mph z+&jtZHu9u|h)RM(~o?0_2Q=nZh z_hzpfZeS)|3p_Q(t(VYyS<$qSGfYFKu%PeNaKdf5T|>c&i;Jt8pZUwh07m%o+_BbY zI=7$V67}_748{7EG1L2zSIymgM$m0d9z32dnxyX;>&^v9A;QZ@(sz#{^SXy=-5n_5 zG=#V>`5w!uNEpml>d*wUdFhc)_(lu4+f%!orAt>aV1{gQN=i!3)J=9tD+petueNjR zf+`lrESWUOGD12+6x4jZUoJ?fbdlo=gRWbxD|4fT-LneQ-5O7>2K7P1)~!Wd*R@E$ z`0YDH$tOHDkXIb*9cRt3$c4a9b;9`=q)Vf_pOca@1uVFDD?Gt9N`Lq|4zqJr5Jq3pzQPOsa(ISW}bnwPfh6^b*3V5 zbJ+1vA0|Gr*Ly~!6DAVh#T?~DmdAba8YW<4dM4d<{IaZSHX;SA=&V|=>T$Z9Ktne3u%9lm(r91Aq|-E=Qj}FH5V>x|U6dfmD2KV* z)Bo$2w96J*U&v77^WBERwqOW)T(aQL8)7bHM@MAReDP#K!M*rlny>9P4=4|^cvWz7 zH9vm*xXcK}a8V;>AJr&-i^UkTe%UCg}NHgVjL<^eFLTY_bbN{S+y@jf6$#gBU8ktjr<|2sC?DsrSthHs7}922)jvz!btvDX(u}7RQ>bsEQ9Y> z#U)`CwY8s(?22ZpdDtZS`za=5BZln?#gTF1iry#`Wk`h1>H&8lVhGjd=AsB4uN%6p zccJ?Smy=*CR>t*~RUVW#45F0mtzqFSs+oL(dO^8TpJ!Jzn&r%1Ua=E~>Z@=G+9qJv zgI2_o6Ur8(#~kJoLplI^f8ylhxsWgv1aFb#K(}-y!t)kcA!X(ADc;JqQwn#X3i8JWfB$*lgy9sqPWWE2 zG&R%1j_>(=b}}U4&!LdIU~t#tBk}cP1oz<@)DTG|>B-NYd5N5~y3JR0y-;`bt@d=} z5!l5D%Q77!jJ=CWqg4}oZQG(g?797=GC4C`lu;m%so!shhzftz6G{%XHm^f&25g2f zUUrY+oX3H@&cnMJv#*8gAJ=EnI5s|G{$|kKZ`u5kpFy3Y)uD@OM*HnfBRynq3U@l>XTB*O^|y6#^lars-7>c#YF+!^h%G zV}mx@y7jug?e>Vls%(wGnhXHGI?7$D=USPgcHb&iIo#nWn!VmBoF)y`TQV^C0NWF} z^UlPx4Ap6&VcJUf3~YyygeI2F7ryv-(P)?V2y8v0cqO{MI|K=u;CG#iioQTiwNO| zlTE|fcZJA^?O3Mc5i&2z3to|+s0bfT=SfaKMiN`x95ryheD6DmTCrohg?(RbSuzA^ zK2&J6T&JuRc!hA&G1nfFDzw844F4WnxXmtQt_y7Yv=Rw|`gF>yp<8!>NN&T)K;kv2 z2m<+6CLPI~4>`^~5BoEbDpee}fb5$-GPF3Nq~9O?Q`*e@uLa;L0<$ZiWt2@iugI$0 
z-buI+0(zoY1XDgOr5&td7LBlTL!8=VWkg+4K**lZZS~tlG}-dB-OqQ-PKe-9QaFOM zZM10Wd;cK-fF!u>L!Cyi0}BV)_ocUt>~B-&;h=l(O&&&~RqDQTk+IRPW*G#tZ%^UC zO&>%8vbRCh&7x_teoez;?&cVz950Gr<&SJ^E|_^++GqtS!+-L*VbN2i8@)>X4jcG!EaoP9hY@9{@j1Q?|X4`$N6xdXSrOlFil9KkUQC# z%92S3826XDxbhoIi6hU8o^tL%$FaOxSExK^sRhS9b8P;gK@kK)yglM~Q_?T0* z(X2I}1=?kzyVtTPT=(r-Bw+sYAV4DY(tZYH6^20gfo`K6-`2gzeI+MG&w0zsh)HFP`h=^6kAdP{Ai}c)|dLL0STEDI2Q(? z4~U7GqDH2MHm;5B3T;fFb6^D)&O7?_ILYHEGNf?BVr*iz=wO(1{FDQ7y)v`Fa;3^9 zfKRA!HyOw{&OvwODyWd+fkK!$crwywba+EY>howeR(?WK^2qnHc;Wn_MI-Yx*o5kv za3k@`oM3r|h^o!$GH1G zT9v|HOT_K>WJpuev@;s9Bd98P<~o;tp^Wvm}KbBwtu_fBjR%Rqdx(nr_zve z`?J^8stv{ZJl-DCeV(c@7C={D|LW^6o0*+r7^ zezrl!vic5KP~F|dr?s!CG>oU2G6{>ocS&*l5#Vz(pV=2WD8>?b3Hh#r=jDtO`=>CpIS5DKg z-pOY5kLRi5&&!Y4{4V*O9bPkDPD1a%6cZl_UjTPvJR&67W0)HU((OcF-h@r}Q=HqsPYh)zN*`{;SBWzK2ZX&Djz#XxD!*D`^6@<2PT^ z&iVeiF9{6|m2#F|Ws^KCKtybj3*z-|?O?Kp82z>9jqvFe5567Cm&vhs$(J~4Ql`j5 zSqn`9CWsSOhSz|JrH0{8RsRXFQ%Nyx(T6~Up+^7bCxVXF&n9n|x?g+>j%>$@1xvE4 zrF)Z=yr!{sFyfzEp4n?uU|qmg?w_pS!tpWlO_YvPmnBY~N_S*NM%u1bb?B{eZf0(z$?$*D$y1JiCj&`j1OGin91QVsX z=QNtuo{Bk36*4({7GVK^is}hzpJe0%m>{anqIz#b6VP|^q%>5v90ZAqUoQEl+qic z`xPo_sZqiJ6l#Q4w+$DXc!cCb1_y^UQ;2Xi)TbYrbc*$>^l&!1pO64k7ge);XP zV9Y<}9y9BPS!EIxo5uM-gsQ-_-WD+kfpd&<14l{S|C>JJpM>92&rYb3p8zLxbhHJz z0oZg-1V98)3-9ne3Kj67|7!`4Y)grtiV7_gL)^UWG9#Eplo-9*cbApV8RWqfZ_B_Qeu6aTq0b@M`+i3p%keQ^iQ~Q@^ zulXqs8n9jh!ZL=LVT|cbH1Vy+5*2N`4N`=*8L|BWY@XQZMK@OaqYz<;!l2I-@7YX2 zxJ?-@p_roU33R7AJp6^e^2*hDXlN*A=V$y{)EJtEJ^rFjQwO8K4`y%SAoehmhD40m zx$E z^E5~cwbsP1NMoX4Vlq5=!yKzLDvP#^NVt=VzV$LEf(}nfB+^eY#XfJ8WO=4@e14Ge z^JP6!sV$L>=9KI=)j9eJ@|-9OX^vnxJ>&!eAM!@2HU*$5Y~yRX2J_schC!pKalMFT znf}_g-s53d(Xp!yd?hW3Pe3=E0Sy=qZC2e;ZSitg<0#;hn2mRE9!1}8N}b-F1piL&q>dD)^V0K zIVnuG$KJ|k`l|B6l6TvYOaKA`qMqFC$9-%p3~k-|nC5VQCmw3{2%!dGd7uXnT7#eC z57|g;LU;CTA+wxK(>g!Rg;S$V#;7o~JO%s6ln@(NPBX-cq@u1}t`wryi-X$%ys3E$ zSsWtl*btj;EfRdf0SK1o(qoN~Gy;-QW;)m^uH}KioxNL7{i6CEE>eVj&1vu0bGnPZ zIln-I4hw5^-n7qLgB**K0WsKk-3;T==dCLjMs<_CS1}g%oja+4eTQp^?cSw?y}A3+ 
z?ByuRN#+u1)_WD>j;XzHd%cC8RReQ$n?R_Vvs+|EG=6$`u?O%vZcGa#Jmxz`7b;<{ z3n*pVq*EAU$gdqCYK5OO?G}5_{L*ip8A0f#Km=%}WDWzB-go#F`m<9zoQKoE`4zQ8 z#IQ_p)n+ROd*fi2E) z9Qj}S^*wTQPkIPrPv_}9Rf2n(p?C)vaz4bmMsQv!iC{cv}r(c8uGxqinA~PyVU;vw5@7cSR z+SP|fFPQj47iM4q7VK{cH<4U}TUui71kRDQT`@RZSD;!MA_ZK#ycm3|-Yw@_wX(o6 zTibDDqGLWlciZAE^Q=$0ii!$VBXnFMGBQB&mX?;-Mfv%^?TkxHON-0f(`RO9O$B** zwTu`D2??>&POBiQ;=2i-ocj$=ZBNZyn?e;$Ox{QYg^m&xv3&d}J~LP&$%6Rpg&ivK zy$Oeo@i~#U>=;{U@3xV4gRdZyR?E+Ms7hg)@YP0jvDLW9wz^K?AWI<*0l&E+EpV>W z@>pHF^&4)5W?SMV3X0Wi!E7-b3ml*wZeSQtWx!PLlAf6&u`)Yb>~?n+KUG5+92z|A z3W`a|V9K3sQK30XoatXYx->WuAOeVt`VaZx?Z_LTgoRn`Bvx=bcLSi3UA*l}VR?Ca zVwIMdV4G_}6zR4#d_N;6bc(ejmX3;qR}1AO2b(=ja2{o*Vza{bIcPuLok{4poyBve z&f3fK1G@P|O#C!h@fS%nR85fhQR z8iwyK{y=SY$r?7cf%)#h*#XPnho)nr000=eLYM}$6*V+8WRR|mDM2bzhe6GNO_D~6 zzJHgzLFn2wK@7JIeK_b^T7ZhOet3q=-;bG^95UJT zyAKnDDc8=#;Q>?LQ^ACss7^iQNiH`kYA(3Acz7yi@)+pEoQgK%1=M#^rNfYPD|)wK zPwCZ6oK#N?C3h(;d*)G%s+_2c(-PxQ3>-Wn$($T$CRoGF*&N|-yG81%!y+&QT+mR_ zqhp6hhHV>%e&aCalok}dO|au?{7q9on1Hi<7iAWX9xfZ&d>fxF9?~DY60NB?_PGxl zxmXYuGsEfSsk=9qmcvIgpLD1`w^Yd(X!tnPf(UmK7ME!;URmwUR+#06w7T*!`4D6}A$FeAxq245HF(@)w%#d>b$Ng^{ z=7Q2sDyAEWQz!MX03vwk0>DgK8fR=OYZ536qK^KIjiJ7M*=*~gDc$>lIY;UDN&1&h zRB0wlDpno17HCjuJr)K=;zpf{J!}dl`pZdFO!7KTFDU@$lTKD#Gw*Lty<#%BZY&## zfmz=vbp!inS2C$>j7C#4Pa%0p0qos=MR&wQiFtyYFL56$>Fza*?0XLF51u?IU!mhk zS}~5HfbfP)?v$I-%Ic@|E7n zRJ@sJD)#J`P`1V6x>hr`p50^~bF4YdN9AzSpI=MnHFM@Rb2dcCHie4%QB&no_`5wp z_2eJa%oyKC_C&tOs+7(H0532UaIj%9nHpboJGMFzydu@?o0F`*fv~Hj%%B$O&BwQd zZxzqo3C=pl;MmIWqLchn)HMxCxz}k%mF4Dp>7ZYl_}ep5i{CMI=ghuZKu5Luz``H- z)QLZsKyVBI@JAMhwrqrBOX&q2Yv$tp!m262&u`#AmhN`m-rhT1wy`qeoIslE^J77c0zE$|-A^9F|;pdsck z3vGV70OsJZ=H}tc+C*)+`1hhjOJ-;xe)fJter7o!E^PxIcO8>-sJZpC7jYp(()`nJ zT6mk>>Y8#KDF}T_4Cq@jZ;>7~4IXUn*``1hWw<%hY^{T=$V&R1O>vhL+csI$6=MR` zB$!R&&0#;kkbSD1r1&{hCL98QAGjKqP~k~b$G_udg_xKoE@3yrk^Kd*V*(vqFITAR zw$8Rf1{lEGvj0=0UG!GBEn6$FPzxM)@xdYvrknfkAaw^jX?e8N;3d-{vn6f@!))UG zwcc1*SgIxMsUpGftjYyg+pvChe{D_dXPswMOw_!gbB7TYT2@AXxS}6r*SUFly2RNW 
z03IlU;Z9l^VpQ|x>xuX=mu+EfZFdQmF*s~CK6x|MAa2&OlFN8=4X`N)br}rOhPH1y z);odxVd0SlyVmoHX0DAkcQhwGjKG?o%l+a?pS^Vj+1QFQ{@UHt=>G=;K0L!qPO*YLEHkdPed>h z{WCL@nGGb&UxOSEfdVc3uzy!UJrZ+}i)eY$_y z)Jr-FM%B;mn$jJL$yqc#?Wb2%1=rKL*iEiZZ~3!?1>!0C`u=ddOK5f=;u~lYAYh`` zRT+Rw4^d^vb->4aLPd!q#voxk$B>97z&N`igU`M zA~+fs^YVK&m8<>Evp&$1gwiRYB0C~SWZ>roLm3Bvh0wW)-@+6;N2sD&Oeif&VI{x~ z_h`>QdhfFkqiO~B+s~g>EqZ+$_IIwX8a=bLXjne$akDu2Pdd80H`lB?Anm?guJ>^hE|Ql55;?OL;R7~(po3(65TN~LfVR7Ul^eyw zsb>6Z|oo%V*^#8++~|l_H&gJft|ZJDp%Erh^-p$E*d3 zs8C48;C=ft(N=VGDh0lV&ZO) zKZ&vUY`lnhiXovRa?HDI@_Q=-_s#-#dILk;Z$hzCZ`ERJ)U&KHB6CWl28pZzCsDV0 z<#T4ml2=So;Smql>W(Fqm4)pAu=I3Hai)N=bPkU6c~fErJmu}r2Z3}6IAmdnxQbF1 z7PS51Qc9uCJX&TazYsp}@RQ)CguJ2V;7@BjmCZ{|wg`)~lrl6ltR)4EizZ0S~CO<*h9Ch>i*NmB;c_jMT$7F zXRo_l_nTiZexy>u31TvhY6R9dhqJs9zU?)2mnISD-E6LSrbIHL;-8e36VZ;+V?;`Z zhi}B8Z=^%gB~byZqN>XX%VD!Af`n!fgNhNsmgs6e0(1JFwOjD0DRzeA2YYQ@!5CYT zs3+k)f|Q-g75v;mcv@oY7s1v-WmVODLzwQ&cXv@aY1H*a2j5$Z#=o=XG7R+3EIqa0 zzXt?Oy**mw)f*uJzF1t>dJsDxNDU4@q{D{31UQES$jS1U}`xR1N|dJL0FljiW0`1mmUHUMP6q}ksOaw$B)MOXyEQuqHwC?CoJ)fx z1y!MlCU|RGDoL!nh-d!1iltDG%O1K5)Hl+=NJ;W{a#u4UYBgxi1^__Qnh$}s!@MLF z%(M9F`Xaj006$^YB+)%~WS(e+WfKAyEk{EbfUjr}d}Ob0kGe_ozYL+V>se^!#$osO z=FLYae}{}OR?7HQL7_@RNXydl0TDXEC;x6Bk=`igZR@o2$xCL18IziL%c>>(5}>u} zZ@nR)*^zKPU6z0k4u*>BDHN2%nWZ$sU4ECBSH(_7<|)t1(3E-YevP2gQbtB&Fz6ls z?^${N8{QuSvMv7Ix2OEHi~VPRwgWoOd%P+Z&mrgocz0tD2gXZo)7$+6iH~CsQ>L$1Rb7!x6R`;y zOj$R-W+w@CU;~{BA>?ZgvlX?F(WHB`;RLnFDLtQUSXEfTykYte7`KviD?9g*zq zGIr!oZ8Mt^Owd^rDL)-Q&EVoTudYiY z2h;fN)#%8G#qqrD(?$=l0R?KBRfh zR?IYZqY?3lWIQo{{c`vj%AAF-yE{<}x^Hmuo%=oa+exkEwKlqlr~9 zwumRfiD#WxTwzgRxpfm66h^(a0*vhT9TzAra&iSLXYD$q`=zia@i8K9>%ORwL@iAa zp{%qQit)^=*hSBqlP^$sQpNH}p2NJoQOCESprFj!Ntj8$M)`K%+G%B+d*o9J6o9CW zB;=_U?zo*n9ne0XU0ybEZbMr(@fK6`WIxG>08xaJWoi5_?q=0-D*IJ-p@F+xD3^8! 
zahfQMQ{WTL^d$GXBvgCrnww_dERS-L<@*wx$fQ##-W+Dvc_Y8v?D$TO_s*sj${_qX z%#aAqTJ?Cyd?HhYcg2$91(WcB(>dnexdSz1ZyB2fKj`4rYm~vh^rBAW%~{XSyi*W1 zKAGlp-Q62DCwDuU-ewcLi6~q9GY&=K?Du?RLPss)P(J)>p=O2X_>W97iwo@HiG;D@d&Kz+q8o z!0WfK^721zPkU?_csFmJH)>7meFic3pdesrwoo*6mOCua;l)GQ!MAG@ z(Zh3}zB+G!Jf|^BWkv=# z-?zT9wv9fnA8|b0o{C*{Ie7>q0l#69a%|B)ZhX?1_BFnZh1IF+!Y-6y=~VPd;;_`; z-HlI53g^DuQ4n5`(T4({j&G(jn$|uK$_XT%sjZV=9ou)wjGe0-F|Y_$iSK}FU!DWV zOLP`0f9mf<^Q2SL=Zuc8Nib+ssApY-pY%P+EAJcI|BP-5EMrc3X_74f>9;B(LxIvL z6$%j74R}N2hlLjrn>XfCv|wGwU8#Nt0xW7o+(m_8u|UfG!P}BYE948d6YK?Pm$WJ* z^a`gz{$GQmwPzG0hN75Z7Wt#`JVbBNq;i$ zVo>&HH~I`3{;C<5&8oCh*JB}<$^P%9abV8V?I9CexQ@6xPsimEv2*Wm(QL}Tyb{^3 zeIuv4>&ervvK#A_P7Z2Yd6;88BT$Z{JET%a&zR|`t;-)*)9eXdVoKU3e!D{*zgt#r7A5Z?4(mtjFeh+i`Fd zDR6&2o&DkCLbzdf>BthaM4sHvI+Y4mBu5SZIm(&zv=G<)qH@-4P30sx)DhS{^h7??gk+RW5K#&- z%Wwbu*$VC*beuCWp7YAaz)SQ3xkp7MJCgn7=bRt#-98ccE?*Zp=HG2{-+m+9jY>=e zRWCFlM$#DDDHzr~oY%hbJgUE^_TXZacKZD?8x>uKi*L8!;jP!#k>Wj`4FmVb613h9 z&Zoh{&vn}KW^9c0I{otebQfo8=e{2TvP;`)Bth5EVzXYT$lL0>iy5$5r{U)hHOh4( z_dbl$$A4iy6d7@zraEtxoZf#5T*O~a*i-9$zg5A!VF+ZK1F=f|4_7lyS~7q#tWe=hneCSKkY z48C(fNhQR582jtdY6=3z0n5{&-Jou;rjj8Gfj2YWg}yIPns=mZC(~vX%fEvUPwqB= z;4A_0j0OcO_ik~3b1p|hY}bRf84>~~ze1X7nycR*W||6U z)-?iYx(B8ms*1x|+C^G1vYd}noKN!O4ko+sr@zq1{%l08B;!+Ja=MwkfSyV*ju!p_ zT?8g1ax?P1=&QtBHo0-~yn#_dXX2LkFIV8MTZHw(VWTjL$;>zLLBhbB6chl5<2W*QzAP!>=9FkRUzic>!N?&?OO0gamr(rbhq2)jZT=Uv4l%pwFx%4y9 z$Ph-I9Vf?|0%qY(aplc+S66NwH5UqRdC4LvwYi?=TZMSV4pXaXIyG%~B$($eXz5Bu zuF4plW%4UCIjN~+t#=%JRQvvyR&Q8iXxU0>qjZx$5#Jdg{cKLFv}M-{A%tkH&UxIE z_5Hdc$QB^S%b?rO&{>GeiQ4tycXnui*D`!cNYbZw70UFgA@yD4sk!iIVU?3UC>Aou z0Xi;GDMN$pSt&jwhq1YD!1yLs&p*1JTG0K{p(e!U4XMr%vl8SBu?;2r z_Lhn^n)(w}a8#N*8qlM^;HU7a%<1<&j0ErFxw0f@j7{*+sX;5#Sh~LV+CVYHmV804 zeZpx%AVXfzwDx3B55w(w&wX;MZ&#oqxHqhozIX52;GM$EM1U+70bY_=ynFrtm5Q#a zP2dOmu`4S_hPK*Ex-ZW!_hg$JNQhVLh?0ZdZEwbTQYuw|=v8KmDjq&xrLa)Zy=WNO zTeZt`E;x&#q6&-T!3O32B2v+D50D*RWpo5@F`2>PYJ8`&P!&^;N5@$UlTv$lf*lix z%#>=Ojf6mMCe3Bm^ 
zZS5Ek2u5pa(`7Z*s2iB|>;ALqG?DKG?hG_CULn*Gu?~dagEG=TH{+)a(j)9oMNfw{ z*-*ng$T+6>2x%m_I$~K>7*=PF)7y-7E&MR zW|dnaw`B)#Mse>k^Ivp~S`oi&q(TV#C6=RYjocZbD$CY$OIYNJBLa0RUnnPH{qTJf z^w1LYIODss+OcW}2O7VA{EQE?B;0D~SQeP70(o#_P9`uOXFa|W|~B(@>2cN{7wBcnxCUC*(2&nkDR@OxPW zqlXU~>$ydRU6NP$5JpUom%Zz)!PZt*w~AxKfy~IZZ1|8Z)67w{H{_xm1aX7E&P7Hc z%6Hsc(MJoEPsC}KDTwy#p0aD496chL^xp2v5GZsM$%KCOu&{yMH3~WxS#ah=kBv={ z>+p0g;&Jhv;?ccl(dkT;JYid0oZ$icp40tB*|ZzXI?&EYgnob8oE@2EX%{s$-E|)F zf0_~*T@FULc9Oepu^i>{JS*U~ysz%ztOk)k zLdxvv=-I;_ufL6d-K1M5h$z`Atu{uC%A1YfJc$*E8Ms}90-wvl4NGb9&CNn-Wh)bF zqc4JDk^Vi&FH{!dAyRkx&a7@g^c-7WwML^^oY*aoG54QxnONQUuzgqVK2Lot^+q>* z4_L1)2OG=1eyF$lzOQMC%=&bd-bO`H@fmgDzK=JUH=H`ssIR)RVhAmJ3q~^>b+qgh z0YCX@Mjn5@s?V^XzUGeWG^yeE_0V$1{JqRA!}S;Fl64e9c7}aj6|&%N}Yt<}Rs!rWfz}9%L=TEPQkIki@JbK~+%j`x6vS(J!6IO3095P-}hFYge7v`qz(B`a~`?HN9OPo-@eY5zJS-F7a9>y&p*+=={<#r+o_wH#iMC*0Dpr0Se zQfTZyZOb-2;Sd2(*@Pd!414S zB6ykjs((~zCb)kVflWI|%e$P!@xY%Q7?t4b<2mxHIPG`CR=ZAR`v<%U1AlYNy)T?8 zi6{}I&}5d5mty0vJNvG1I4bHp@aQ|M$Db0Vep^NJ)>ZeysRD-->gh>4OWe*yO zJ9;qBklj!B#U}mMO`x5JLN_k=+9sQ+yw;=8R+PaLE1&L;1N7bEjnDt}r;S_}p14n8 z%J|Om#k!J3=oO7`H^|s>hE0mrzK81L$&!Q1T6~>+&MY;#6ddApUT~?RyhgyVQJ?>Y z{p%cr{Gd56Iy;V9E}MacR{>=ndJJhcx(kvcUc~`q-<7SA6WStploc$nmOTBWyW9W6 zIrh=U_f+(?v*QrLp-_0GN_L?lkR|4{^qRr6@jMbf`Oa~**#q_YWnWlTNlSD=mn%3* zcuzW^sYu#zlEue31cOW@N!KCK#{6RUI*~3K%s^FLl*t-PEzhGXAj}-Eg|kLl_nNGv zY~co;Prx8)!^FkG`NBUEyMvLdA}^pmQ(i_|)qJnZ=l)u#a5;47CDN{)N$wF(cQ9p7HT?h(wFca9&Yt`?2A84=O9$?_0jjkEn25BE~ue<)$-eWWl{1> zf5JjC2WR0wG?^jA> zohEQTXF}AvAkcpJ&*#e!YpB9c&!k(yovjZ9XNx{RDO+l#2-tG0`sRO;l`i%Qu@y9?=&C9mVJM`JEhDTMq^*jU9>j2b~9Rp+dR_g#28zt6Z#3h&b`zn zsH8SNiK^)h7{Ww8xmm4hueu(S#TwgtTR1bZ9z&>y$=#7&$<{ob6?~aG%A5^7EcaCp z;qEZ$3eDP}Z+2qU@c=gp1#SJG3G2zOejWe1N6#tPFN_-7^*tgipPdT9)nN{;%RxpZ zKeO?{vYGCn)#GY(w0WdLto^uOd)2Hvxb*=&Y29x{P_wyE&*w*wPF2^>2|?ppciq3J z_ZvJcJNwPsx9DE*>FV;_8g-VI$m>*D{?YtYK+^zhuwBIFNa$m z@M~(-d}BIo8GLJbz2ErZwjGZ}%S;oF{EPsAS$H!=FD_|}hoj%n#`$X#<~hG6)BgOM 
z<%Xt_<~_XWbN)LKJ3BJm;($NH`}zf7-^cOl`5s%MRZY7PR8!_RUH6at9#n&{U+7)fCTT*?!e$nY1S!~p6{FLwk?nx$)4@B*zyorH&%ssDW zXc>ZbL<6j7sR$5EYdZsdz9r@|dH?)!C%krk@^4#n=6TYqE7*ku)lLvs1iwn40?e;^H)c6~aV{)7-Or4EC+R(Hs+} zgPEM1OyVrxs86wLJ^pa zn*GD1e#DMrWMKyF_n{rHLA;BFITdhcYpI25FOOVQGdOm01dG1VCI?Yd1q91 zUf5aYcR_mVfQZHKw1q@p%)uwG6TFe?<=T{kt5uh^DpI}+5uVbHPH1(5Q=-aLF5tdN z>4uFQ#M2_7JM>n`4C6pgRvzfNhuRGELb&+8Tty*LPczkJSQE6V|uk+7mR;$ zN|;}|wnoNwr`#-^brhGt8T*3(@;=MR&hb;V6Rh~=tB_jedC?{0w* zIPQu^1T>MNkMM`9`0Y3G+f^Fsz{}pSX>19Fc2df)90!`~eOGG?b}oVo=kE^*%sEm| zKWgH$vKn?yr}l6eR#{(A`T4nGK_c6ct}lLM$CQN1IkbDIkW+>sCGiRR`XPGa&1K#W zi*afzj}^>Af|;PvuyuVFF0_UxFpTe60+0WoIgI#MI|?JKw0 z-WwX7aoTy-7t05~5!p6qUKo(ymor3a5VP4S%J##z!aET%h4C~E1P8K^rQBxUQFOFS zbm06b2uCV|zy|NXlq0~>kFyjyS}@k&D>%l#Tqu`X7Drg#Fa>SlWUG25gHYdG@Pwc9DBzyP6{fzuC#MdzVauK(zzl9zBHnI;QLPQGx=D2@hs>|5@^-B@rl@MxCxNN! z6iqkzRBO#)zLGUU-Kp+=NiRq5oyulcvJypSl00t!YjH{dTXIL+?V2A;+3bEA;br_} zz6P@^7DpJt3-qqOV;3Hu=;(8chfz_syQ6&bt^*{{SD$+h{puPLIdrg#=}6Bb@}Xmg zHkDV&*4>=>xZQ4@_K-o#aL6Pb>0-BQzry>=rTZZsg`(bXz!9-)g7iGXGrcT4)6=we>qR=Ebbfl?*xIi6S+wPGDGx4# zZo2H{B^*jKe$Oda&Wj>X{UYPeZ?Rjir_LMb`|ef>bo=dmHr6p7p1I7u(;<`I?4*#_ zV>o}-0lrZ5S>7^bNesylc6^7wV^Vs}$-*OTwG|r?@$SRp{2vGO=42ER(XN+oECeHZ zUb~d3udY6(W?|I`Oq^3Vn zrBZbpX(6#YEYUk_9Xb&SoEcZb{}Rb!2+L12ckS(ZqX`LnSGaqI5-%z22s6G++uD~c z6m_OJR@CK~EeS)Q;@KC|YO+Yc?}MgNfBj&ipSQ=;{zPi`AhwAYvF1k55eFnqDsuSZ z5++JOQG>uD>nl}N)oLcb)%MZ|_(jjIUk9CowJEr(vn6w@7yNQv@8}D!u!e)%YfFlm zaYgs-rjkvK!=HBZp3c_p$ktfywcXRhfvP-fibTbm;l^-zO$*sDhl}Y=gNgIiaSQz% zg(p-xG9el=rAmGe$dYuX7M~SQge^Fy2YArCZ@2im=hRF>$@kppU)&G*$jOC8C@3l_ zikVFep*X0Fp&KdMrXAXMy$9b##8rzA@_Y{Rq_}{7`)k(=x4UT3LAFwEI^Fxr#pYeK zn*)!|s(^9Jh{nCIC$e9tJ$@59fvWHH^$2@Tatq-bCs+Rz%gabW^!Pen(K17nT=K<)&;0Mu^o7a1IFAYMwlH}7`w`+<-s#mK*R>=bK^(jm#xyH^GeAhhdbc*u))pYaR>uDvQ1!h+tlOs=$r>IEx*QJ|^LP6G1ay#>A+WX`ivPI3@xAZu^Xm9vWFgB|(ce zP8~eZ@c6yy)0<)v3{s1u#!2ZuC`r%^&YNPgS|53a#VZAIEXKDJ(tX5uB-G>cK?7Jq zY+Mwbtx@6~m84jV5i9l=kSn?kvz0?aFLogtF-eP0sdTN{e3lT7Hh&FT!ktJw$`HD) zFjTGcVx?be6Iug~gKXH1&~DeLN6$gv%{P8qw-fxCs{UUSh)kB~ 
zJaj3-H;fUf%^j52E?^=fJYE;cgC~IjJ#QVBqtT!pBfEQ!5gr3sS6p(hP8lwdU}|2=L*)e;g&1EQVIK9B?Rk}OMFU=SRrKs3MPw)5B! z7$rr{p-Dc+jK_Q4lD>EBEjyV(5LDbVZIw!hhQ;$A%N=O!?ot$*JM?vuKz_6a2zSr6 z&hI5oaG$^?ygqi&0o0O;h-eyGaLyZw>YCi${(^J@FGym@`~+xd@MzwOa**?F zrRdHgGl3(wQ;%R;*osG5P11OWCv&;JlqtFzBigtz0TvD)#Q|0io!s(BkR&EODVnmj z@$lK2Ie;*x_w$zYEA>A@!@!G(&N1V((p2SFc?$Ss04y!~fQFg7gNo#>qy>uL;^8L> z%#L4|X6i>)l9K50$1nGO9r4|n-?U~+qT4W+?2*dcJJ$5HP?2!`M*_IScZ1rnYy!7C z>>L9!zy0|ucQor5dY|1ATJxcUhIJGH5Ze3@&a@_Pd|gJl_8|x=1VFMryJL18v&x=c%5Y`xXx9guyO6*)EW%R=2j3Z*k8;i5PmidlHUG5AU?4<%yyMW+crM z^S<=HY{Bdq+hvXBM?>Mhfv6oV4;7q8@cg||w|C_n>j3TOEU>7Q2QvVdehn93LNwb+ zIyi9zaXir|m)Jk)><57`QTT5l#C~;^(dXPaqd-jF&{9D6b_VT;iG)g`R$t6W^^G=_ zb1LSw{G{j53!a|t|9Qi;{=d%^cPPVL2o8c4Mb3lERuVNke034UPY;P^%`ScdXC3{O zb9xGktK|TvPi+y5JXIy=+X`IPE+Ep@}h&3-+A;q!TJ zngU72aVLlE6JocI378vQfIPN&aoZgUL%q%nSYO*_54``&`k@^J4Yc(7kpd6W>9B;r z!~UgckG|; zYLW3Ha7a5%+4u6uj~_kN0X|k+oBt4ER9hZ>zC|vj;F7)TYhy$Ct0^jP6c6voqem}b z($X#TWgJ$+LtQ`9!H85O8;CP*aBy&EAF7q?SW`H-Lp@&+zU%8N_dr%o?wxeF*A58< z*uix}^Z7cB`G`p8(j?qaT`vZOU_6-X@IJw~1>s^m`(Lr(^!-e_|G;cZdwN2x4VE_^ zk#Y8FcyuBfUhh+}Of>Y3Rj;r^S=i?iN#McP7u)QxDv(%7&avfPTA-W*^h#Gk-)HJwv8vijjyzuQyW%^p_slW=|iLU{-3`Phh7vy)&ac|2}P;7Ha z2@rfNdZH2hv$zVD=&7mD+3wjq0)kL@mmnj#H|YXgbNSKL%>HU*hLy5%ew+g8?s zVp*iZ>T(_)oqW8AQ<0cO>tmkq;Yx>{6-d!J(_)f*QdNg0xuolL6}(rWGN%YC1OXE@ zFh1vD90n2fPft&4y?g(PFemr{bee*kLO~dFg%MN|3^Og)rnSLl=-a0@4As@r=rCBi z622eK`OOWyh(`4IIw|-iZOeq903#RJi-WDzu{YBg(zf^HKs*kJ`daoNXh@4sVy8-{ zDy9xOhqrtF^^`a>G%|3$;DA2olXCkV;wB^of=^2qH$ci@@2+Qzjnf`!SP43GG{D=*aYBC}KFd-6=B2OSx)qd=+R4 zHu3=0yn8O9uMo$K+V;g7DhF|pbWa<$DpkKY$nk--Z~}PLXcMvgM&C|RKz({LA%_3( z(c3E=j0^z@Rfu1`^Khrtu-R>L)t`1>5LN1ohTX&DchfmH8gFkNJrO`~n%g4*2rJh; zLtr9ksk7*X2p?aAz*Zf;YK#fjW*QN~Z|$juUTZ6#!?@&J4TF~mJawZ2`p)m>J=L$Vsh?0Cy{O<5l;DM4M`@qEra9f?nvkW~WR zs2oQQH{lLvI-$67(88wvN{6QRYwv9pEgsAj;P^rlp^WLZi}Q^pV7LA|t-vYSLj_?~ zF>QqUnf{p@-ok2>AAW=Mqr5>%jAfd4q6$F zF?H;X<0rz)+}>^L zCp!aIpzCvz-~qwtKT(I7%%SFQ`$G&vUrLxU(z2bGlTmnARd?_Q``L?YNA)Ywe&bU= 
ziAJh!8mJtM83U$(+!z0z+I?Eg&dy$17Q|C4t)kep%TA(DLhq3R4JaC2k?SUX1uZiv ztNO}pgKh6>M-EWe+sTdq+4FtD&@T&Yl{wlC86D&Fu;*YK%>*f{ASfEIbQ({!<{&Mi zKN0wRacjQ^N@;K{&+IZsC1yj0fI(Kujz^;C>sI#3tEG3P%qXZ}{@M7q`NXD7FuOEf zxEH&Cw>D?Pvy~G912llY)da@avom*7uS^Uzw9>U0gJ$)^r;D9DR=u#J)X+v6kovu` z7_+EoheroSoQbo~@TXBu*Sl@MwI0sJ!QVs3K|v~}WfvX*j(nXb6Fz_ZRM@KWYExy{ z8?p?-)#n8!tLOn3XBy250@x>2b8oxyRPZm+ym*ZL%GVF_rtKFlmg0@YF`t2E3_(6; zgZM?ow4@9zy|Jf+3>ynFc`pW1N9uK(4g03W$$p`j-;=|87v!wrU8%`H&X0CdK zL3~gR=}VU4f4Ac)M}@xIXTBq)t~xZD2BY|vu4z1AiNfoWmm4IVY zFN4`}-I#*KkR{6=&E?;M=oGOCV{u4AK2-G~She`b6WBq4BOs z_eiKl~|31^(cl6HQivBtBmTXg8OTtL@6~<%91ZDq|oOc;3uY zNFpDc!fDY|unF%4Vrmg#vnsG=O<=7nF*Eh`x#TArGmyiQw7>89Y}Ijks5~wNCS8ae z$_zyKv9d7dIf_i#tRIP4bW4WN+=)OMONfq-&rV`6%||4_A4Yj`x`oEmgMkyqwL2RM zI+b6)7^;)dUcLkppC9s#ScBapDjMF%%W243T1tw?rz9n50+Y{QrS_~?ic7#DTmJc@WSVorsJi6f( zX}B#+ORE$15G8nDyi(I7W` zw!@)oBCBoA-FVwlH~UWv3d$Ycfx;N%Wchn#JjvWIla`@hIBJmi?AF`O$IyZ8SH&ld zI4vkP!)C>(i0jvanAwa})T#`DtOz}M8;Xj*0M$^B?HA_Z~gUV3pa-m0o5&njx}`6bx3FVsQ^ z$A(04Lcw;b-L1KQu+id>E4QZ?%}Ym1gWZXVPrh7aLcoG>kvFD0?CwM*qCm(51~38Gpe@#)iNY;Ycw}H^xNp0eW0uGq4Xag zE1hC5F4V1C*Ri;P|+^^NhR2fJ(A{uGDDOZYInyAI`Io#LuDT63}4V;jG3=tT|H z!M&y9Ls&A^22}Gq3!i~iac})kJ?jy4;q(4y0(YZ?b1|kPB1o_*e)O#1A)76B51nII zTl9opPo43MTI}_<25!K2H%Is_C8T4)uo6^t7KSk8AaJWaDB$MwCEdLcz_sscJ*mDE zryYI|%?LU3-8~!Em#P~B#Zn=;g*`BCjYh6=kv128_IKb|u{7=r%kfFy9Z@QgO1SoB zh|qgvOo;ol7;mZuM__N-CA24M1c}49b{L_%hRFn&CPk4^S(SkBRFZ;V`KD zOfK2(=^1o(WHdD;s|S@;<*i}x{^M|9c~8UNx8MB4)p%=`W6N(!>d-zm>t{MK?e20F zoHlBBz?a8%`Zfk&J4G^(QqpL^EYlf1p4+U;rE17SY7g~l+k%Mtf)&189av419@6?X z(SPBms%8d-0C%_#c5%GP;~~}Rfk)A(Z?Z3Q_CC#3r}&5R*9`x?Z#99#V>X_b2cx;} z!~%v!I>c+c6}0h|vF>+RgJ2>oViGD+L?svePqZ-z+ze4Q?XvZr^W!xDq(dev9BP6y zy&b^CabKEtQEuFr#kQJSMCf9hcRRmR^y)_j9#5;uw(xX)Bc1!345S=q25>L5VMz8 zU2brpc5CDFX>Em@td%XM(+}a^qeS00|zLh*My#gbQ)b$AW{XNO6ylb62LJ=fv z-nbY5ejd{U`gUeD2}6+$&pP5`Q7cw-162Ty!O}9@XQCvs^0ST?Y^ilHoFI|sJSsmI z2{(x`nAu|~*83J0)5qVdW37$;p%kXUXM(D2&OG7VAyw|rW({85B?V5;7FQ7?VH-py04 
zA(4=JL?Z03#b+duN9P{>tDd2@e>B0i%oBKh{}}^KFR~VWyuHNO*;SI;-Rf#ECeHtk zVpTm$ZTmFMg%j@2LMP!s#5Ey7?#%>YIJvsl`{m+i9`|#4?-~~M#ck`38~plSs-T&Z zQyL3|@9ebg5LW)N zvljr=Iw*j~$9o@BqNW3(vTt`i-ZFCg6GU%Lh(_^fW#tL9ORR{INgYUnd{^=1M6mFx zRg3{w=OeUDhj&FLrj;9_P&XB$c!|A-)d-Bs&xQ zWEhwZLKdd40JT*;Og?{hA8J(U=dd=Gu3>2)8eUh$u@*4eR#Q(`D`LMHutVE0TRZOj0AGe5 z7p+J4Pm&k{*MuWOlo>wGNcwyWzmyZ)*zl*#xB7dIBnJeVn zXW*{V=w<-K2NI0|6HgD*WyR3R>qz?73_@R|Wu%K3Wp$IhH7s$|7=y|aobuP63ysNG4?P?buWz9v6 zaks%CR-4@>$A-WCDp9^I;m-~;va%cxZ4^kPLtxx7b`C=u#6stE~G461;8A@#jwAXg{GI93JFp0%R({VjkQEaZqJV zur~?=cC5gPEYkHPt*JHS!|eg4@3!YA3=p01`Ihy_immxwFyBlBpiNZRovA6wr|&Sp z;iKcoS?F|6sQdf$^MY(F=1&JS=r!LAd1_5I)}Po8-|98+mkiGKNhBsGYbYp`u~}PL zO;^mYZ~!MZz)4Wi2tG_P(Q#;|>M;xe+EwP}gdp<=Psqs!kAXG`2>IAspxYxldRQNv zIQ8Dp(D2Z2V`JlcOhq3bHa7N27WMVQ_wPJHs(t+iVyIhhLMkjmHXRHkTWad@o_1d$}kSMxR}rD(ACy{~IDb z=#l#}3z>KdSyz`|KvPn3^7I|iCAPnx_?@wzpRsYagEe)V6Ut!zyCv(!#wLwH5NB`1 zJz6kl2oV9?)Q@*tP=%~vVq(IW=J+0V{qa$9D2qpbS%LZ1^>H0fNKK8d)D%TExO4o~DzMDx9_AF+4=9}tsw!?gNJ`b4jJ1-=@e)^imb@!m{E)e>* z>)7B8xOHDu|NE pM%{4k*b`V^U6keZy7LB_3kch0I>}_VN-m@;MH5n`cg7e#x7_ znB&7p=eLaRcHika)TPtX^A|C1usfsHJ1noB@}D|*?9y=%mSMZfO}Ygbcqw2p%?qmd zf%m|p9ka;F$xb%-KFcLz)&0UQE*@KK?8zP@vc9}1egU+#6b0THNyCS4qyIJyga)D% zKudl!S-$6KZWH$^d4Nh_vf<`wB1V#e;ts&VyU;K&dgM=y39B{PT6LU19=SRqytQm$yLq)sN+o&KA;jJsmc9xAmn0Vpz|XCFM;lOZ*ab{>%F<(JYE{_J9Mp7(3olf7bljqJY7?z*|dZQF@d+Ae(=_x01kqF%ivyrwH#9^7?Cbhf=1<zxrLjz@wT8o$V9#eAc)pV`$0Bc3XH&4AH&gWgR+_%Pq1FjtN9=wpyzVEc*KGWVj zteW&Wyd{q8G4$%aas%-4!Gs&2Wk+W>^iTgS@_Eh;4uc@9$x%1oo8QRex9lpqI8eHf zI9o=#7&vD1go=up9x}vSyz~AbTVGF476kWUa$IQnQk>~40Q3itt$HuGT!_(oTqi)Q z4dmr%#bY@%wl^?ti*!xEe|BMwtZ|G_efiuKREu$+254a++!&Ovg_YY6(T_#-zP>lr z(A50d1mN3=)jvlDAu1W`!RB#hGBh1dnKLQc*MLDlig+V|8pZNMe^VUO zv(Mq_um4=zJmyPOT2-s|q;a3NO_x*6mxu=?FmPv@s;*k4yc#7ZY*grmK=^DOaHcR& z47MHoI++F`Uu73(Ys66p7DgWCs%Aa5IY=>4LPy;aOK3#<_;iBjv)oWFq(vmW$llDj zS6%1jtba>(N|ZRSCG_sMH%%C3oXrM8Q;FF8*f=V5s`3Dj|0HvQS|Wz~!VBC99Borq zbpKL)#3zrlbpD3RoWP=Y;;34g4L_){KUwGLk9roB{>g}yKHKpgbS*q}(Em{O 
z-;xh0P^NpbAD4i7wA4|tO}2U88nKOb=d+i-ZxDXmQR{1Q;%`AJow2JTKPy~s-!8U! zB$c-v%hPgrCuOuNi6&r6pkCd;b47)^J|qX$P2^j?1Ic0k;dCeoD?XZapa{>B>U`9% zU|5vhL*?`{RwopX!c_UQ&me-OE`RmIu~*ADq8!9wK^5>2TKm8e)+LiXCTRUIM#Ac` z5zw*p-N~_g#V)1Q#9xypnaxNM)(i#;a$ftfg@X}+Ofa6Lfm-M<6lw^3T5)2N`jSXO zc@>57Yr78)-);_HJb7LjKN{y;19KSWR z`^>h!zW4)l3}&Anw%$t(E;dFLOWZsexxdC2USG!CFg$O;f~p3!sAp8<)6>)2K^S#-{qjJ+M@s%o)cN;q97{H>5#6g+&?hC;S4gaJ_esNjzq zJ{@trA&?hO6|E~LH~9nd$ld;Sr}JWxXQa{q_+CIf?Q+AKXLgxscuW44KoeVtqRUkr z^?v5w==+;;zzO6CtYw&mz}E{$Ax{se zxJ-kd900zQyt~`$-^YghjZU{X42s7Zlurc?NVaXhmWi?$Tg|er+BK98ytbv~WMRv9 z-!+Um@R;&)i-=3{I_|!YkYv|b3SrbO%9tAaFr+ObqWz*))89nSmuF?gOzZu-*Ef1p z7JlL0$gYGifAU=+S3_^1k2PWpPRx4>z?2DKkun|#JU~VcaU^40*D+-mD==l#PwV0bpuPyuh&57wqM70B zUZ>yI@x7kgoTy7s{g$o0*O*AX=n*IUL)#LvbxwIsA{Vz{5G!)sLc0AqK`o*Dm6h(n z0q-MP=b#Gql_rs5h3T&cmi=yrGe$oxuTGNO{?sLG?IG`Knue+mc7`wpro-jwd^?^j zTRkQv-Co7#Bykbm{=>T=g1Wk(7vBD?&~QI{8{Mgovw8m)w-3DtO);!KxwIfNkqSf| zHy6hHUKf?aZ{>6R7@=D~lxl1mG^J#q`wx=F_PE6_eki$xwQ($Pe-c`)b$RUCA6*7o zD$gFADx%v?h8^Y?-Loga$E!RS*oj$!(R?{K0|b*i$Snr1gMKc#IUo9O&jyFL5juEA zLI#=aufjNn(J@SL=qLkNHcMFOK+X8Pp!wY;v%m3i+l18}cLhWpH$_LC(!3#`A^+dS z=-U>t@Ufy_p}+cnSC2gNKPe-4EV^0AYw16CLI%_V1}d3@;o|O{ZRi- ztQ+nxe!&W?ra6N8pStq7^KMX3RKf$gB}Jq9`sozAIZ#IPbS(y~efuu~{fE9N8aZ14 z&%shvM&`xsoNK9Pjd7!K1RbkRbs=EMQSq?^j$lC$I-dwHzI0D`jx6^Cw_83-3;zOa zZ)$8)G5rd)I6ZZDLdLo+$?zIAH#25*)E8HNbQ8C_`oXrF`=(q}@0HkWOWT(UT#36= zUR~dx%fa7p_r&P@|9D-Aiw0PqXA(0Uj*8y$ZJ%`bjOYs=U#`7=(K|hQd^lNj3Ilf! 
zc`n$spva+&N26G{^ zZ19jyL8m~W)2|`+uz4?7NlxI4Y^n>6_WL~fqzERU*3DY+!jXt=5>Y31P@U1_Us8P~ zx6#<3nLm0}Dpm}?zYGEJS7qx=kI2)b8>KHSo>5B;-U+8cYfj@TWh`#tpf)a~kgDIs zM;0v?z_K6)^sP{rzJXA9Rp9f!VU#=y9kAJhbk~+3&rW#S-g&dP=|`^|^X+xkQl`1+ zTS&q}Dj-fB8v??}2F;|mMBlQ5MFqmzmErC;RS@Ie5YQXMv}`>?LtJzbQ0byAo#603 z6L$?T3Qw$=40n#SbRy1>A~W`$6Kl4W7??q%wYs|M#!}6TLMk|U-+PWIDXJqKDx9^c zT=#Ni{8_k`uz;a>Wry3=q%Jye0-5S9oQ)Hd{0*>7aH<5x8v7r6x9um>Y^O0jOUVAH zVCQQPTJbxzSS!A6^RJ~ufhkxLgMJDEJU3sBA`G8V8S5_Cw}+6L?<-@pga(c;5XEbh zZAp)|jdWhP=|eVn^?%&{LHkx3xMVXEFsftfxpc7a0&pqIGQM03mfqIA2bz1SSMXV6 zkt>?7gz7^T^NoZ?zmUIjL-fJW0@b9{c1|SPaM}d@lB!9BebAhaGlEp2@4LyDzrV6bSY6C1q_>@g273YT5!&Q>;T!W2-VO$y;RSyDt>18~KOR^|Smfz(W0|ThayP1)QWRF> zuhKEMb(sBM-WiH`L`EYvS?a<*rA>uD+pX*IB8^aH z4e#%+g-(n=;ZZ5_C{Ke4Ylc(8QEt>M4U2PwgN9)6pXgb0E-{b${YEuY0^jh1*!#7^ zK%4QNfIoF1_SLn;rk(^MMW_HuLJ1X^KO=aP`D}mLExA2SzbHRngAw}-%QI(u z7djH$iBZuK$bXN^z*BAON+hz5)`l#uu+-~wjX&mZ=;I)!QG(B((b&#KPP@VWux=k1 z+?50n%h()%@Coew-@hrz^nWOJF)Q}C(iX|Y_JoZxF1kBjTzn0?+00^xYwdwO;6Y+n8#fA0_gL8V&zABTF=CyHP1)8db)cT*K@EvCaYuv@R6GB9ut2G40M^fvb|k!{LeT1uCJOU01wm? 
zzfW%4gA#C?18_1QKv zCvvib_lA_(bNu~I`f9nf!$XAmlFy;cwPMyy!dQ16=O%mZy_U~hd+ObwfkscYetqmY zU1LhbQ;MKC(X($Xr(G#_uLCqKpjyz)kJs7$wmAM(v2*ds9NJ?2(VipiXWI+jryas@ zpt$C3Mzd3i*aaoaG`Zg}`c2G|%JV7S?^`QOCSOGj0LW8u`&(gv+9S8V{fnv6&hxKx z(R1xBE9ba2f0p`e7sEzQ*O4`fZC&@rpfVoC{cp-ZscD1PZyw?60h#I6F6|~0+no4n zJU)2#S)F0VN*_*C$x-5c(5u*D{wDw^?WMlGgG!wC_Pa6=&k0za8Y^Nc+3q-AXad+bL(RR8 z@yvkO#aGwAV6*<|Sm4L|+7h9=+7Js7W-xTn-`j;y~6=HpT=_aJ_ zjJ2gUrnAMc8lGhPdmm6kZvp{4ujrTM6Ma&}eRJ4va8jU&u$7y3nz1LYXqo_49KxFO zX956fr(W`-i;~vYbWkAn=bkBv-I5Y7s3yco=o{x48dd$&A)Be`IuoZWdl=31tRvz4 zDR=aeSN~bDOh7mu1wpf6pwPaMU#WvRu&QgHQ>)E;6B%Na_OvDkn3Tlb zb-P*PW+>EZ@~hPC2C(N@1g4i^y%4V!JKLGdB4^ICd@rh@`As&r6)5-`$J{OpKhU7( zk)H7^DPHn5_loZpLH?#d+FXuHodTPeyih1-wCHmz-PTqAA=wRoijm|R(UM8EZDX7G z@DBjU+R=JZ>-8d(D zj(c_!21jRYJHWbBo!faq%lPS$b%$KginOMhpqd&-*y@B@j!2K(&wEPFNZy)K+6VlBhJf7DG_-}|X(-Rgrgi%O<<>v)8H(tTSb zXAMAigH%GS*f&PC^yi_Nf2sd+Nma58Iuw?u!x5UdwAI*A;Tq$>=}nqmbxOkRbwB;Q z=V|t2&z#m?EEQjF&WWOMBxt%q-qRAU)WLrKyhUA{lhY*oyh6LJzk2Y#l-vLL4%cjWc98VIA7UBL+7A8uDOdobuKA~Gqfno zL{31COxes#<%W5)?2|$_ZPY|g?=-Vvs*C#&y1Cp?Q}KAB`Fp<~%KK72X}kBWECUNV zPiSaZ_a80H{NOaGf5|GM`%pC?ee-MS&s=$8D2)_9N~;?#{*qxUrEvp+_MwBN`VYS;gd@H1u&ZKb*VQ@Gnb7oL!4c}kS`rAZxiqBvO(I@JG<(XrtLgeC z1J=m*&6@mp+3}|QuztNeiKQ}E-Ip-saNueD6)gY}V1_k*3vYJOeknn4)H=Ji29L^u zl14ah1~oHEkaInahMbQ!bP@ObJ?-i|6HkMf)_L!owq5%=`I-^;r)Zn&v`~kGFJsEb z&4**)3mmmd-so+()8|>XlP?G6TlQi(et%h&+A4to6=j17j&V-Rli~@9f6AIv3Yqyf zDG8tTg;IKEShid{SnIdB)mg60rX{Q2bdG>4_r5LY&3`SAX4!|EushCYhY%P1foS(ELysUqeB0YB=J(ryf3WWF}+LPF3OA>#V(a9b*v9uNxw!>zG#2Mjavs zDH_t9lu$~6`UJ93?@{fhe09-&1+M>*y&y&`VYDmcLPHj%lDO?o7#z!Kj5+?#_XVZ< z4Q&~RuAhm|EpE6cz`t;l+9Zzi4>$yt2RI=t9M7n2 zo^BbD=N_pz60vU^kw-gc;B^b6;WGwIB1-fRd~AM|+Ua{IFK(IW4%^;_$%0S^E;KtfDnT25Y8 zsgDOF>e@Sab}!16JZ9@jP!iIKg{;UZI-0_KrS0Q$vGc+h6I34lIV=LyXDe~_*}O>X zOU1q1L1TCaTQtm(NWvn}wty=4p>)7^YvNFtdyBr~UTXX#`7d}vH}Y?Dol4(uHp^kl zHM5exnv=cI+0QSH(_w56*d`6Ne|B?|<2em2v;3?*;u3cHu`=29^Qlh5^?@+w9F1x3 zY?7{|!k4VAp1Oi*QU{7T2)2kWuS>yQ-CBQZ?BR^IIIpaC0qPTftdLrZ3>tCb2J)jC 
z=sZ?YacC<-hPbk-Ywhq?Ka9m-im7Lw2)}2-UO|N+D~AF;H7l#P zCw+r}h=FN3f}mSJ$hWJm1qlubO-5$+ZK=QfnP6W%a9QA)20GR`+p!}-PU zX(C~#gNS?6je@c6Vl3?-=e&Ba-yglpvpjb1X=#3sVv|rfvEir`3H1jxg@EKyH<-$- zQ^2XtpYV2U0MEjo?{E)Sp8ojb=W*k-5E>RH2^9sf>~QT&Qfb}-Cu$Gg{T~*3 z^yuo{KWEts0+>Ne4nt-6MKG4v>@~Ko?db31Zv`mmd>_7H9OtfPRh!h+dfX8c(bHVG zp*;}_KFGg)I-HNEwlN++Y%zIOk7RCt@BuSjPQ}uFaXoiriUP2ASKek;^&C`Y-Q<>( zv|r~{Y(Ds64v^~{WvEXWdbmAwfMmdJ!od4c)!v}><0tq{{bwiC^Bk7)@w_K| zpM1!f*E_>9eGO{KjIof+ljemo2z+m1$yB!5ZfFZ^qC~1B`T4qXb}T!{)!#9MshqZ4Gz|J61mS8xq-i&2=o10HEIUt)blhLFZ*|9v;==HbtvdptEc^ z(~#vg%eP~m>&|Vo)Yii&>GV69__`($;;bj~n+I|5m?vrCAY05`2Y`qVu6b;NC!}yJPDFMHQ#LKe1bQO1LG-DQTPR7 zRIa9ZE4r&EBKzKp9o;}CjmO5Ia_p7xO%_*qt@m9-9KFwBCdqKifl~qHXyx z`MJ)H+=W4icVOoN3Gns#!A36p=VA29$d|i ze!hvbr(EM<>W^0<34=|K$LR~q@xmwypmW8T1#4jdQAyHBpZB5Df&8(nDHzh|e_;+Z z%-Q%JP2by_6LxmDgmoEcO7#=OroDWdPH50U8t%xy!wONTwJc7 z5FLONE9uRM8b9={=6{y0$&jIIDeRGT0o)W_pah)-=r5oovc7$M>v?`YR=8N|y!((i zHL&FGAS-v_qN>uEb4=l}x#uU;LM+_R%*=fD<_$jnvoV(xj{mQ_bB}7`%H#NiP^gBN zVi1Udx=N@B7(zfGqEXPQ1zMg08U;Z}gn$u=B8Wj=Mi7>20gGZ31%(78gn&HKplu>A z1;L2KAQpL)heo@IRnh&W?K!*M({uLhzx(H$$z+nroqKQQe!jmCZ9nsEFu~oO_S3Ux zkz4_tVi~VHEs)>+o8zleuh7KfDoNVe^^~x~iqe9DPG&(uux45GvP7^|q z@Wt7{jYsV&$|#yqbSni8$ag=z+HcN^CJ2)6e7J5RVv6X+5eYp5g0L*2|zV1FA3_ye2;hUt~bu0=$;~O&YhWOdUxbA+P+rdy!`U$^jZ-ms^ji zEh+iQh0QS*UK(D&g0h@Z-AIQ-t&aUtwRW|=@`WV!?Tg(f_0+kAr*dRsjN@E_;@E{5 zj_NpDqF$F*ere7XQn2FeVjUIZw~*}4n$}gLITfRtThifu`VEXZ1*d!Z!s&F`UD?Kn z=n-CR9jj;Hkqhu<%gCVa`)|8L3IpfZ&Hw9B|8~y4zjv4Ln)En9dOM0v-&|W#sLGxs z4OG4Bkd zO_C5Tx9fZxZs{kZP|`FXGH+mY9i5joKrSy_IE0s@%Wb52oTAJZ|{ z0{zuRmNgNQbRx$U=_H}>>J3A%ND;9q+d`6D18rDef-j2Je5im-)aM!?Q84koj38b0E8=|bVb!KMW3(WNFm!;<;WZEXC+9(1Zd@O~<}Yv6q2 z3t8E%Z2A`*eySRtLdehedk+Ih=vSm#MkO@N6=rLtpQFnOND^pm*yY>Z# zt9P5}ne^+g{SDDyF|0|+#Go1-M1v-#dSY^t&Lw{XNz(#m+LlSVF~%9yu>xt)GLEqD|}X$^%-)oz(THZKH`vxf(^A9Ms!`Q75UD|v=AZ;_wIsWQyRkLr(`Kqb|5laFC{kJ4x zv4KZF*nb>Qe?4hz9L{(0N!^HoF`29L`e%@d%g~-2 ztgkwWxwnb;W2VQ^qfOJeJHOtOKF5Jz7*QLKjZ<2?YON2xl!;Ro6%RQ)CG(G6kuP_) 
z&G^u@`s!&btjr9TeWv4j|0<{Bh6o?J3B+>6roGy-U4cQgKyRElu58GgZ{M82bhcY# zTfvy8^jffdWs35VpI=V5^Hq&p9VoYfcJ%PrRhnzzScWc(=n-pLH;-pI*Yf7E+@eEx zg2ZTe>2_q@C*BB5)DT{)dHfFB1BrvQse|+gR$tX^L`{HSh9f_+Q*YxGXqTv~Z}m>X zIDzui;RI!DsFV*s;ABBNdp^#8=cYsf&AIkTX<;Fu4G)1%nO~S6=; zR8Tzqv`O-C4ADmivAo@I-XJ%(z+ua~gq8w%`#&az_$-ngEnd&vr~G(LDcDMOQ?BMF z_HTen4IKFJ|8jJKs)@S#@?o!x7$4z*hXU1r8{(6`#~cxy#5BqK+|i3;Dqh6L;*kny zD@H&3kaaWeskha)_hY*HIJ={y4E1_5Tv+x6*8RH;G#G&+RfEm35Z)zUJN`Xk*1n{^ zjuTaZHIL~Y&kS!ryUZ+#e=|F|Xujbf)+%wD3N^>;W@uS+?b5q+jT-bsF}J;zTSUA)O*1OHHWIi&_Dg1 z{QZwB8aF4w!vD_~XI9y*S*y^>7vA^lxO0c7VC%j(lctIVe-jmH+iSeY!k%^xT|lx* zeM~dsl%Ie_kT<5yED~u;dDE%Ns>)S0JP!Lh5$r&}c}K@~fct#K`yVx7cYHy^jJZ7ixoLE%(RXU zU}wZ~)`Eo?uak>$s)=~j4ogGicPg0Ik`GT=ZaB8ZeFaz{))XJS^*5pOwr$($H~9-% zaIJ>l*JO^sWX?4o>z;MhifytW^~vcFPtn6$>^!Jf9vTzYJi4mBKmvu1m%Ym&Za(RblA=8Hsas%#WlJUuJ|C`&aL0O>Ton))w-*}V$D=y-qZpw z@A`?XM{#+r+>p}C%P*R{AI2KE!7 zmA>~993dO+Y+&%G?AmOMqL%m8s+=K1g6KnsfJjqtuY|MW{waxshFfe54yuK4jF$E< zf{D0C7YDc)bQ($@(nV>Cdm0g$qoDV0LzzYNWqEIKue?rW@XJd!`9YlbMHA!P^t20% zO8iJ9^66*OE!qh;lw(y0H35u~2+Hd-RVfsd0S*>sNy|=5d_-A}nAlstiZ_I9WZPyR K&+1KKC;kcZz6#v{ From 2f87036ec2cb103def9cce69e7fa84d2f4d47aa0 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Tue, 6 Sep 2022 10:55:39 +0000 Subject: [PATCH 26/34] ran linter --- ..._tabular_regression_model_evaluation.ipynb | 2989 ++++++++--------- 1 file changed, 1486 insertions(+), 1503 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 0ad4b7ec6..8a1046ca5 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1505 +1,1488 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google 
LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure a `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob` which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evaluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "IS_COLAB=False\n", - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_specs={\n", - " \"Type\":\"categorical\",\n", - " \"Breed1\":\"categorical\",\n", - " \"Gender\":\"categorical\",\n", - " \"Color1\":\"categorical\",\n", - " \"Color2\":\"categorical\",\n", - " \"MaturitySize\":\"categorical\",\n", - " \"FurLength\":\"categorical\",\n", - " \"Vaccinated\":\"categorical\",\n", - " \"Sterilized\":\"categorical\",\n", - " \"Health\":\"categorical\",\n", - " \"Fee\":\"numeric\",\n", - " \"PhotoAmt\":\"numeric\",\n", - " \"Adopted\":\"categorical\",\n", - " },\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column, whose values the model is to predict.\n", - "- `training_fraction_split`: The percentage of the dataset to use for training.\n", - "- `validation_fraction_split`: The percentage of the dataset to use for validation.\n", - "- `test_fraction_split`: The percentage of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in unit of millihours (1000 = hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data.[here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_regression_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, you create the pipeline job, with the following parameters:\n", - "\n", - "- `display_name`: The user-defined name of this Pipeline.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m94", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific 
language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you assess your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure an `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. This subset of the original dataset is used here for the problem of predicting the age of a pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "# Install kfp together with the pinned google-cloud-pipeline-components so the pin is not overridden\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Skipped during automated testing; do_shutdown(True) requests a restart\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a random identifier of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "IS_COLAB = False\n", + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model on the created dataset, using `Age` as the target column.\n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with the appropriate data type specified for each input column." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the model is to produce, for example \"regression\" or \"classification\".\n", + "- `column_specs`: A dictionary that maps each input column name (i.e. columns other than the target column) to its data type, such as \"categorical\" or \"numeric\". It is the alternative to `column_transformations`, which will be deprecated eventually, and only one of the two should be passed. For a BigQuery Struct column, the column should be flattened using \".\" as the delimiter, and only columns with no child should be listed. An input column that is not listed is ignored by the training; the target column should not be listed. If neither `column_transformations` nor `column_specs` is passed, the local credentials being used will try setting the column transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. The options for regression are:\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob`, click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Gender\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " \"Adopted\": \"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource used to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) Python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset and run a batch prediction job on the sample. Explanations are enabled on the batch prediction job so that it generates feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " prediction_type: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_starting_replica_count: int = 5,\n", + " batch_predict_max_replica_count: int = 10,\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " 
sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " starting_replica_count=batch_predict_starting_replica_count,\n", + " max_replica_count=batch_predict_max_replica_count,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " problem_type=prediction_type,\n", + " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for keeping staging files and artifacts. A random subdirectory is created under this directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `prediction_type`: Type of the prediction (in this tutorial, \"regression\").\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to use for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"prediction_type\": \"regression\",\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "859fa6611d9a" + }, + "source": [ + "Next, create the pipeline job with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this pipeline.\n", + "- `template_path`: The path of the PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to their values, which control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, the compile-time settings apply: caching is enabled for all tasks by default, though individual tasks may specify different caching options. If this is set, the setting applies to all tasks in the pipeline and overrides the compile-time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to view your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline finishes, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # Skip MAPE, which is infinite when the ground truth is 0. 
Here, Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more, see the [feature attributions overview](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "Using the Cloud Storage URI obtained for the feature attributions, load the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From bef3840bb41a3cb77ac6b653c592a10943db3012 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Wed, 7 Sep 2022 06:01:04 +0000 Subject: [PATCH 27/34] addresses review comments: textual updates, removes unnecessary parameters --- ...ular_classification_model_evaluation.ipynb | 2804 ++++++++--------- 1 file changed, 1373 insertions(+), 1431 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 6c4a483a0..ef9d876f8 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1433 +1,1375 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of 
the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2011a473ce65" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6da01c2f1d4f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0614e3fb19da" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce9c9f279674" - }, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d33629c2aae6" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "21b5a27e8171" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "93ebafd3f347" - }, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is 
to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9ce44a2ab942" - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bfa52eb3f22f" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d56e2b3cf57d" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bd2e1da7a64e" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "19c434d8b035" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1db1b1337f20" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports mutliclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e6e1c0ecc3b6" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " 
ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=prediction_type,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " 
feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1abb012ce04b" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e526b588cae9" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "26eef4b83c88" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "63b84f5490d2" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e0a18b803bb7" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9149549cfd4d" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"classification\").\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6223d67277f3" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"classification\",\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0409b0f330c2" - }, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "894afe1ba396" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ec4ec00ab350" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification) \n", + "- Vertex AI `Model Registry`\n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`.\n", + "- Import the Classification Metrics to the AutoML model resource." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset used in this notebook is a subset of the PetFinder dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. It contains only the part of the original data that is relevant to predicting whether a pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset has been copied to a public Cloud Storage bucket, and this notebook reads it from there." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43", + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "# Install kfp together with the pinned components package so the 1.0.17 pin is not upgraded away\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Skip the restart when running under automated tests\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "from kfp.v2 import compiler\n", + "from google.cloud import aiplatform_v1\n", + "import matplotlib.pyplot as plt\n", + "import json" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", + " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human-readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " }\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if MODEL_DISPLAY_NAME == \"\" or \\\n", + " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the training job on the created TabularDataset by passing the following arguments:\n", + "\n", + "- `dataset`: The TabularDataset, within the same project, whose data is used to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours` (Optional): The training budget for creating this model, expressed in milli node hours; for example, a value of 1,000 in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run()` method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name='vertex-evaluation-automl-tabular-classification-feature-attribution')\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = 'jsonl',\n", + " batch_predict_machine_type: str = 'n1-standard-4',\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000\n", + "):\n", + " \n", + " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " \n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + " \n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size\n", + " )\n", + " \n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " 
project=project,\n", + " location=location,\n", + " model=get_model_task.outputs['model'],\n", + " job_display_name='model-registry-batch-predict-evaluation',\n", + " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=\"classification\",\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", + " predictions_format=batch_predict_predictions_format\n", + " )\n", + " \n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=batch_predict_predictions_format,\n", + " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", + " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", + " model=get_model_task.outputs['model'],\n", + " dataset_type=batch_predict_instances_format\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the 
pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", + " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " 'project':PROJECT_ID,\n", + " 'location':REGION,\n", + " 'root_dir':PIPELINE_ROOT,\n", + " 'model_name':model.resource_name,\n", + " 'target_column_name':\"Adopted\",\n", + " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", + " 'batch_predict_instances_format':'csv',\n", + " 'batch_predict_explanation_data_sample_size': 3000\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating the pipeline job, run it using the configured `SERVICE_ACCOUNT`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if ((\"model-evaluation\" in task.task_name) and\n", + " (\"model-evaluation-import\" not in task.task_name) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=metrics,height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the cell below to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if ((task.task_name == \"feature-attribution\" ) and\n", + " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", + " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print (feat_attrs)\n", + "print (feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print (attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + " \n", + "plt.figure(figsize=(5,3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "community-automl-regression.ipynb", + "provenance": [] + }, + "environment": { + "kernel": "conda-env-eval_comp-py", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python [conda env:eval_comp]", + "language": "python", + "name": "conda-env-eval_comp-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 8f4a865d89ded20f290d21a2eb7c13fb026f1b39 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Wed, 7 Sep 2022 06:01:31 +0000 Subject: [PATCH 28/34] ran linter test --- ...ular_classification_model_evaluation.ipynb | 2800 +++++++++-------- 1 file changed, 1427 insertions(+), 1373 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index ef9d876f8..c0a64e06f 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1375 +1,1429 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by 
applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Model Registry`\n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`.\n", - "- Import the Classification Metrics to the AutoML model resource." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43", - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "from kfp.v2 import compiler\n", - "from google.cloud import aiplatform_v1\n", - "import matplotlib.pyplot as plt\n", - "import json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print (\"Resource name:\",dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if TRAINING_JOB_DISPLAY_NAME == \"\" or \\\n", - " TRAINING_JOB_DISPLAY_NAME is None or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\":\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. 
Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " }\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if MODEL_DISPLAY_NAME == \"\" or \\\n", - " MODEL_DISPLAY_NAME is None or MODEL_DISPLAY_NAME == \"[your-model-display-name]\":\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports mutliclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name='vertex-evaluation-automl-tabular-classification-feature-attribution')\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = 'jsonl',\n", - " batch_predict_machine_type: str = 'n1-standard-4',\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000\n", - "):\n", - " \n", - " from google_cloud_pipeline_components.experimental.evaluation import GetVertexModelOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import EvaluationDataSamplerOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationClassificationOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelEvaluationFeatureAttributionOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import ModelImportEvaluationOp\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " \n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - " \n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size\n", - " )\n", - " \n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " 
project=project,\n", - " location=location,\n", - " model=get_model_task.outputs['model'],\n", - " job_display_name='model-registry-batch-predict-evaluation',\n", - " gcs_source_uris= data_sampler_task.outputs['gcs_output_directory'],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=\"classification\",\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory'],\n", - " predictions_format=batch_predict_predictions_format\n", - " )\n", - " \n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=batch_predict_predictions_format,\n", - " predictions_gcs_source=batch_explain_task.outputs['gcs_output_directory']\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " classification_metrics=eval_task.outputs['evaluation_metrics'],\n", - " feature_attributions=feature_attribution_task.outputs['feature_attributions'],\n", - " model=get_model_task.outputs['model'],\n", - " dataset_type=batch_predict_instances_format\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the 
pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\" or \\\n", - " PIPELINE_DISPLAY_NAME == \"\" or PIPELINE_DISPLAY_NAME is None:\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. 
A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " 'project':PROJECT_ID,\n", - " 'location':REGION,\n", - " 'root_dir':PIPELINE_ROOT,\n", - " 'model_name':model.resource_name,\n", - " 'target_column_name':\"Adopted\",\n", - " 'batch_predict_gcs_source_uris':[DATA_SOURCE],\n", - " 'batch_predict_instances_format':'csv',\n", - " 'batch_predict_explanation_data_sample_size': 3000\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. 
It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if ((\"model-evaluation\" in task.task_name) and\n", - " (\"model-evaluation-import\" not in task.task_name) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " evaluation_metrics = task.outputs.get('evaluation_metrics').artifacts[0]\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=metrics,height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "tags": [] - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if ((task.task_name == \"feature-attribution\" ) and\n", - " (task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED)):\n", - " feat_attrs = task.outputs.get('feature_attributions').artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print (feat_attrs)\n", - "print (feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print (attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - " \n", - "plt.figure(figsize=(5,3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "community-automl-regression.ipynb", - "provenance": [] - }, - "environment": { - "kernel": "conda-env-eval_comp-py", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python [conda env:eval_comp]", - "language": "python", - "name": "conda-env-eval_comp-py" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training`(AutoML Tabular Classification) \n", + "- Vertex AI `Model Registry`\n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaulate the AutoML model using the `Classification Evaluation Component`.\n", + "- Import the Classification Metrics to the AutoML model resource." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + "   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + "   click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + "   **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + "    \"DL_ANACONDA_HOME\"\n", + "):\n", + "    if \"google.colab\" in sys.modules:\n", + "        from google.colab import auth as google_auth\n", + "\n", + "        google_auth.authenticate_user()\n", + "\n", + "    # If you are running this notebook locally, replace the string below with the\n", + "    # path to your service account key and run this cell to authenticate your GCP\n", + "    # account.\n", + "    elif not os.getenv(\"IS_TESTING\"):\n", + "        %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. \n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce9c9f279674" + }, + "source": [ + "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d33629c2aae6" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "93ebafd3f347" + }, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is 
to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9ce44a2ab942" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ab9f273691cc" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "327d8d4e11b2" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-classification-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = 
GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=\"classification\",\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=batch_predict_predictions_format,\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a9571ef567de" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "52d622c274d2" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0409b0f330c2" + }, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "894afe1ba396" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ec4ec00ab350" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 4dbad08c6397045a863bcc7a62b92f2e04126d73 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Wed, 7 Sep 2022 07:06:54 +0000 Subject: [PATCH 29/34] addressed comments --- ..._tabular_regression_model_evaluation.ipynb | 2985 +++++++++-------- 1 file changed, 1499 insertions(+), 1486 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 8a1046ca5..87627308b 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1488 +1,1501 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure a `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob` which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evaluation component`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources they create, you generate a UUID for each session and append it to the names of the resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "IS_COLAB = False\n", - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce, for example `regression` or `classification`.\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If neither column_transformations nor column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Gender\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " \"Adopted\": \"categorical\",\n", - " },\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model is to predict.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-feature-attribution-pipeline\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " prediction_type: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_starting_replica_count: int = 5,\n", - " batch_predict_max_replica_count: int = 10,\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " 
sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " starting_replica_count=batch_predict_starting_replica_count,\n", - " max_replica_count=batch_predict_max_replica_count,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " problem_type=prediction_type,\n", - " regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `prediction_type`: Type of the prediction (In this tutorial, it is \"regression\").\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"prediction_type\": \"regression\",\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "859fa6611d9a" - }, - "source": [ - "Next, you create the pipeline job, with the following parameters:\n", - "\n", - "- `display_name`: The user-defined name of this Pipeline.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "- Vertex AI `Model Registry`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure a `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob` which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaulate the AutoML model using the `regression evaluation component`\n", + "- Import the Classification Metrics to the AutoML model resource" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "IS_COLAB = False\n", + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human-readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the model is to produce, for example `regression` or `classification`.\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating a transformation for a BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, the column is ignored by the training, except for the targetColumn, which should have no transformations defined on it. Only one of column_transformations or column_specs should be passed. Consider using column_specs, as column_transformations will be deprecated eventually. If neither column_transformations nor column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Gender\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " \"Adopted\": \"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training budget, specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " 
project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", + "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "859fa6611d9a" + }, + "source": [ + "Next, you create the pipeline job, with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this Pipeline.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the following cell to print the evaluation metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if the ground truth is 0, and in this dataset Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the following cell to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "Using the Cloud Storage URI obtained for the feature attributions, get the attribution values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m94", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } From 2b1283fa67c01fd5d29788e7a8e99c4a355cfe85 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Wed, 7 Sep 2022 07:07:48 +0000 Subject: [PATCH 30/34] ran linter --- ..._tabular_regression_model_evaluation.ipynb | 2981 ++++++++--------- 1 file changed, 1482 insertions(+), 1499 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 87627308b..1653a47a7 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1501 +1,1484 @@ { - 
"cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "- Vertex AI `Model Registry`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure a `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob` which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evaluation component`\n", - "- Import the Classification Metrics to the AutoML model resource" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a UUID of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "IS_COLAB = False\n", - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model using the created dataset using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Gender\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " \"Adopted\": \"categorical\",\n", - " },\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the `run` method with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model is to predict.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human-readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)."
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " 
project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", - "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "859fa6611d9a" - }, - "source": [ - "Next, you create the pipeline job, with the following parameters:\n", - "\n", - "- `display_name`: The user-defined name of this Pipeline.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the output of the previous step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m94", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific 
language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "- Vertex AI `Model Registry`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure an `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`\n", + "- Import the regression metrics to the AutoML model resource" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset used in this notebook is a subset of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The subset used here is the part of the original dataset considered for the problem of predicting the age of the pet.
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "IS_COLAB = False\n", + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Gender\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " \"Adopted\": \"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human-readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = 
ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "859fa6611d9a" + }, + "source": [ + "Next, you create the pipeline job, with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this Pipeline.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite when the ground truth is 0; here, Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about Feature Attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 94d65c998759e9dc2d36ae1775608380e6a41a17 Mon Sep 17 00:00:00 2001 From: sudarshan-SpringML Date: Wed, 7 Sep 2022 09:25:44 +0000 Subject: [PATCH 31/34] removed unwanted variables --- ..._tabular_regression_model_evaluation.ipynb | 2982 +++++++++-------- ...tabular_regression_evaluation_pipeline.PNG | Bin 0 -> 46871 bytes ...tabular_regression_evaluation_pipeline.png | Bin 33828 -> 0 bytes 3 files changed, 1500 insertions(+), 1482 deletions(-) create mode 100644 notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.PNG delete mode 100644 notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.png diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 1653a47a7..01bd788ab 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1484 +1,1502 @@ { - "cells": [ - { - "cell_type": "code", - 
"execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "- Vertex AI `Model Registry`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure a `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob` which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evaluation component`\n", - "- Import the Classification Metrics to the AutoML model resource" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specified length (default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via OAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "IS_COLAB = False\n", - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model on the created dataset, using `Age` as the target column. \n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Gender\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " \"Adopted\": \"categorical\",\n", - " },\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the method `run`, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model is to predict.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training time specified in milli node hours (1000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\"\n", - ")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = 
ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", - "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "859fa6611d9a" - }, - "source": [ - "Next, you create the pipeline job, with the following parameters:\n", - "\n", - "- `display_name`: The user-defined name of this Pipeline.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline finishes, run the cell below to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if groud truth is 0 as in our case Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the below cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "- Vertex AI `Model Registry`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure an `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`\n", + "- Import the regression metrics to the AutoML model resource" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset, used here for the problem of predicting the age of the pet.
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a uuid for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6.
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + "    \"DL_ANACONDA_HOME\"\n", + "):\n", + "    if \"google.colab\" in sys.modules:\n", + "        from google.colab import auth as google_auth\n", + "\n", + "        google_auth.authenticate_user()\n", + "\n", + "    # If you are running this notebook locally, replace the string below with the\n", + "    # path to your service account key and run this cell to authenticate your GCP\n", + "    # account.\n", + "    elif not os.getenv(\"IS_TESTING\"):\n", + "        %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts in a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves artifacts such as evaluation metrics and feature attributions to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + "    SERVICE_ACCOUNT == \"\"\n", + "    or SERVICE_ACCOUNT is None\n", + "    or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + "    # Get your service account from gcloud\n", + "    if not IS_COLAB:\n", + "        shell_output = !gcloud auth list 2>/dev/null\n", + "        SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + "    else: # IS_COLAB:\n", + "        shell_output = ! gcloud projects describe $PROJECT_ID\n", + "        project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + "        SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + "    print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "!
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + "    display_name=\"petfinder-tabular-dataset\",\n", + "    gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model on the created dataset, using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Gender\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " \"Adopted\": \"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + "    MODEL_DISPLAY_NAME == \"\"\n", + "    or MODEL_DISPLAY_NAME is None\n", + "    or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + "    MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource used to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for testing (holdout data).\n", + "- `model_display_name`: The human-readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in units of milli node hours (1000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + "    dataset=dataset,\n", + "    target_column=\"Age\",\n", + "    training_fraction_split=0.8,\n", + "    validation_fraction_split=0.1,\n", + "    test_fraction_split=0.1,\n", + "    model_display_name=MODEL_DISPLAY_NAME,\n", + "    disable_early_stopping=False,\n", + "    budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + "    evaluation = evaluation.to_dict()\n", + "    print(\"Model's evaluation metrics from training:\\n\")\n", + "    metrics = evaluation[\"metrics\"]\n", + "    for metric in metrics.keys():\n", + "        print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature attributions on its results.
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = 
ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The Cloud Storage directory for keeping staging files and artifacts. A random subdirectory is created under this directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", + "- `batch_predict_explanation_data_sample_size`: Number of instances to sample for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "859fa6611d9a" + }, + "source": [ + "Next, you create the pipeline job, with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this Pipeline.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the following cell to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite when the ground truth is 0, and Age is 0 for some instances here.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the following cell to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m94", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.PNG b/notebooks/community/model_evaluation/images/automl_tabular_regression_evaluation_pipeline.PNG new file mode 100644 index 0000000000000000000000000000000000000000..10cc8a2382ffa673ff553fbfdbee36cd07e49bc7 GIT binary patch literal 46871 zcma&OWn5Hm*fpxCg!D*AOG$%t2}pN?ASor?-O}BubeD8@hf)I442|T_Fu)LJkNltW zyzlvRKJa5^*n9Tg_kHDB>mp1^K@$BX;mapao}f!hiK#q!f&hN<NLA!`rDl4Hq10)gbr z5Q&J0thzoM1$Z8bHyGoVL9JJo8eubK7cyGX4CWs_@9zZ^n`>*?%V5j=j@x5t$VE?) 
z$pf)OkjY6T@;zl9U+=!5Z9)0>>F($EyVOV+#7Yb`wYAfK@E6SS&5FQebyLj06ieTf!bO$;Hv30zuJV{`w$kIx ztDZzV$d;6si|gq8(2@-O78V&U=Ik6LX}1qT#(De_r?wXnU>nRUJOOguFt?J+PmoT( zb!;$ObT+B%Z(xBG_#yJb{E)h-WgMpYv$Hd0P0bxWioRR3Zl;ODnJXd6|L(;pQK=7= zJjWkz$wN?y$K&6!^Wnyks!$oyidH9SsJ-bi6#h7#dB}u}5;;cPE*2_*BXcSnn=y|% zF)^T`mPqfHFUdOjZ}}7k`Tq`Hk)cqZiyK+yJTA{r4f%L*xK{Y(%)5;e)2iDLGO%n2 zEge%o#C5r#km~C3iK#Y+n5s8wQe^B05|CJTCn#=@2DKUs9E;1!dY@qmg+usVnr7>w zphCA&xRdUi8hD24_Y!JcYOS+Q7$dun`#>=%N-CB>!3#Fo+6s*fe@R=Uox1?_Rtbap z@aPyTl7a$uO{q%FJ&TKryN(|0*e?%8)rZ+hedXkfR(y)SD`aw@Yz)TqM?AQ_cy


Y zFxpu@aXal`-JT?CrwJ4Zfp3xCUyL|5okcJHxei1q|GDeq3bq>2dH4nbe_soZs;V*V**)mZ8j%!%5UCpHVXJI%_dOwhlT5deV#3U%_$h)mC#v`xQ zukQG_G@A$a^rlk13tZZuMj5QmpN)@b^9LFs!S|OC{y(TndUt~nQqs~A78dk)zAzFl z+xgN1JXxxW&O^S$q@=YN?mJQ#bbGBxLpSQzFPU`j83K#N_TZ+ZYiij}+w*4gDTUs> zY0Jp*hnM(zb{!y_L3zH9Ve7tk+nvSSKBxwNhOnE~ZuZ}Kj_GV@mL{$0wc51bWKK@1 zo^HjKnr{ z*9;o(8^1yE&fSjV#zb?;(mzBj1fS&w;lXEL-Z;Ogc@|-i)Ui%W=uVNF;e=6nmtK@s zw(pbKZF3>z&>*n@HtOH8kBfRPKl_VI%Ek43GU?fl?gac)R8+#he=iIwfmVmb9!0;e z-<0>Ng&{#F*NkZ{RFsb^U0GZ!QPKgIb^I3;4LzuSF||FkI;r0jk};Rs3Q%GXAMFLz zO7yS&yq?}8A8vT(w4+3wUL=?CS9-1K*#GQ!mC{#zzoO-@7PUY1g8M@NCb!42l30Cy zU?j19qRfkvyYwT`@H^Or>I(U~=B4Y+{C0`;1OJ)8`^!ceyNMwDVv4i)4iK_v2-?Yk zY=?HVF!PghUTKSnRZlAulPwExR}mHIFBMrEdVJ{yfQ9O(2>6}wQXZdA`nJrvp#6`3 zP9~?@xSKibMkDG7QFmj_N=t>sbd^O|zS}sXCl;bY#MDXsk-Hebf4AvxagkjRde*S% zIy=k#dGXTvHK#*9;jYoIFq&doc2+W`EhpLMt#qtR{;>WtvaTuF<-YWC{Ss%EOz4xh z=cFZn?h%)E$OA{M2+HB5lJY0N@onE6 ziY}a`oTyC;)JC~`>OWoG0qEk~@N6XVlGFd^4;fCK>^BqEBq*4O z?H-qE?o;vma@}Bos}Zaj8ua%cQH~0qH^wF;?q;G2K89)(X{Rw48*>=NVm^!lL3_@_ zj@#z;T#&l$!Fl(U&kNw-rC|Ylum4h>RHN8wt%(yz81*Qm!=t-$H93UNF`nQt#=TAe z&7KBWbmZ6Bryu?Wg5zLF)|l#)ctnaJblKZRcGZi^7uzP?2lLSV3MDZ~qiL3_J+m%j zGga(%_sSA1%e*FFx=#U}B#$=)MM8YG^!8rAXBmuReO^{t8k?AykJj-U$Qyw1w+fjR zwKKWgIi%m-dR^2c=WlL)(T{dqbDK?jJ$uxc(``xB1w_*lox=itjP`fiBXQMk+c!>V zqmd4k<|OV&FOPSZSWaTNmxZ&^85wWC(8VVtBw}S=uURoZ*mLIQ<*8kNp3xEYV(AX7 z&ifP` z5Z&)%+CG@(r@;#VazA7%f)E&uB+OTPJ#<;wBJ~5oDAjA0!EYx+;JIcRe90K1DBS_I zU%RS`eda~!UsVH5{UZNo>d6v=X7iV415xH4b9|E?V|{xuNxF^xCbR%2?Sj^<=xrqJmro8v&X34R3T7 z2e!V?T}DlYn1sK_&s4tYdS244F^29a$jX`Ns?NlF8n`$amKwEaIQDOSb{mBrp({60B@3Jg|?EygI{p?GyBCo|*HxJpFRbFmOCQJ_(7OL_r!-D$49|)f;inXQe&5=xMk{z+769ESH{eUgtmwr7I z#-o35 zh~T()oy4DwpCa2xKa0DydTBFB`k#M%7j`J4PDzWs{P3@20}3#tjeliNVqtcM%YwP> z%LVhIjLSl^pL+kI_t@w#HI-)Fz{8^p0hM9E+79)pu)HrK`(*gM5mA&*!Cxb^A}=s&INpJ2`PkvcN-ul-V0YYH~r+dS~ZD z&iwY!Il;pT!}JR$6bZ0)V2s8|BRXD#2NCF%j!_R1MP2)UwNOg7xU8`;6-9bi_as}r z+`{0Lz8Ym?U&Y^Y7aCI751G+~OL8M8jl_=>g3JonKYkOgjh=(xJL5#p*DluiX3(O( 
zML%BEFMQ8J>=Wa=20L4Mt>Mwy_^Rrzbv?GeTvph;bA`r)E4LPrwGG31R`mqykd*92 ze$tRyUt|CA?1$)p+<)eu5uQw{^$>!mhIZ#v`9 z*i3sK+!&hd&QwtV6pd#!df<}B=LjXhz9#S2c8l1VIa^TpHp%tZ=J~SD!q-|!iM54| zRQaWkVsiVr;tioTw%ttBSnqSSW5`o8GRg~q=+gRuA94?Dl%LDO-9(~p#4ZqDsO#9- z*@;kWzTuy5*Vfeh`GZfet(|>m2U9&OvtL-1F{YMGJ-vEfyNHU4I?X{tQxtbRBQD6Y z|Lnw9?*{J8`BT_AxkXxCq1hAHHLR?x7J7w@)(oNH5?U0#y_YzU0Bd==li=OQU1zI9 ziz{LF_*NG)`1j))w20~rbtdb~<+^uVafBWQPWV4paH3mQ>C;G7Dy+-s94{LfkBJb? zsuQLB7CW=XP}-iNP1*F~@xW9BnfTSy9nyq|9U92$4muuQ~$k^;-*?aQ2WEXB%5FMxO>RE z!J3c{GQ`2qoz`!8nFzd1l^4L}d zVoi4c7cF9KkOz0U@NXF&pG%ag)M9&;y3p^@!t#kISiUwY?S4bNrvn}vIA-_r|2B31 z`|5peE9VBoJbtxy5FFS#h!S6hoZ)z=idVYWq(cDWNUFZ|$^JFeTyl_a1|p!uE4qHFdQ*oB1jM`?=kvs()BFhWGjXA=_> zi_Ik@g1_I0@Iyo-=VfJOvB~fw{GK5eW9Yc9dsxie6riZ{FCyXJA2g1KcWEqwQ4v2$ zV|aN4_}ZHfY(U?px|Af?OX$kWD_s6jKVkXi68xaz)S3f#r?WZd5c;XMw)cq|>Xf{^ zC_lrbt2D28UYZkF2BKp$-4;_~lUWyAek~u1?%eXd`af+Zps z;u=D)T-LK5ASELw7Cun|0?}?=Bo$XrH>dDr(iXj#p=V(8yyTH{OIYo`%rVh>Xh4Me z=0^RLm63QB>Gb0M0c{f%KHtZpmv`g@UyzI#uXot4d26=yfe>yQ%sk`-+ZP|mR8PwJk_Q79R`%ZC+kRRTN zPpvU&8ZUc(!(!jquRI5@Zops-Tih;q-Ux#gAv)Ah-fWwgMUkGcv5GR{qBJZ+`h4kl zZ66gYE+Rrg!m6VWPbElMm)gZw+gh3sVX_BJ7m-x?eXbB>B=$@XL{)|NPvMP;r~K!- z3&-W>Mr7TE`E@ukcMy~%_d)9+DMi;xbmj~Jg+`^8PBiZtHtbig0*Z@iIT5R%JYD(T zb&cM3A1nNfASQM<*z<7#N5Y%go=dVB(Prg08o+cSsnrJ&*fQKHw`(d zA*S1z?K^SF3&5rV0ZgjVX!*v27;46Pn5AA42+fy_IR4Ixj+^{&oCo^Vb1np4GB9ke zsKCH{e_OnFfD)_!DfHX^_G&%uQOW7(A(_&=5~LwsN%DE4_tvi6!Y?e|TNJU3Ds*KU z{2sQy-eOcyen#7(c2*gO%im*Hd!>NwQLCeguXmDs_GP525ghJNYZ*EO%Ljrf0{0PctDhHx&H z`%8Y#(IF`fQWu5%O!u4+0;Xs0-P+79=eQ(rUt!d?gUaAi)!-CI! 
zlIp$NgMiywLCVPqO;)KhzAn%i?+}JjJpeubv`-*Gj8JfUbFojtGeQhCB(+O^MEgK*5S@wmJZ1|IJRC!O&PUpEQ*pA#B`&zW)e8ab}u$)$lR-gHK z32$X}GftYfhJDq}{O1Vwi0notUY#rqvp8$+g>g5IKcwk*GfADpqOW5%_QAtoODS3k zWgF~2ad-Et8g{73*p+1^Mn|<43m?CiA$4514UT2Zeg30f(yIbW0L7)GtQ1F0_@@QJ z+rrq`qTHFVxc0o`yN8RuZQPvy?BJ(7!)eonaO9pBj;3YNs(stPb3_w0 zan|_1oYN+ep8hJcU4?M#t1fZ@-`{FCD*$C!EAO{=l#-D$P>%B~VGTgS>^X=L>e`4A z$j#;M*h$0%H+yb)j#O7y!&d|)_{{@U+qN_mwriRd0$FtBXsw)~$V#~zIx2}4Xn?p< zXbrdSe<|aLFMK^?lpXFQbUuA1$BAOH#Jhyu&a9-`DdogZk2_Y%d5B9Q>3E@eY;7b3 z{R*Ug&?yjqw8X0z_R`%p7C5MXw5v)1c!`?UPk(LCY}=L>vv;*}-i1v0`T1|?Se3)Z z=~RnKkkY9L(~^?H;wYn4Fh5A|g^HCjw{6l~tq%v$($P8l64Dol+OX)u_KB2q)uWs? zYLk+1EwrUOV~dl$d?sdB+(|C*chz7m58WXKVKp4?Bk^ZU*tWbjVMD{>!ABhy=-3mWQ}_9Y>Os<|x;nLbW@ctQ z(=7Mkg&5*8v03U#@Ta!cG2cLKtnRz@Ys}A!-QZ%2;_h^&h$Xy9YgRE=$#v}*$>`T> zC%)jfL;bb3M4~N7001En{Mp5&;a|2#fYiY#Q(b&{)A%nB`U6?>rn&A@Svy}Uj=6^I z49A2GKEtOfn~+(C<={0JxabJ=SGU+1_Q>CHS4#=50@cg-8nvUr}pa9S&Z^i*`@wW!e;p+=B^FSxuH> zXgnW`zvMdY^kIWRb}Wd-SQw3MNq6D8b88-^1RN&~&_;1mBs{O)~K=6g`{~ za{3+~mo#lB5=|@lX+#RnR;~&H-LczkB>$4^JfRg5K+$sqyKi zx%=1)Lt7@c%L_PEqiUH4Flj!qKC z|GGj#xku|+9IW8I{+Y`7{aNQHi)FH!jO7O2d>@ajT1IYYbKyeiWgXd<0f$aM&rG0# z^V_uETatHwW-PAG36yGf+L$!#ydVulg%3iaaxC)`UlG$R+uuvkwL$Y3Dm8*J+IIhl zy}z4MLmNfY0r0GhU&FSyNc&vY-Jz7hYaa}-Nqug44~F@_3O|Fb1~RvYdThoO zmsTie=y_<-KO#L}P~Jw86a@cDP$;PAAeBTX+go-tO1`fGNs_SQltn7+Uo@H$l1zYz zX=!_emfes5+9ft4sf4e7cG;(K;&cSll#-WD+WX-)DB|Qi!j`rb&6IYK&Fq6vz3;aa zqxd^a0-h(Rk=9$V{4;q4g`SYCkzW2I@5r@YM9(pU32xcAgp87Cc?U9+c0uU*xmp|h znbGFotvrnwwDTb;=-gK0_jg}#@L2A;9hp!0M|6*IWn_l`UX~S9aua0-|IgFCRGkteC7eduHxUs~pC@va+ zJFDv*R4O^CC#8$?G~G4(3EAj9v163;`h&`YGs&L|Q7`f30s5;3PSmJN(Y;zLcB`Q` zCQ!X(F7ds^(*Bh@>7_}#g+hZFJLU1ilHcIQ~ z>YQ8oz8ra%>(oA-mLSs5SjmnS-z)GcI#6ABuI`I;sj=_9D74bB2YK*u)AY>q4T;XK zYa>P(;C1<;TL6$fsnHECrIau+h7%GZrpmxZE)ZCVmyeW1H>osty?VmhR*fObh>l(` z)Y8X=uHk4ds#_wY#;f0A$qtef8Xetpe~Xa!sB=z88XboRtZq7?F$L_op0;^rv48MdE=n+Z|5V^wfGt z=lfBpV*Ib;E^?d6m3=Ti!k|WFf`gL-Wem!u>pssdU%D}9=tz?p}ancRy z8yqB(8Yf*mfL0-jozYIo=C9l2>f;pk>j-k53$O-+TQ$al}O(uQx 
zy#0C1P&QqRePMGiP}}W zkoPCk!;If=y@#T9cTdX&r0!3xl2L~1y2S2>I0+bCt$OAUDIWSCKj!QuyOJcXFTNe( zTWKhy_F6<)arrhkYl zNalx@YN0gFs)()6N=izGi7li1dCb?VkM*l(M?OIIp*d=8|X7d7}acXz0wMELN9b46T(e~*`IBGM_s_9A zYJRM+(P5K=QGQ+j;*}a?=&cWvkG0-ncTjLZt7%8;r_QJ#9zSF0G)af9kyNGErgz>7 zYS~E@F?A;0N8WoI>&@)0#Ktzf!Bmxqb{pkTehD5>f;?7 zmsz@i-Fy`tZsU2A^=%D~Dyin!63LoFc-k`jh{r`^gd*Z69neVska6GN?DbLz_Yv1| zD7Z>?*|+fH)iz?@$D-Ovmo#17vQLZG>%1B6FJ2#Hu==A{*KBOQjwjzjS@DU+3>w@oYCS-ME(b0T zCj}W-O!~oO8tX@ArAi++>?+Yo)h`rAQuTvc1xx!TNh)$=f0F-jNjoJ2z<@@5_bIK1 z^Zvy#cr#tMBk_)&LZy8zwk|7R8bjyQknaA(86ZDc_N8E94q8}b8-$bPB^c??W0Y*j zK0@l(*YRbJ|2A^E-9%X29wf0fQoG&WCaJvV?Ie7-tw2m^R|_}cr)%Rx|5#=pbYKDH ztF3St`XCZo68lyf6ZNgA|E@vzOU&dqMg3LU$oVBc*>mZ!{_d+v_?PbS_)pnzA4%ob+PWCw5KG!G1K*LWi|w zpdClhZC0J`l>YwD<|cDV8N4i2%}JvImn=YB6bI?gRNDB}{YEwSn;sld1s8YrG{*?m z`tD2-i}En>+xShsrHm~YRu|{;+kcxSX!RTMg{#|&Ad^8^CC|aFQU7!+T)v++v znZjADU9Leks4>@wyPNJjLDJ~7xt!6|kg$p!4(a{w{gH9ST-*ugow*WP^e_7YM{5=Y zkYV~x#saHgjYB0#M1v;l-8m2R`*9zma}v-r8uB^SQmKNgJzR^~Dk1sSUhM_TSYYf_ z4`a7&w+1Az@IJ<3>$BPY!05xF-Y6vtGOn$~DlTnnaBPn^f>x!J^sp=j*O|x`daw>B2sg) zZcXjeyUE;lh58%==aBD6NoT{5D?4!#oA;b;;*Fn!TlTQQ9%$VM2UeUHK`sj~7R;fN zlJm}k-?FkQ|H2828ru+ynT^@h2|-(L0YWPbop;R3iG$lw^ANVN%GiX2jopzK0-Ugu zkW#P}B!IJ!reo2j|L-%d$bI~v3GY1Cd=0fC?@l34gMw!dno6N7#4DUDEU7;ab-A6f zTjlsSU;*r0=m2BzWEnkQjLIoaJD{Tx>64ZQXq_=MV$F6gbA24M-8Q2ry`8%ShrbJCL*Z zo6mSPeZni2Emg_N6+HtiC{@wWX7xoej3f^xW(252h~um~st<=PGu-Tz(eE8FrewW# z!~d-`xxo;(Hj&b&?XG5{)G2wXH=9ZG(`Abqht-Ck6(c|klpd0lykl+nTDTGnp6{H~#p>+M)vAFzoyW__|L`12w=?mtJ*D*qw$)Qux z;o?VizukKy+pt%<%Pgn<5rzh#(}!XKQ5me{BJu&RWL>pResOcUIS8Yl4wJ}yUU63d zjS&$=khhDRsYfRdAm6 z1M=)wuBDr-42dPqxmE(=%Ee3Jyv@|=|5uUaxie3>%5mRpHoV=QlX0n3jjuLWZRaSE zsjnGTgr`;#LK!1f)>ldMaqZM-eR^)8qwdSY{fII0Na0?Cw zffTUlL*W7+4`}O3c1VHL{&PIpzHVzYRy#62!J_+^5jEIEJjYjpTYYxaZhSCl;DrRp zFgI0UW*V;5l6uYND=OfEyJ+A06XlRIFp$|3V>~+bkZwPcJ!PKl7hlrlr?YF zhx<=SO&jS{eUhq9uAJ~V1`7W1v)<4<6J0~WFh4Djl+C9B*$R7MG17c|Qwj~T1)3MF z&Hdo+9vbpRV=dV{tdLCgJ@Mmy8{hdb0BL0nKY^|#?7Esr#$^9#8F>|C?xUGx#&k*W 
z3DI|gpzZ+<%D9+f&i*_{di#wMX{o59okNSq-HS_~HQ_J6+Csqr_oW{&HyC*4^kU6_ zJPBMUA-CnQ&JFtC0_+etvCv?|&rC z!ksW+0OE6u3rk=36=mLSow$p7LhJC=c&hS#N2~M8RKc=ed9x+34q?q!fVO@LbM2p4 zj@bBK%dn zk$F|+;&Cg=I#iP9dk+OI*@%8VqHic+U8U9%s0tyk@zi?vo0ixh-_mVspD>+^0#P&) zm2|TH(Uju#zYLC$?f+r&7SWOoK$p=OmNm~m;F9@L1Dx0*fk$ALSohACF6xt;wLQ85 zA!*t2FXVqiGZT@YYK1A~K@UeFd<9bV{GNN%)VZ5No$ROaK z?`m8VWmw5|QQ_2RAUVbN<*Np~#w_s(c|uC%0|6xKAtvk`YgKbf`{3+Dmh}bNyZn4@ zEa4~;(Ct|(p&TJ`K%PUY-v5|^iWa16c;(OPSnandlEVWQNAXJPas5CS^C(f2{f^+O z83cYNznA=KIXhgKq!ZSK_GoZ+7Y_dgT+S?_GGw@!E8`xvzL^K|13IL;7XKq(Rb?Mj zRa8yoAHVz|LgS(*lt_gPIZXXT>?EB+X12@s2TRXhui&S$Eu;_PE4T=5eSil|2gcP^ z?=hRGe5~@Z`t{-RE`A0IyQ9G*H%nO`lZX_}4yjD?+VAh1`iu$k1_SC4G1fwLHWpq5fR zChZ({Ky1VthmgkcRJ_q0?)|BRV6mTS`0a8F!I9(M<39+YAiL{{YPVYz43at0y5GqF zpU;aWQ>xperKgquyn(!-Y>!w*vn(EO7khp}uJvvC*WAVTaFdgZXzX&~A-4Fgq4c%o z22qnSBoQgCdgf#h)2NE8v)Tc+H=qU8fl??#4?&g28wHU)`3Q_V!!&YH|&W z5Sh1w0~XP%Jga=`rQ=G65k1vyJ%5lp!{IKV!qWRQx2L41zpJ;Wd3fCw)@+ICTmm@c zS4WA#-0YrR&>XId+6=y|XwybH<#2J~sGUE^H2IbNzSw=QsX0 z$>t}a&T16wya<#?CO1)fY1Q{YY%*KqRcLfSXmb@uW+ge){6M*;tF0fB|6K4GA5G#s zx+lK#ysu50t?Ise#D<}XKkc62kbrv-EwU#mVcmWbYjsGDW=b3x#SxzBSxJ@+wZo?^ z`VW`9U%PK&KPrkNoqSuT!$O4YbLXyF;@X+q?oFj%FEPA~i;JVLY^6LKMlpX`n39U5 z=K|;ELv5IPP*~Y%Fk-se7Aa1;oj-bArSY6Ax)LFi_rYV5p;q1RnVz&*r8f-o&M6Yv zD;&Uj@plk!nso$h-C{;2!(yxRZ4BKN&)w>$R|O-nUj!`V$PTyy8n?TcUY}d zKLglE+PyuDsGPg|5VDfVkKbV(m#gh71d0LyTSB)x4S=rUBt+6_n8)9PQP}sNc#d0K zauN0AOQlcqH`ugXoAY+_vIRVQnIb^vv!J%lpX!uYg^{@H+d}Yr5WhvZ0K$o1Oy|M;%Zo9$(>KY7W=i`2pVFhh>>z%&Z+}bYjmU^yjO9QhEH_i0SndoTi)V@6V zE3`5&X)=5~x02ZfC~8mUjA1t|9v9xH?KNZE6cBH;4%oxd2=ou1b;|e+KPJiU3>V#L zvxdHJucOZLgeN;Y0ITcVbzAcsCJn2_UGr*pI%oq-AOx-q3EhqTA{{efT|58#_wTO) zuuG$nAz-U-DGg!u{hV&j_GikFpQ48E?jprFtr=ueGu;r_*d|Z@a;i;m~SvshoXHz=ic(|;x=K&4dYvGUTaZ2k^ zzc_w+<#G>Qy6f2d)#59~^77P244( z2pDQN4N(KK=KYY}#+CWa`-|#d&0w}Wm&BBmGHaVhHrF=q?D$^c=DK#1kH zeG3Cosr^O~)L_3Ge8?F^ux#{3kLTejz&rrjY0r6~KzF6D!!muVlf4;w;qYMpm)_qPaVDGO* zV@}lZXLhzVcRaW$Kx9hlmJ&Uym;^biMHMXqZ(}vCqG`cKR|!*P+_edYIz^ 
z{ACL|iflt(S)&x@AcAbp0JuS{jo+rN?NIHyA8Nu0z&u*9g?sN4glXDYVy1J_W1P** zw$37^ptA5_VhWyWr2AaD1G)~iIP-ReB9lG>|MLU(@vi#Lj_!o$38f>MpcC;)+4ByG z=+}Nfe#qgLB&#SXg-W?TSEm@SQM@Ld-&UX1iKt=INNnM4gR%^k#<_o@;hN9N&%MeF z;a({0Y^@l(fPEKe7gLdG;G}FCC-HeHcylc)<2Kv?=#20lV}`XfOWcyRyO~PgrK(1L zwtld9dFvMVi5$)ADIEN=wl!2Y*K;(PUQn7irDjbeJ?xh zgk0_CP=TVQKtNEdx8)-H{pVte;9A2?$S!rC+`IO^_wRu5h24*#zu_XjC81&J(iHIY zt20}OA6zWFu1#8z7{b`PFZDM60^r*A!%`^e-U^SRcKBG$yBS>bafVA|)MP(lnuR~fw50PiJ@$OUcAorp$^IBpTbp2^K5EjyJcIa$J69K_$}qq@*+q?%5pUz z5fGb9WFpad$4$r-X2k^xHsUtT5{r*D)L1r#ImbObSm&1h!$MN^L+v%*+?Mc1}Bt@;wA#|Letz=p^SwPEX z`1a~efqT_|qYP7TdbA^KTfxz%;WEZ1mk!#Pk`mN$NHjoenVWm4#3&qiaUu~n+vRzd zkhd{y_O|52x^<1l|7vY8s~1jv#HR-WS-!rF<>(0_`ax0P+J9_R6~SPq=eQuTbDt75 zqrN9h_x$Hi5tomJ3yQkl0?~wmf7-W2Lp4(<7TfMGE)GR4lTV2~#u0#_P5U;E)s%S_krNj5qFFf8v3Mrau|&zY8Cc3eK1^%6`P>a(kH zG@0zoI65nhfW|qhLqA&M;O6Yen*)hGYP?db@u|Sn)X5<%i-Ma)fArzvCw{ff6ka{J z$gxw*&=BFrN22(I_{4W9Ij>&uDy{|V>z0gB((tR!8@Ch3EbB!J2r^>;1PAAg?b6Fw}qCLoch~fF)^BkI0*wQp@$%QYG-g9Wgw$N8i*@SKAfj~*6wkklm%`T!Zgivi3cs65=D~Q2ZC~YgVFl_mh%&a z*)+$-9wj?3bJ5-1Y;tHdx<^>vBW?IgG8^CCxqSa1-Wa+oE0LZ8ZPTc{}PvoO>SrUOa=dGjzJ(VfhmE=-nEikUz^`a1oH*rR< z1)Zuqcd_#MCSK1;Mt*-m&_#QmBzM9j?-&-V?VI{}H^ftV^F_HHD)(Ev%Oeb(o- z|L4$QcLOZ^aA`>gCn<3~aN{iGDS1z1ECPJsyV{0a5N0Tnf?4J!w5~4gwJZ9e5x=ys!FSaV@+_fy;zdU(y5l!Pt={ zt|AcdMxfz+E4M)PmV$v76yV8P--q_En+bEhFB=)wCNpU>9c{a~;5n%_Jnw9O(%9yC zg{muc=G9h;F({xC>F0{G9u2*Q)hB;q$lx&zen#_tv+I>3QJGcQs7vYtATfu_8UL7d zPf{|WBfUhOZnInT#GO;O3IRWb58}Ot73^~!<(ri5W$cMiH~zsW1Bdzg>PO6WmfLGf z%i<8s(qfJ}AF6p&b&qfi4>{p&;3i+fI_|u6o%aJnN1svRt{N@=dTFb zjUVpssgSdF(IUHIKjoY4gartI5jz!o(9jx zrfWNm@^}U_&TZF{p<|9X=&s*+NXDV5(Sent1P8-1U_~XiQ6JZri*D@;S@Oi!^uz%K zPdgFmym7abnZuPvrvk8^e#|r2WeLUEy^JM9(&P^%)AC(!bJ!1-UK-2vE4!!{zFGPK zuF2DD{goEgwL=(4l`l}p5&x+*2?3O5PS;AqoRU9+5DORE@1Ed;hd+6#=M#Zfkg7#p z6VJG3EG%JNelJ(+*1FVRK@QtvKg{X8_A~83JpJXDMMmPsFu>UnNnZP~*$9h7dP)LfbK;`M z4!9+92_$Mv=7a(v5$B@nmjEDI(bU%_A0(*QK$P!Ueq!e^mAv)P`_8N@wWwKDO{$H0 z&}3g~uP(X}m>;!6;jgXdPf17Zp%*@iZE)56tTp_{uN4N145`p4B?=vhB&@8Mm@oI- 
zyRT_mMxOyzl|O|QC*UjEPOg+4=pqqMUW2}t6XarXn-j?m=J=_GYHJIO7Az<%1S21k zxlC{T+MsP&4ENDg>9n8Jw%T|0&wR<%>kP$NOnj2k%BOf4?>d@ySC`r~cWVtCnGRi{QI{WYcnx^6IzC)!g%OJ^pKA%(g(!5{WtSSxtcT}ZF- z3j^zLB_5xO%1jdi5^zSba)0(6O};27KyK=l*26)AO|Efigx*`oMs0oloKu${Fe245 z8{4qXR0Rwfmx~8?tKpZ5yBxeLq*lj3W8I6%FG&~T!V^f~K}0|y4!1djL)7N@pqy6* zz-m>GVj(GqMud0KN%m#yOW095SylKC7kUvTL#(=36hG(lr`Cy^aZ8Gq??+GU=`dg< z@?%+>73H*Ji1%z5`NO>PJ>j_sP}s8%(!uLX)Vj14W=uctng&vAyHYb93!Ud418M-g z_jxkZO6(e?=iBbp9d?BaQ_DEvx(8*!e6*sdr;%p|d0Qi{W)fBkUn-t%gdc%|-0K-x z6np&RJTc1IZhtpapzgVep{VjtrhRl%r%VRjl(wozykR z=l3$J5=J?WlF(gAHSPbJsr=d`^eApX+*#)R3pdxl*Fl`GSA<^Dplw+ywGsli)Dz2u zq$A=97G$VV)d6s~1%fcSUl=sBExoUHFSrSPVd$_Zm#(I{!`WU#J_Y39VD0Kn1u!E@N9wb$I~JJ%16zRM03{sCx=#L=9}xoph4{*&u~ z%Zh52g{5rfAs?<56{a-E6-S+GM|Nh9w9* zHo3gD15L^gF(7>@ujmcP`dGpdZ-m&tdXE}!zt+S<-Gp*N&CeOXE8fRKXCDV={%rU- zu~QT=xYKw#cQ_+f~B~0K#BMCakYI zKnGJCG)YL?BCSnVZ1?%P0x?f>dPp(>h!jouS-7l}BfI|qzd)>@Bn(FI#8>J?A~i~n zrB$qajdlTjK~Z30Keh3;Q3aZf?6H7opbBBO(QbF6G@MH(PVo z`#W!@mkh)<`{WBPzUYwAMb}M@&szFwXR2phE@82B7i)7?k2|c$_sb zcE?UZ)U@W7t*AwnsiPDWl(rAXn64WN5w$51(FRvdl(R)Cd=dS zcXb@S5G0^s1GV}eH$rsLZBWUHQ0J-M??$`u5GTd9@YmsG4n@Q(H6hI?dB5U3-BSp zTXM2MIQ>Q9_Asag)dHlr08Zn?wtxBQq&uAN_Hy%icY1#^>)uO?HwaOH#W~Mc1(;I5<8uP=e?3e7m#iBbnHOs^&d=<6UO88Jfd3!1-a0I*w(t7C6cG@SZUGrUx1 zr5ouI20=PTxlc%;M*VxJXLv;{hEnXXKo+MrUk^{H=c@OP1p z&Wd*rZhUG#G0({bDVpJjS@6F%5IE9?GNCAJ69lE$%1fU5`H6GI`x)$}?#PBiK08u@W^t7l%06?hbl#(mTxE-Sb@+#p813b_fI^pY z(uqKNnuux?iN5r$YB2FvKv8-a=6_X|t=3vnc5cqlQ74(5k_s4$#QD)T4)NxS$);PX zVjL7OlhkEX2)V8<8z-;6Bq>+W0@$>>EfU!pFmiSXPmFIBRXs~+&X<1Z$KuXtRvwSd znULR~By>i^r~=kD5=oFL|8&F!VU76Q^s6eqC)}jHZR!%eZ~=ypZoI_x+O^h!^tY?E zytIL&(Q|Qo+gp0_?Y1m@k&f+0_T>eV0-{Py5ptXMxixk^J#}QWXY7PFcZTS+$ zw}fQmOx!Gn+$uOZD!u~&hEQlMnED)c&nC*wsM3%{&vW2tEimmh7KNiv;a;`xa3MAL zk&*Ek9|}`K=2B-IEuU!;yF}sluCm=>j1Ft9daI~2BFbL_6{K$xp*CdyZd@EcQt+#o zBeyq7?;z+5{ynM`gc+Jd9tNg@ppmV>q`Mm{|^wp-N3Mys2u8mDD0b(c<_6yO|H3tw6&<+!rVI{kSd!7B*Fj zaXvhGbNU-47+H;k#JF&x#GsYc=(9X=IQ7d-4f)&$@W&-nb$D23$F{Z5GkeaH&8gTy 
zrWhC4Get8pGV->y3MTHykIk}P`Gk9$!P9SHuRX`Lnd<(pFVa|*TZ+$fHt$6E}6j!t&p*A(3-CGQl_Ggx7qc z`r0bpMKkiC!15eXRda*lCDToXx9Sg6uZDX?jz8ogRf1$*4tumyX6B)&S~269^o9rKCbrhE&VEmXr7 z8%J8!R>NCWye66}is)y9Y7O^Cy~7H_m38OE3~nmW8;~%HTdoW0!e3Ho`i(vwFJ3y} zY0!CGG4ziwY#~x`Xp#q$*lDOB;pVnGI*$7BAMvL%I(-D;-`wVliM8?%=0&^+!!Y=v z`97C5@3g7}1b_D%=ZiW!aPc!ww3H}MU8?)uf3lz<*&&6x6#nGBjqQg)Cddo^8uI9!yD@%O)Y0BXFL_5jun=7FV`m=}M`VY{dJ1Cc9YgdWOO4sr2Hb1@V zblj({Bh-*DEetv+$s%{2T~hLUfpDH7*iYV1KE{;JUHG$?bO_# zU*{}u2YS})6}no+heEdjEk@IShqoMY=gWABJIp=oTfaCKKsu41pg_DNp83MxMZC$? zr*WKqepzZdj5g2{D}`5TM2Vb?%zUNK_UE0seY^vX?2k@3VwQgwaP&CJLrmh_i-(<0Hl-)K4z3+zQ0eq7ksoKm7DkG*V6_ zllf(3^t4ftDyIPxq0ja7%FcDq^=gpm>0ie1XO6aK8ZU=d(TfKzT$k&~hD0cb*emP~ zsiSvh)Bnh-Jz7WL?l^Y35Djj{ptH{lP|9t)W=sH9Pzp{DtmI5VIJo6_N-8}_E zytgmEd^Z_d^<=2*Z04u^FYE8rg0Ib(P$QQGllHt>w(^OWshFG}KDC(H6ebY@EYmjq=dww))0ObYV17(0lrQAz7Uh-nJ4&M5l4 zTfqqO!eJifWDAgeReiE|t2$E{;GXE|{T9nX#84{yCLz@Tbn9Reih8qNY}ek^NMi_O zyuw^{2DH~0tBCtLd#EFC^w-TEs89H%G?L#8-7E@8g|u z4WtkG=Tb*IRHr8zvLZR(%`_U&5d81?N;C2<+YFt9hW}a<^3%b)-)0>WBy|o|2gba-tq7U7Q(+BZZ^?b4w9eRBPkC3$OmSd7Xpo*~B5HJKFdANxXmw!-0h;C zSHfhW*ZwC0Xn7Fo*>oT&IZ<#wU%M~vX`R21KSJQJ(CFo_KRqyrfZ`U6l;ZDf~4$#!;Q`djhe~ow~;z@ zqHXfm9t&5ibw+eCzj_6?08z&KDW>zthsHTvf9kxyO{Tfx{ohnUra%}qs5l}i@? 
zm;F{V+5j>k@dYC?;up;wCIZG*4>e1*SsI^++_= zBXCfQOCl0c>Z+paAPg=mFJC+~mb&1D$@_kvb?A>ZpzdhSTj>j|scG7l39=;)8f9&8 z-7NLAvfO$f{V8>!s*CrcI(gqEgTE77{eei$%pGgpVb3dwk4Yf}MfVPVKabWi0d9ys zGxWXeoUg@47!(doy^=Eep<)yY_pDl$R-JUh^HGd|KG-VoJK-LNDbx>2;wI#f+zH^| z?Q;I(xV|J3bjA#gu>fQkrF7f!)oq*gjg7q-48v^l_{{3a?a`8(g->K(H(#|VcJ3!G)3lfjG z=27nZnV)!#wSML9s}D8jfkhKx@Jf!AbT?RomfM#VBZo4!5@Cd6UKG=g@^HwP(!;Ou z+X}|ACj!tBGca?>aFJbD2~b=C@QO)QW7*oR9-EUF77!d{NO#Lgu^VY}`Rc`kyy>B= zU2I5#2@`AFwt{|yllN1#{Z|*n?rFMeOOOF!Pd%qKHUAw>ezJj=gn=TZW zo_I<=l{^WD6VLYEo?I;SOumkwIVx-;wve=jaeWaYdyEN*;7jvf#2q3Cqn+JTVxK9* z86;@oMzBe;BiJQreuJMfwse5J;qPv$U|n5ZMP+5J+faaU%rb$9Xzd>yj2dL$voNvh zr{~FKV9sIFqzWG?WW1R>G%NTvGEF$bV!=IZ)%hV;Tb9n0LObCtntM;(c`mMOUOc@0 zsobXmP8Nv3Xaew6JO>FLmC&oWsY&PVx-wTvzjwJYQ~-7-s#lIQyT7APCFKG4jm9?& z%+AJjKO!)&{M`d&AG9k#OZ9%z_l+$OVB_Xh$kSZB!9iC&C}&d|XxE(7Z4MM+NR z>4~}uYnC`l=;FmJzG4*P9CqA#;jx1(@UtCw6R`a_l&JZ%|H_KJIX*1yH!CYEBlqal z>yStaUfsa*{LNs$Ec9kcKZV&r4c!MmtE9j!i`=7$eCvXUr00mKo)`7PBqvQ2Z#+>M5dX+~s3 zl4h&GI-u*SXp3uMrEXF#mpeMF*TcVWS7X%UcG&%X|5(XzLaScgW^3)h^uC-YP?Bc9 z#qgDp%7CjjWmPJ%J2-bD#RFInd^ToM{X&<0UA1a=-wA1HJEA`iuRZrZYA8)|UR-J- zQ#(Zo)82zTF20TR^`^a)|7UPNja^M4m)2&;$@FL$FYAbQV;g#n18HzC_*^K^9$mXS zb5OU=_5EGc!R*RwJuZdCA5+FG4dDIPdHZ8Q&gYhhjWUEZ-fi1A3bEoO*Bq?t2~k=V!Y!3g0A9OUQc1$p)FG*Po%NnPLI&3#dS)(}fim#+h8maiZ*& zNBy2};L+AeCn}F-<&!%{3YYQRDHx}_eDayVcr_?bSHBH&s<)g>T6HQ7tdVo;>By$w z_~hHP#k6076Ul2v|NI&fjfXYwcOkL`+u{EhM!5Uvy6&+F|(T{RI} zFNEsaC&56HHv85)RXJ631oqyB7t?xQ+zi5U$74mN(NJb#~-#8uvZRg8KMy%zQlEdx>$rxOb{7%WbqN zDqR0hvX4Iip>gtOefW^w=X#VD4W3gvp;>;KTi+ziUqm`l-&8;U%05UJ78oH_qvwVn z|75(dXciez>@lo2cSqi)s@&()5K~@a6vfE z)YjCS&n@=4x1U!R@j#LmR#lyv5ty>#_sX2QV1l?OkVwVQc``v>)(zHIIju@Yhf#$uT3R2q9%uTF ze3%@1iEk0DzJyj%pq2I%6QU(J>E~_64Z%59*lW=OeuQ?$y_bTU`_#6pmZA=;4>l(g zzl@!%#|NcNipDzb!^6X(!gOvbiINgiLEr*ILvf9~48rH@$|X}3a^T8flk@WO)ce|h z>F+Y4Lo@dE>zxNQvyYBZLU^w2 z>xww<^B0q*`3$`}qO+2f_2g}XaZyA<2a!fs4w0)u2({Ck-<&8xKOJOzWqbXQV(}Q{ zYZvvpK+KkBy==WRNQXI0=1=##Ip7p=ZmZVk 
zc*%3~8NSn=8u9G>x2A6bnnvIi!8oNucv1Sf{O(VFhk5&Xe3m>*s)2H09+49__ZzH~ z|3Di2{hJHshJ(*9T^t@ETzrfo_F)sKrPPej>j>`HE=EZqwhga~E@btC%~6w7iNV4| z>I*Vu6Yq^?2D72|%5@l5ju9kbl2xw4QmYjON0Rly|j^r9f*B5h?`{u^q0n4r0$Um;P~$QVg}{+xo@P#93=oVp9t^ ziLoDNZ8=%Y9#AIiV^oll(b0S83iOon!cG&bXS21TWch^#rm`yh-@x(k%Rq&R6%c6G zr8I+7ESW#^J7intnv%Svq) zdukFbHONgFqLpT@{gV0piIRNzt1Ie`AT;k*OQk?Tx|^u7Vo#gB);`Lm-2L;6>PJEO zXcbOElVZx{AK~4^usR_d){xtWx`SjNl|v!Kgti%Sf0Qi($3p_UaO6XHGDFFbhCMox zt_z30I)|cDt6X{UDUaNCf5;2LmVX-;j@iQn!Xh6Z+v8GwJp1Z=AoTre&)Vy#)KdbJ zpq|Ylnchq^yyIiwE;;kU@A4zZkgSospC%>`U5Mk2LG7~6l^_YGJlo7L4XMi9nonls zZFy%9%m)zp4ph5v&)t^~1GP`5B2dLg^2De^`I=E)n>)9*LTI3<@$S7a?tT#=V)J11 zGLldt$ZbI+haf%|N|LjOOuY{?acr^{s~0$EjSA&^xv7}^sM@}E;~ZXwwgCRN52BV| z3lMgPC%DaL9ZPn_8>u@mieix^@0&UOm{ZzuqYvosTZ`v9KhP_uY5m#I_Ax z++n9I(ECyy&wzUwU`;$&+Av4AUi$|}YN|Ea(7s>X) z`LA{py^Ae~i63lESC13TEpf=n$Ob(aObqNlH|XtI!Ct!sSG*Het|q_0#>FA89`j|` zu->XEP3YZXPZ_FCPR`0o+@*H?IrF1tcZ~W8McF;uzLiLwq@T3m=uw&+`Ci(tOuPGu zepz7`zVrdo$cW^H!4+v5Cd5iVYo3kltgPDE;K9@MD5IQqHKKu5orvF&ACgwLtUFOY z=(uMb!K}dYLy0+F`7(tvx*LYNTxl{?o{f;|;r6Ph<2BcDfFv~w9g?Bqg(Kx>~O7Y+77t0>XWgg;L zVWOlslh&!}{n@uaLOoI7S39GTH8Zg;{>u0XJycdu!euARxsO^KY5$_z`3%#$n8BiN zFvKDq)csQ9>_|z&*!BUh9SL+KfL)R!cl{RDPK}nq{>7amjIGrEG1i_w64AZGODVbU z+k@hNC<=}r?#v1_K;^XRrd-7hnsQU3j~ta%X-y>Uy!;<9f#g8hpYbVfWukc$2~sE+4PH+<08>trlIn`V+Edtx5|w0Qo$9ets@QT$q=-a z?|JN{l4wH8xysCEMvGqI{|;FGLwg7D!AaG^B{x#n{qt74l0TY3q@f-2D~a=MJO$|= zQeOweC45M%S#*=y^M2fU9ND?XWIk=LsNVv{@RT#;Cd--f1w)#fPi(pZQ%24A0Bnwi zWc?hN=Yg+)d>Izr-t?{1Ys+}N!qky?YQ`(jRFs)m^ljDcJ!)^U92o7mp5F-*CLfwG z27v`83#pNYX&kFYJVm+vk`VaOd=u|Z!+ck}Y%Mw=xMml5d^-bez<+t(eCPadNlW}u z*O8Gf$)sOh&X~l?Jdy#p6&fgm-=OpE8ta)1qkVC0!}q}>IqSpX)2DCMvRD0oB9_E( zZ3*_}m2HQE*An3eAnCFtuOCo>iktM3bZWP@!)u`r+dIuYwujI;_fziUcN*^S`4M&QtFUx za#C`U@T51!-1s8Fje+Jgk3~!F^RSi{5y(&PTUl~ka>&&ZAIpOEvBs z_Uy`I4EX#Kfad=H4FI-`^K3J*@bYS?`5tc;iIN~IZ5ruAo(KqNtNCA3iiel5iwX;! 
zA2k1t*~2pDE3}4d<|IipKOzw;_5qUjt%n59(|rfBV76CPo=qmAI3{jg&bzbE9%m&c z=B8AeENlc~lizb-uXqU!S9vQuqHAF$f2)(==kvkc5wBF3T|Cnzd%(mO{c`yLb~(P` zodv>4u^FDr6oAd>=9Qc6{{F|(@O*fR_i<2F*1Y}Kz^%mj(1z0A{r&nLdm3AbGj5%y zz(UkhHr0}o1Btz|%}TOF>aAZ^vtRjoY~P65!B?e5He8y>`0j41p>kQ6pTdVWe(^>J zAy_!u9DaWPwC;`4vX5x5c0E?Mz^Aa@X$S{)FSq(Br&U**){R~$Nd#R?`=2# zRnc}bIDMHn;KbAT=_F>-3!|y>Do=KvxsC8uSW9ESN~cCA^+rPGpAWD$SNt!dM21-< zMP18S9wHfKZ1H}vk0M0uc*Fn^_V=SS&y;Ww4<%+qoRFcaPkHHzv?6+nEl;eUa zg}oK(C$1z-Q`^9~q#vgfvx z5l$>U_9yk=4W-TRyu#+b#Dhskmbteug}J8tHmpN5eits|&&2js9RHZU>Z-x=cb*cRk8#P=Wn5S zd}&i$G^O`I^R-wF<>Yfc{Ksx{v!5uUwiu$(t1(hM=BA>Pyb~7*V?z}Gk;K~!UWRSK zY9Bw4d?+n(=)ue>&i`ZyB*5#7=pPfP1E}CjgM4fd9k-B;!Dio!3MQ47Cj8g?5*+hg z?)Bc-?RGg@a}2JGgY~zZ!~q$X7U89-O8$O3GTJk^Lc=Yu9k#<=8p;eJ>ECyO!o3^j zBoJ)@I5fR56tGqj;3U4}KkR02qgANseeL&`&jxsdknA66`M2uY=5&g*mKh}KWG?nToFMoR|>q4nT%d+MSQ!7b}Arc1zS+d%F34v?B4H>Z}Makh5GiXv?h0T z`uWo#ChR(@i1|g1 zx(!l;;`kKgr44IQ@2fE-tBF=E9|8uc?89DrBS1|{!_9wpoKZ*G3(n=z@-E=J$l!lO zQFMPWh^V&{l}HXH0*Ni`5co)Txq=#ARH z$?i>LNG~mEWp>wxS{nQ$81H`{O|fyr(i;K>YT!$q2-yB0GEA4n3!KJpbDAoKVz9qN zR{CU-THxf-WLgNmw-nmG-|Ke{jx|`uhWnfwdM&IC-raxy{WhkF!QT1u&$Z2rkxs$t zD|P{a>-<>J4k?-0aotY5?R?iyT5-w@2;AVN6mkBjk+E)rPjEQ+pz~3KS{QX)ZBsyFZ?V?% ze5!fheIiLWC0tv&f9+sqAg#H>Q{FO>yYQ!o)LeS8GFus|Tk z-joZCFVq$=KYvY1aRgV8w(t{g%I*;FBF^ZQ4}znr{x#^`uzSOiwU8e=ZCKV zp^6ar<*rQ&PJZGP?=$hb7ea^1aVF;CCMia^w!A^<%DC&pUpY`do{>$e(y<2TMFtsMkM2p;NOm z#wX<f4cF2u( zb1pc(0*-$s0I~S(<*P(+W+gi}!t_KLj}b|rM4MiVpqCQnDUR)4cBOesB&lfokZS!d zlirGvrs)cN_8~kSA`@_V_{ZyfxUSB$v47bVc=fM+_MyZm2ge^(x#mcfw{hEHvfs+G z65d>Ked(yVc?Wi#Fd##|Y^{;j|HKO8B_6t<2H{nGddgTr{jtwALDX;G5}Dry*3P#_ z04qNg3#(*eenEv~y`w-wIRSMj_ACdXbf(cG^`ucj{wm@%9o5qZELmH3dG^PC2Kt$V zY^aBIg|~g^r{jV^Ak(hQA1W?!VtGC{#3T&wtZvy|oP{ zoPLCUY*ks1pAUKXkVZn_5fNAm*Zlm&ETIhz0Uwdu##fQ|Qh+ClMmVEE?b%es+TCM?BqAsV1LlS@7 zNA4m3B*>!|Z76S)Mj`~?imEFhTB;7cY-hb2-QwG<=E>{M05Ve%_9{pfIu-a{Iii?e zi62icWt@oPyrSn1%fyw)r zG_If1skZ`R++2N%I>~yuf$A`zoPE6{9d0`$IWSAZVRsKT)`t?NBV%R^4augT&4No2 z#PnYHN(cMKRFQ}7@rtt3+QA=K#c 
z^CX+jjFxB(TBKL{rEhqJ$5`4z7Vl(R=f=(@83u>Tj;nGE*tG8?W{SHwrKz)4qOx zbeEd%>8Sr$pt%(lFxQ#TuZ?AvZF`mKapvT%Qnko(q(@CmnP7YB6Wdn)aH{n^*B{Ux zdNkAgk*w(Iy`AQU;x18TFRtfqcvKZz*61bYDm$Z94)B`#=aqT7$CY)Et+6x=4&8hY z;Mm4)!=A>mC$O$F4QV`B*62GTfgu5Q693_kW!AS<6NgKgZV!bKlW>If7vplJ8Ln;>)O-m~$9!_W95bX=xLalaJbvVGTXfI$_ablN-PC$GOAzSHzW01$9@m znOA-7bhStygLLAOv$3%Yx6rd~75ykvt^Y#0+s-sp2^2tlC_mh-HdUW;e6~&@at~0o zAF{BpPzgC0{0R0vC6qGn)b6iK4GujC$QvG^D|pBxK^H4^9F}%FIP7q0=evqp-ochS z9UmUAi^?W?`1hR`$6D+lX#TpKWu^iFfG^{u`|(a|ukzZGWAul4+ky+q6k8Sx=79c0u~p8!6N&<>o~C7gNH^S%1%kp9NXEJ;GOW@|zRi z^!*<)F`39U1-K&1M~*F*$fIkL`z0k7obS>-lHD>El@Bk&Ozk8~HQxLtl^(jk@ABL0 zoSzFs!u`&F7Svrf*jENz$ZNoj$M9UHuLIp0WPWowve5 znf1cv_oV|5uqwTi-}p_n1#^UpHGd@%7!Q13isMk!On$EcmD>nZ=9*KF`Bczf;6&U0WZ+z2=@*4_C&cx1NKaKt^9dewN)?{!$& z{6$DMPP9>Yyjlu2$4NPO=Bj0lb9c3Ty>JBCxLuDbL^X?-G#NBml}?(vu8(NbAh(7b z-#A73Z8SRUAp2JRg>+HwjUzgkC5$0b^4D>o?;+#&qhqnqi+eLG;-F6I%W2KPq~b|N zX=KGC7}NfjJEQx%^dYZ13SHw8Gm28Mg5sS>IQWJf>7|8+ni%`@A;wzYv1=dns7)C4 zz{;-3^D{j-oX(L1%CmO+`v*j{l#$k10VDuzt_}HP^=%+7MvUA*`OF$a zJvOJq3224qddE%mSoqnAy}eJbJrie9+2DWeS(dMw`5xA69K6&M=vMx&_#3P0r;_}h z6mDUjP8^F?^j`Gj+k}K=UOu@W%D#*?d)vmSS{N#5Z=thjq4}h&*-Q5OiFZm;iuYt5 zz=sP@1+2NI_FGY#>kLeFb#|Uz(8*TxSTCa2*F_~IWY=p(_O6SXxWAhmK__q8l3o{K z&48A{V2$pGZR#xjYA;U1&6D0@H(qQU_V)Fe_35!1ZT z-2{;Dw7k-(i^c`+KHK7~X(FEDMCowem>I0n zfyO-Xfl&O;EM2cZ>8QiCF!VfZ$5YOZL?>C=;<$jS!nmxXHHQ~oE#4do;sZjT%gRjg zEbDt7Zs|Vog5LEarKH@jO}%~Z{;KIlYvVg1VE^dRqcmhW zJgD2zaHk&)L2+hD9skK4oLc>XZ8%oz3MAlB9Z@lF{KD-19pEUG(K63rMdmd=U}x#Y z+lGI*_1$X*tFW-V^S67H*GJ+a43n>d#e|83Gawu&#vBB4Ua=~4SzIS_i^Ki$nRZKg znG~XaOjv*Vyxv%L0GvBoU;$aL-}y!r42P}u{}h6Jkn7gS;G<{%$-X*O5Q`FemOF)2 z`IAnB_!o(7iqO-DDuMXzB%GbIZRBl#@eR^sClWTqRw{gQ)irK`C|!6c1$}vsWG&TF z=}EkY@^;C^c78g)=gi8wu-ELI!o{Vjqul!8SV$wd54i%4_UnFj;rX?^D;-B{#<#a< zHPrvIqo!B}{MOhS`_^V)f^`YQYf)hi`~{8uj@Z%rLM&-4@x0!>MQ*KxrXNZ+6O8s3 zZ|YasUy&MaYb0(fK6rv8>9AWrt|XaWkqUDJE1q%xPyTR&-JaTMy(_O3w;rm(g#L*} zPgwJlOcUK^L8~g?6}OEu*khyIWtRq)OxgXqyqK$YqA`*|gr^?MB|t#uaQ8cloBffe 
zp+o{&#KNTpNA%V_fO#-mveJdZ$KeG9U&P+1MNbcD-bxsj*vit+EpE9AIp%5cPt-ig z>H;-wa}k;=^BP^yjheGAv4ZrYERU9w|L18k)WIj*b0!~m6T9~KIiZHK+eV3|v)u)? z2Hj@TR1dS@0HNBiwgkx6#{j_)XFM3?@sX5xvi7@ikiaMLFamGAYoP3J;q|MHW;f&o z$)b16#--udwza6@tOK$^O~QtK4U)-ThxKgzE5$UVqG^wOI@pO5my!9y+r?EDlC@)b zdCvN);8!vZT|PO+h)ixKBr^Re1lf!^GzDqLLMKMT4Sw8a;S_9WOKH4LA|RnRq}~h7 zO|2r_QS{8tO;vifrefnA^? z;mg^!#%p{)bDw3&HLq#@9IX&`e`ruk+txd?gACd++_@Gj`^<9b_8b_r%8N=ce3>h_ZrYt=(;hf`v)W z^{BDbU9ofVa(CFM=&=Aa{b1_-q$wmfun#~-k;Q1frs=AT-;%CdPO@#IEj=R+0Gy3UMi z96wp&NkECV49=bXe@3Osl-u6kP}}*!0vEV;##HQXu!Mz0J@qjvHZrxuZNamUpKFR; zHXF0XA83X$fGRt)gE2ZP!E6SKfN!_1(Gfm{R)6B2*-53>ZvI%bxM-G@7R>;ao$Xy6 zb4P)_?P3B_^8PYdrh&c>kC*6vpevd7CeG!ekJ+=l(R(Wuo`Ll53F%(>A)ayf9A8t@ z8&5X1wWHn4cfOzS-@dmRS>9}%ALDx2+EFMJaFA>-ROo*KAQx%QG5Zr}w21pBmp)*T z$~(?G8JzmF9mvq%lFvQGZ`V%TE%ki0vibP9(N4-m{OrZyJ^*P{SY15q_oSqtleH?R zubb1?FFyGBvrlK?u8IPtH({v`P#^|tHDWM#(>mweZXMFB8mQb=XYdJ5`-O{5;E{TT z>w-D`JnYo57_fk}Ph)L$dV!vxvrD>t@lP@jGvDG%<%_3zBUcf{cCU^%=W6KH?fl?A zulK!VR98|&#obNT5f6@_Aj_<2aoB^EwEwXN;=a8kd>P`NUSrly+%^(2(-1>I^M~*n zFrrsJ5jMyatK*nD;);~#dNlb4&q9GcaO?yiQd6J7&d;{OH8Uqkg3|3%7cANimwT0XfN9QJDI zGA#XRFZdsvyl%Pc!T#VEh1))_;J%yt#!6FlUTcJ(T;oyCCW*Y;^1iES$7mp?%qwSE zDYGK^b7NY|BE4)OabGlZWIs+?svmEV=WFnp z?x?f)1si$&Iq12*VQ3Ou*8N_|Rr~LfzUpG3O8rNPv<)C-ReLc z3h2G%OH!4M%rOp&j(JX3E?c?S^3x}`X*9>UbJZWTH$HaDb{MCJ=b}%SC_mQ@-bhMG zw!rjB*m>{%xHPDBj&xQZ6*>8>&0i4+t!@f)T4P?sCcCDpuV=f3j+Vn>NR}^&_BiMY z+?K8g67LQ?`(1{ZujeFe%fU1TAi)NO?3y1 zc5+U=@~*waha8Lv$rCCQ8_^3y7N2V$jC@TpXHXkdrN#AQMoMQa{F25 zj-HWenW(XU3j`u}>o@RGbd)^7ukfC_HySQHetMgu0h@R_0mYqJ5@q_JHeo5KcsMj8 zj2_Q+yVR)D*o~5cf@Ve6@%;E_fEco@1R89(Wd!b+JA7l0SsNaoK^n6LBXfpuc>vn-6%nGUIAL#V)dAl|qNltgU zh!-zZXc>=W-hK%)20&XeD13WMpnU_)&ot@*a47}r$)i?nyyf3jN9j-vEbhKfn`i1~ zr-w(awW5W-js20I3{{htP>zaK2p=@{7q>=KUMSxee7miy|01XF*<{Il2urafB?Ff( zX7tFYCfR~)kUIyuo~vqUI7xEKU2JDJ;=;RNZul?YUcPB;OY;iRq&sC`Y`pS>^3()= zsX{kX=iMl%ur78G73qF@Z*-e_%4U%J*_&&$y~*KfWYjXQjvKyKS6x64=dBcO4_&&wBm|iIBR^$sJ+#z=L=8c!+$*K61 
za~T5p^Wh5HTGGp-wKE<68a;f^s9JqMA?Lc7Y(=0@;ODcF&D{q4w4zV>ASkO*u}A&pYi&;2aZuykAus`dl}tXxqGWoPYx)G1-aF#7+?Q~ zP+vR~VD_T>*WNuqVW}p4M1HeqywRg^lJfM^(_Kl*X9)dU^$&7K&e%K2v)W9^FG zS1IP!!-@0q3XcV=QQPC4^pIvLOhm*FMOhh04Ftkct_Ar`LQW%g#>Qn*qlLVa(l=&R z{lzsYELGNZv1YxjslND|Ns|XtfIIEI)Z1X`ZJG2 zD~?1wS`9{Tz@{pD>T-(pvJ!1RsA7kA{^B+cpZV#mk?sQlakm)$?bWk{9sR<}1QtaB zk3(FIhVH-_43(=Sw6I;SVX&UpxYB4#L$pIv;a*plE5&U0oXp5E$#E=h8fh?2|NDbkK1G=hs6cque!YiWsV@&v}0 z9gh%xZk%+78;@V5fIY+L`R%((Mc7vZK3K~*$2;?6&8~J>M}J;lI75$|o4Zq^847AHpeN3xMOj$je52df|L zmm0HXi74ec{mgu@C;X(wQOuKbkn3nE>~qLyoiT}HX zOX3>KzeGWq{UwpiaJPRTqVIr#q$uJw@d0snk8>~qG-g;5z72TxNl(GYkKKFXhv;ve z9rjOdU;9A={i{hj+x>3s)1b}TQWxT$&t;)T&4y;F?k5lL9}vSWK=KZGGh%z7Z8InR z?L81-79;>N&1`PTHN+n9Z2nj6FDEyOi7k&^^zR%xVxYrz z`R%Q4Mk&YgtI$T1@xU>E5)Lnno$p1K$E4WMc47J3rkeKuQWDk3*HUnvo15p1#_2PQ zB#nyjT{PXvxNyS*u8(A2ZoJzBM`Q9b#``yl#}+Qqi@YPYvmbeyNbp=K6mdM#2o4>d z6PXSK;uW>3fKPbTI7cpJcRB^Win4QiT+6!B>mkW9@20y@2hov<887!`5w1A`_l_=4pMEtS?uo+5Jm2-+DG6CQo^o$Gy7CQx@(FEGze5+daS zw}a*kk+54K>I_Zrmhkt*V5FS?cRW0(@;iiGw$`m~rtS9F$XHIvR}v864Rf4#e8%)gXNT{cUd)||N8qZ6{@!g89 zcuru@>v0Wyumzj)9Z=a~^fm15H_WOHM}^>y=i9l=4(y`LFZv{NZUjo!k2T2}g}CJp zRsKqsTH;O`92_kEmRwX~T`(mIH&q$5aL0^Y6|dV>e>!S*Ea|PCP0^r5jryFOb42~D zY)n`el@N)HLo_Vpn-(}DG#CfaoeoOXa zuW`$ zfE1L%*$-wGZNF#|Gj38h1AD@)$?9T91Ob>B`33i>bMxUx6_<(6m-oDrsV2l|R$}Jf zdFlhY!hZnsc}!a_xaO(0R#y3G%6r?dkEno9V0hgusy4-Gx8*he&3xJ`=Lt}Aup8Y9 z&BrrK$Rv0#Ifu&%rKH)W46nq-5U;f*e*OCOy4B2;K8xs%$5`N`%;^6^5!R0S$r_#f zNtr1>65(G^;4=gL++^z;%}v5TSmt;>kymAXrPbO}=07y0(BuzFUdFCssws8katb@E zqoo>?jyIvoB7dqid@yWast$kr0n#M9;LnB#wajU|`-Ed7!zXu$dzLC^A>kpX}aN%d(_ZjZ3MnP zHP2OX)1ZAeNT+IWrBwL$6k%^rdQCidG2Kct?e28qOUl9s_Xf-#JOFj=4PkcyZ?k`E zGx~sTcZ1yh-XFC3UFlYm`e7CTYE>9;CYsZUC@BQs@cVuJjPxvr$`wddvL-Ws?75zMXS>MpS_aU`0{0N8o_w;K0u;NLaw zx*GQ|f$z<=mVf|p8tGQr7@*fvt=+@A-)TwCTZ)UP0Un^h_FdzzLvO?r??Z4-ZZa~L z#GUc}&(^73SJw`yv4c6pf17mE6F%M#Z)y6_?+A3n{}|=|%X4rAgkIsTs?hRGz`j!V zMNe{aa6EtUV)MUS_kY=9uQRs!%%DXCJATkeu|9Q=Yv~c6m#0{77^R`6qYWSyEkF~; 
zVnV)rs{Df4;a&{2yrRe>aZV$0@JCv@JF>RbZB4QbB!GeeHZ#siSqL;D|QdjTxr9tq9+!fA_d7=4L%j_gg5=|zA1W2&_wa%2MU$)E-z17(X&8bi}}pj05l_W zt_iKN%oNP|D$h+PI?qe?jZ8j%q|Tnz<$px4H&i2zOkTRt>Bu8tpg)4vEWVsd;Ka!^G3N_U1}?3=gu0Y zjyhywsfHVn-Er(_|5M%qG3PlO(>O(BDl}h1y{34lqtZMjA~{)%@Pk=lgR<|wiE#O`^Vhee4s#sp_z&5L+_P8 zo^azxV9|-zb!qWWG;Iw1W0G(@ykFa~1iA#pw|@OVv)#8Jx$2W1Z}}^jVp(Eh=t_&f z(FIY*eCE~}9WlGQ6whUH^IC8IZt(Nu^NVk%&DUlhLCe3+EYHZpgr|iyVy~x`&A(Tc z-xN6D#Nzl`>L)KbMRB}2L0d6PvFC%y^enQiDEjh$iaP7CsJixTKLQGfASDul4y_1C zOG`>4-Cfe%B}jLNlr#)T=fHqS3xc%7&{9Kp^R0Q__x*kI2gl(d=rFTq@3q%;pVwK2 zs(yg-^%S+e=evBz@4a^u{@V|!q3Ni**v+2gBScpYfSu z?Y3Oq@kwlZ`B(B}c|HQ$p3tDn=Xp|VZEh-Ut&T$YxfOt z+t~S#n4u@%`wAxEs>V~hf=!n*$T$F=WA3$3C<@)YKHdJMHlR4{@vK%TJvvtSLeRU|@nln(Rc3#7OG-+@7JRwfxAg+VVOQoBrBbQl&A>iP9>_xW)YUd>;!$)r}Jmvt6&j43>u7ZwsdzD(d5}W5nOXYB{_;x7yo*tSa{r z>QZ1tv^gmS4a54lmtg2puIw|_$n{N}l+)qU*0Vi0k~QQ_RhfcR+>oNHX~-YqmKT9n zGY0zK__#KcBpj~9_(qO~cqs-_?>6+&iHV7!8;LxlrIwB?rzX&_{lVb6_D$5(MZ(z7{vE#Gajb?*Q^Ox*57jFfw zjuT~Cx43h8zm$!Y0^-wGUw_f{_ft!qRgN0PJzL-5mDNNAt7CZ0-ju#yzw;x0x81ID z$CWCNk*6Ma2=_-LIk}SqoWp;usAs;>sw#Us-@Q zkS$YGAJz(GMWYHVfX=yB?`#gJA=}RFVkRdy*R~9;4}@Z) zjyca}QsUjDM~7DP^oNfAuTFt3C~p6AUJDU^Jo88?N)}w>hVU{qGlR}HD0h);wq?B( z<>Y6QMew!lC*+iDB=C&JOBeg@RN8;IB0SioV4iHuHp{S`ZRl8Xfliq$+%0zYG@Mvb zTu=ce;+pTpQty+uH`_=12jz;YGSaAt0ZK((-Es?mV1nM_`1}A*6}jq+DndsEs8(=R zYARET=KFrz)$chllU!uG?h2`-DB)XQtS2~#6SxTp2*Pv4LQAk5%*=wJDhXbYO&aoQ zX}Fr}@2r#&-7J<$hx?BnJf9wyvb2q{BZn9G-LT;r(y*&CLZa?rO=XBR*A>AM#c{bwz25qu< z5klpaTu<|nJ zm%(a-%WISgx3E4bSz|H2T{srRP@X9k0(a0aUNreRKIfYP!;Fl9KISN2FP-RMbj8a2 z$`P{RyLX$Q6k^2nE;}^mZ8?=Iy)ut0{8S=FHbyb#sG)_*(y8T?dI43B-je!7vdv%y za^u~97+HB+rsPdiMM)hLM4fVlJ9h*L6>o9rnzd`ppZxe~48dxBP*~DYA;}!4;Hv6W zPTp)9r}HsZF|c)5Gl1cdjd9RK4y!C^gdYSxQ<)dHpNR91Q;`Z_h;o}um8FbStDNMN~svj?K82?y+{~L9aqrUw%j#pJ)Gliq@*z$ueiE`1y)F2- znU?*jM-dvU( zd7;2+i<8?EN6#sL-ZO;vY3b&F3H)87orKr2S53C-ET*KrKu>I$Ii@_()@RS@zhpdY zU+30ouE>Af`Llk@!|P-_*H7`L<-X|W5kGZxb*K@(>&?0z%P2xK%UWW=1;>kyBfqH( zn_@)I*4@yr8fiu#p&nXHWLFxdDa^y6o;-1X@7a-6B}2Wx 
zewEU5=yS)avF8AH-(Ttg6TdLt;@7B3$Q*A=hzK0VehK`~5j7DH6|)3KHB&C>TtCcU zmaaP;lp~=wtfCy#B^FK{Hy~z&-bz#cU)8*SzI?n_5fXt)f8t*)tEYHVoK)16>tc^} z1$mTfwfC#U{yj^WY8(&}zBXkMO5>V(o3gWoCUTmKbkz5QaNEppUHI|RH}9u3$o6cc z!hA32)=O97oCS~S5|b8FY|XMN?x^jK9ut$Jwd*qvT!Kicw4tD=XmWyu%=YV_s4vBS zLvW3|ma%R83&hprcE-U}afvX5YM8Kp!g2mEhmbN`re3z@OJ-+A40}B$W zV&VE@4XYiZ{7Xtx@ATT}-50MoD3=8276fORvZ7HDsb2c!)7s&urFfo!S8jVldfsXn zo2W8*J0_fje8KgjU@QH+-oxLq;@?-D+qgGBu9IFR#V|7TPNV~gwZ8lfWW=!vf2y&vQ%x7fEV#RBI3TbdRa9C)mPN(3|aLqQ8_*h?-1N9FTsS^!!0$8h`0(- zmdrag_Az@(a0g&5#~mY&J4B%ZN0wIsXgS%x)$+(K*=b2n<7{nL@)5Q?{lxs7B zI>P?vHrrxdM<6;VJY^`QPMCzaDsFvEx@+Z;wPk&}8Yn=!p%+^y^wWKNHm7p3|E$l? zEX!SoR_Wv3-kD?|CfQ~wR6foyu;5Fdg=uqDNt4n|Wr;&H7TwN3s^l!hHmELuFzYlF z%VT1)c{X=&Fgx0hhPS9_1|mHXzo0(DmG$$z)LWDAsCPb``E%yq!G8At{Bvv>dd107 z@dS_lbLTe&c9_iH2`|1K+W4@VOAT7^sd)*>>~Kdt{TmDkhIoCU46npzV!%!eltrfkG68`Opdu z13#-TA9i?y&0Gy)?4CXV!JAgEYiGs-q{pSZ`0j)CJvc~{`SpbF`h%gjn98C4Cy@gO z0Nvj2qH(`JuAk}LqgoG7I(=xDTfd-)#rD)y{Y6av{V#!4WW^bAMT>6{xuNz2ErpIS z>;ueRu$cJnTl>m!eOfv!HrK$N>%aSDE@HIMSIQZIv(LcC_}>Ua42sMOD;VX#y;k{8 zrXeO%Mkyuxb({kSa+A70)RK6CFR!@g0F7coxL%d-(k(wWFrd!;d<`$?JEB~5a3h{k z1-?q5{F~6YT`8r)&#c{S1c+tUTnf3e=pUo0u^JOb<8X<$$W6j$XCK+R8y_F0ygSRN znLY@4acNI)ubULQEM;PMYtO6Tb^R427vwaRt4|jQz}Bd{sed{pZz(atyyGY}?mzIw zn=(C~Bnb(Ls);EYp5c=k%;n|kVW?q!%kDowJ9KdTbiyjT3F9o=_v$x|px%X|h=@oc zrM=FL3gGv4F74@PeyOUfE`B;V%ikC33XDGS@DE1*eH%D9z_%>wQVl)A9(|2oVsSXt)wjLfI~AwS z`iflQjouxG;whBFx|)B1#z-sHfXJrh@Oh`bm$_+)!Wp>rLCx_3v4Ta@JNrA-u|jC) zc5&4&kvQ&KHW6`ibyjxy{0UJcEX0dzJA=pkNZ5ey$L&Q5slY)!Q$$(w&Z>5%XAp9y zc0TRk9FE8|abGS;7R+H6HEzYWBWtz>zeS@Dl{IxrqhPJ&Eplu{Fx ze-G$055`O=415wRXiOytSnetQ_MH=fe*fCf9ihh>W_^<`s0=Aw#y< z!Ue!S?OrpwdFj@)?eZ(O;;j5izP1I!!#o}d`I%8HX=E7k1y={H^iCOT6R)+ZguIxN zERx3goKL0}|1eKS(U;#4t}q)OsW>@aI=5TK!0>{X@SF!^o2YwuQZq z-%~S?Y?dE5i=wgGOB(HanIwjBwq3-~CvU!MHuWbD`)n!Q9%Xgnf=~vp^->-e^oRG1 zpywm3!^p5aN0>KM;(*OB{hXn4&^*kZiTmjv$F(q0xo4y+4MCXml;z;>g7C_(n2-KS zPJK=+COgS&GQ43xqyzY>%C8-_oj2XRaZs`m9}#$Fd7e$kzgRSF9JXu5Pw_!@>3dy# 
zHR)`l_qyMO8+3kP+d+(I@@CERs70F#eu^O^*L|r7<3C(nl?8?3YnIZ_a%O`7Sdbo~ zdhq`1X?gi;u9NQ9y%y?MQ`V0S%~%X0WqOi~M&8;vtdyip$F>W)ZQA_u0VpB!Y~f3e zqXTR`qdQ|0bQxJ$HZv^$=l-Z5rX>=QkQP2gGPMZgRcUV)KWK+*#l)U&?;c3X*caS0 zv-IU?tP;^;$dcHB6 zCl<0f9R#<(le>2>UiXigiy5FVA+*QSYBs$V z$gMcSrvadn8^V=+;{d`w`I)X$Q3~$0Y4D~!>357L9pibpa@kAzB!@R4C=FW@hP^37 zti}gS&*;Fh>TnsR8PiE=46s%=nhVnmJkoCbCFyDc(rJc=Ku=7=5v(yEMx$qfwYYI#>p9V2N1oDU90TJxVlQHMYe8xSbuS`0H+o9N^!J%Y4(8qkFtJ1nOy zk;w!x^m5s80!U#V)Scr;xDg2m}D(QIAG}Z8G-$7@ew%=4u zS!Z|Bp(B8$P>GG|06xscGX_3K@uF7dfdGs4@}9(_GB$7dtcWNgQ9eBByBYme{TD%L z37;pv|M+zR_4fgePg-)XUkBKdcj+Y4wLm`Jd%laWRh zjk@apZ`DgwZE9x}bA#n8gi=Tr%htv`4VP=qEHzqLgpc zNcLtz1i{<6ia!Ded^UWvskxZC74tYQgjt!(A-wvTayUM!R)3L(wKSdpCT_T%GE=?4 zcoQvDV&H{Je0~q7?Zx-`g>eGURgi$IdNJYBmH4N!<##rmZ(fMyEml0JQsm2Dq7_JQ z@1TPXlS+@sOX)C{I~e9bpsv#fYwtoJC4Y18+9VgKZzARLr+>wf{2yIx`4p{wJ#|)% z*w~;{oW|yl=T~xH<$!qf(ci3{bY1YH&3~;K3N61$TEtTpV6l%s_aA3c!@#k}Tl$(`kWqDkIUHeRUXRVQ=Bg{p%KuN&u!y^>^mr+c&l#-bB z*0G8RyP7~V(YhP3s*AbPY92?k3wAvMI#}EYEPk;Bh&*ZP(dfyC95*W3p0zoT`2X-0Vt) z%&{cRh8%k7`Ic4LgV)kb9FjuA8;>8ti=fRUxW{boK^)Kg_gyo3x0)M3<;NZBrX~Ab zhArqh@TB=r-rm+YNiBpnr+AZW)NDLHQNX-qJP3gt!u`Ttq`C$Lp@BA%7MjkV>f|9-^9j}TZ1vI*OGx+21C~6ggKO-a!O9e;Zyj5_&HMq&z>V4Avkbh#8jddf$ zd&hX!Ua>(?|H=y*jVENHus-|=!j2RI@rO)$d(Pk_Lk>^S4TKf5JqchAUjCAX4W84? 
zr`F>a3-}9wzJK;cym?S#iXZo$ok$Gf@?ll9^($8K7X@YDV~`t!SmK<~D3(>VcC@W) zl#t}7prBj|*F;he`1C)*eu4_236n!MQszYZ$&#pPO2SC@;)yrCmIb>7xw=3WHL4{5 z|Eeu-&nu`TkIjazCh6VGJp~P!RvEE&AJv#ZWVk#5%>2oBm_QSwif4;MI0R}&PKJGA z6VuoOqAWU_hv9~iAu=j|@IM0H@+mA5UZar5r$8)aI*0$4|5W2^$wMh=g?%gs zPqQl7B%KesrWGq^!EQ%05-z}{G!q*d(cb7< zw#7a7P@~kpYj}Wi#AiKMU97i~=xAd+X5{ z3Z##skWZui$e!JV2eE>N2U?7`#|3TPZ_6L1^TiP}bs9D2Ehk*1C0i*SRV(GrmR3tY zWVzGVE-uoLk5N$}d`1mZ!V}M$NIuM^8D~$9^viHGGEiFI7cYhvlSvb7q<3Lffe@Ce|qHUc^)P=>2FtD2x0R;;ynf`?@;aio)Nk^5OGxhwxolhMuL; zfy8kE0C}R#dh@DwxtnDX(NP&8JgjOv>58HgmPjT?)upU-ywKFRARU3=jAj?+QbFBj z_EC&rxtsbY)6T2B38rZ->QAk2esIWqQhKdNS~L19POeaOYMiu?1TR$V8^bq?13bga z-srm`4i+clpZI5JmRvxsD0TfqF-F2y7Cq?;x=H-tQgyv23_=K?K z5cbNci1ML>!|(?dnXGP*M?B=a*=yaY)-o_b@wHgz(*PryU?R=<8$C+F*4)NLl6W}l z-6x10bzEi|MkIt-YWCcr9@qVsQ?zcL=jsnqU3RcfH||>{`DgK6ue;fDKCUZn)x`7& z4}(tl{t)qef`Oj08Bb6hQJK>>sfcJB+A2Ckd8i{~N9N7}+nTwGmz@v-h;$x^c>j~| z5ZT^u!5-Bk@lPq1TWj!MQL_*@5&RV-di+b4VS&o$zB_OV($djW)1_PD???pw8E=2C z>aQZwHI*maA{o13_b;itCX>JUC@+&lv#h*tdzxnBhYm}W>>PnL!4!4PU;CFwPh1tA zWbMX(WSg9JoLeM~tx7rZxuS-wq5O!WuTQu=(TfpuKM{|$!r!F2s&~>?>h3~}=h9#i z;)Bdmk6zT!|B5AAwEI3x-)-{I{4CN5CyN;te2D&p zbgnXo5TWgX`3fd$T|m{`Q^BHiRKXcXX!1%bNjn@#i^EQ+yZIu)nmHJs5cYp-5>%DJ z0E-fD_z&AzopNf&m^4+fcTzJEQ9g!PPGV7!9$__V)PnQjOXyypa8HbW3+YQR3p@fI z+p;)oHVJG$nVXI>X>~cX^KIk4Z^(_@oR~A{Yj21vH4~*OM0La0>2j;8K49CIKBsP_ zusQslH;$|EsxQWujezz>Ad0hKhq3sF^lesqqAN9?TghjYeH@-&8skHxI zYcRQ35&LR?JkY8YUiTp9+qd2*iTHC5_ zjmM)GGLt?H#KiQLzV9JD!Q9ANhAINa9|xuB1J1@g>WgJy%>F26a1DJR>rNgq8|m3- zYUr}?@r)26Kc*GGEARQ8Ro_k^Z>Sob<{;xG_C{{DbC@#kWGUtKJ{yFV(`euw)~3sH z9?fqCMBs>T9-EmaCrb_~vY{!iaZ<^FsMgUmMxb+1-@$o0NK=l2;B!7^k0EF^0J|Io z(*K%*u}6=ks%Lkqrfob{TC{!YYVgq6iGWi}9N;-`veL)kR*(08YA6sVPdkn@Y0hzS zQ7I`YRcXHd=9QL!Pw0o*ynta?dL5V0QNt)e?!Nc#o+K6999Y~3j6BItP1gr+~*%6EY<3;D6&pNOmkm0gOYlpjX8P)MGG$B>Ky?6rq`}rkh{JL0ez@z}JOKsRK z{+7e^KU+2!(a}bIHfc@MLC-?3pmsS%Bi5dDEXfj=i46G)a~dnkZJ;eH>BUFozfHh} zOYdYv{glxucT{UtgT#34lGnqw+Z*ardxfdWt=jSpOhK{z&l))I>9%*AfV$y7CM+8v 
zzn`Fw>ZAY~Y;_g?zRziZvI+!7={aj{MPFaRimPAt)fV;@prEQhk$$P;YOiC)70GsU z%1`kv<7Qe{;Ajpf+x5>HDb5b;y*wh;?%$O?_}rcrth_rf}>-_!Kg&lz^%V`=SkDqE#{YNJ+up4 zl;66orMwkwJ=>Y&_GlrU^mqI7Y~Ra%?(oaWf>eR}Re8E5-+t+#F6Ko4@I*{1r2d$| z{3adf=6_XR;J!ESHzqr%dDjXyGy+az8aaT|82jaa{p0cv?I4X^6%FxH9^M;SV-&SQ z*Exh0%BK1yrQf=j?b}t*oDT9>-+cW2Zu?I&5e2v1kBmxqxs0(g_2J=Rwd0Zb2^pyB zGy;jd;m>i)^V1eO&9>|JXWC!Ce#QSET-*B_@|n%(+JjC?A}u8~706mIh+~20Tr1pN zB+{MWJ9VFg1I}#Y-gl5lj~^^}PZ8I>#hZ2^BKw01Arzk&6N2ESOO*Z)gGLGL&?R!F6WM?9Z-N9XOu_Jd#74} zY#7Y>Ie@j!zr*nXplhTU{&L0|^dT!3($QLYo9;3C-1!SSuJ!eFKJw!D@n`L3p>U;K zshTo-oq3bsU-Gt-cWonF#v!8VB*rfCO^WM##mukbrC7>ZDa0|EnKpE*7QTEGpzuCh z1|ApIh-J@Ql#z!*YyZAv$o|Um5Uu0N_#?M&Qw4~0 zOymbkIvl&5fVNE%-Lql#lk}K(UHhM7ub&j>_);Yh60tdTVP*QRku{x8@sud*sYQ2u z#(bG!L!)Ht|M-o}Wyo!&4Nag{&~+E~N_;|QV4Lsd3mO{qY*2a)qR=av))@Pq_j!jAsX}kjzvY0clwB; zZ;!4$gU9LC7<8?N@*hKfxldihI^a+@83^3nv*B-|P{ZX)t2K*9bTU;NX``w}`e1SRk=e|q{zu#C0p$So zxIDrad4?FqA`&!<>NJrQ|5Kl;L$pf8pSeL^Y6&V@4lsQU;ez+zjs&B~uQV2E~>ago!vw!-g9aq8)QDx-HA z3Vtmmjq1XTupfkh+>%TpSnvONUv;}wrP7%GT^_@%zG7E+`i7lLZr!v=`^5LY9d(|e z(FT29Rq*W$)}M1Fx}rjZ$>X$YSfxe)o;0Z(%_ZzIe~*bdPK|#d{)s#MMP$$L2}P61 zPDOhfK7pR}BZRoPi+tn`<+pD$RTm!F5A1{N{F&#nD7oXpN|?08czOspO|UP2@FB}< zcNN`{Po6yCBe;(v+7Pf9Ei~VT=58fdvuY~mTIs3)YRb!Nhb~W)EM$2If}_*5;`yE# zS{Pv@N+X&?6f}}D-_m&s@3~dpEHq{`Z2dB9U1IaFdD~GHv|w{7vVQAxJ^etd@%A(K zl_A+I=);ESG}Kc@0dZ)IdEkgVTCic$O0kc$tZaO+ z;^5eurdzn8*>NAi;i50c-~8dWBu4Lo*B-_dNgC??2L3rojwbA8h}k{lN-F%k4s;b+ zmPp!eQ4+bfFa+%$cc9bVO~IWC_8&{jMWxb3I-Pi|>Ar0XSg-R$;%Qy4`ZRa*J^isY zj~?nG2!ZGWaLoJVEi!1%cgnQ?&deUU(!S89a!UF3b-dCkcW-aK5hY}1;o?_)O0wF| zI(~-Dqwmgr>1^`IwYRauS@~0Jb~Z(-8ab)P$S-?v3n-ommIoz1H1rYq3!R4b`pGJU z^WBdX3D{j<)ROJ#Q8ANyLDw&P2aTVHum+PdldgoqL=vfx77Cg7Z@sVyT$6XWBE8eY zHlU*W+)FLZ&0siui>}cAx%))>;4MJ{{|Mw1YmTr@`aNp^*be=>q?5mj=v}E-o1-G| z_y<~_EjuA>vD*KhDj`o;)^sCIq{d?Jp66O=FD-RmY9>or+~wti{`xJfAgtB2hp{&= zV~2d-(aV#vdbYR?lw^bqjF?C`Ij>3ObPxn^j657*P)zN=;T@xU#XjmKuf4jslh%5C{@c&@4@aGG zR#h#V0VUg)8z_p~e^dTd)5^D*S6#Kd_c`a+oM(Jaa$Dud6L&w|sp;B@!zkA|;RspN 
zpR3s9EX9f$d9hiJXT<$fiG@=f#((f{@&JvRu*+|(2@6qXtQO^JGxVIQ@)z~-0U70_ zcW!x}Ou`lYrt4xoNALb$W9NL%dR?c=dV48R|8)PO*Zm3u<0Eow7QaR-82(K5#xKs7 znG9(ccWG5N{xY}5oi(SXJ`6^HqxN;~eTq(u<=vzW$v|x8dHbQ*!~kjTGkrwp*QgKM z*vy0J`Rp#<-po~>QuNllBgtRSey&0`Ki2L^Wj6T6ywe!lE-|mwD8nA|*i}PfZ7>@$ zf;W|ltBCby@n&xDV1!&>PX#%DnR7by927P1Wj{HG%eOM?=uAzuo%kVLFTLd*?B*U7 z(|p9X|LE}ZFu4-x^}f>a1%A_rJH=6*=9IOhII$c!PY_SySN&S`bpN^QBrfrDf`4lO_O{Ytl_-utPD3G4aE?`c9FpL}j7B&A(Kwd$-) zD2ZUn=ScJz7My(c5Tui1KeixdK@qKW8P8Cx(6D7G&(GMFL)RPK6VLGI1L1MXE&g)f zMw?%WQAV4f0s0-B#06L6j6`vMz4GAUrUxao%=*})iNwXi(t+=@eu6l)&l&gGL{c~GXZ$qCu>Q?SCXpo#@2hetuHFR^obdqYr#XL=eN8#WM%qQ z*x!@^=3TBlP}HWjcg9MfW29t`-B@MI;h>h|+dNusK1mzy2AmJQU+kJ_nF+7&l9z5~Hf)k$TVB$eiQ_Xl zv&Wl~|5#u#Q2V3)2U-C!o}!{6dffT6sfj7so8SisJE6RDC5c=^QO-52$)*!OIhQ){ zOh5kVnLB!)1m62qMe7Z26D;skTXPi1a&%<*6`>R&=pzPlbW}NufsW2{-_Fi%vOj1S z2qO?Yyu4cL^H3<%Wrz>ihM~=Kk9<$~^~#NP?X(Mf0L4HFjZ6M6lgn9`zkjx-1*PU= zzbyY2y1ckIOBl{_Sj=mRCs-V5`|I9v^T>H3WI1O3E*w#waaQV%YycGv7e99C#EIG0 zRxq9GtTM2_&6#JE5(P(KkI|dvyHX688-vT?kRkXI$NpS|-qge{<(FIDha*-tWKj)znnYbQKMo-u>=Z*Lv2oo)z|4K^hB_4D-p8Cs?3Q63S1W{AKjy$E&A8w7k}g8-)88*Ykr)QHeHdQN{|Wmms3D=o zL;F5ZNu8Ns$m(m&Ux9J&@SxDxcSNKni20LMm2QMjG=xwqu9Z%EA9=CkOSdB3+zgHO zrpj>P3I$$8{k;MlVUK@=G~rbTpyQPE68Djsr#tGC0B?OhhOLJJ+C7hV`Ue{S z`!}1B1n_#_B@qeyo9R<>C@%0q#lYvHp`ponWk@3hd`du9Au+P%!TIl2fTl>1^owvC zTw35JOuRALf8F$-Yli>VCLQUG?#BXt2*4wGj-CT=W8XO@sLUfsfX6DuI#B)lx&OZA z{P!dNpPLM0#XWPQdhy>w47>cF5uW{fDE}F8*uNf8Wep8w>b$Z~^o0H}aL~(j=mCb# z^yS>IIP(E~T?zZESM=l*lF@|@OhL|*iH8g!3T#MQLp#;mH2HE`Oy1hT$euv=#4o!b zlKARlxJ=KvAiI;3({w&#A1N^~CM-C*+6KL2Q(1AjB*;S$ijR+LT`ho+oanRqy-K%i zRi9f?KqXZ^k|NKrxxHO-kvXWzY9 z1m)rT!v@at8I8>`<$b|=%$A&KqoHA8mNROg?3l-8_xH}iNQlEZ$+u3$e9{h zwQyL$&2%n{=_3d1kJ~(rFE#V^m~lmgpeARI5CXBs?SUG4BS4vxNw=jiL>)x^xH3<_ zF&dP;U?w?t>Zkr|Jp`EOxUBObMRqM*WzO$OqOc#w8!|m6na_PHp4=S#=#bWJKC2f46A}^%R#J|woesMr`xb!hM<4mx|r=y*;aC@BPYZbA%YgYs}d1$EwfG@j1_U@e)r# zR_KiNY>x7hMwq1e>td;99{pHY|y8ip?O!1<_2th@$aw>!Z 
z)!Cdhb(eKzN6;3ldJ&{XaU;6Lw~`(G91((!=ulkhK-U>R`{m=;wS;I5SzGgAk}_-?y5BjEVUWQS}ml|0q(D{dGD24{+#6+4%Y7#F5b z-wof`zJ$ayDl~EzL+hh4ZKM=F-Ud8lqT8oB8#k##=j+|pDyyZ|!|9|DMJc_BE~wA- zXfY}^m8|s~xf2J4M!MVT6S4n6yTPIOe6qYMnC!7PDB!xsMZ#ea7PV;I z@!S8*aRXz@0da@3F%d%bd7q>0RwhBX<)!`iBk--wWp7b=d3VIPc)-R6-Y4rXE?h&0 z@n-3v4?Zmv!mbCZ$hNKRv?{w5Pm{}`&gB?!wWZh{L9-zONESRsT)nORv#NP%NNz?$ zbj$VwA@zOT6M5HCXZZ*+Hs%|v{zmWzSkQV(K)SV0m1Ux!k-QeJZ8es)+25!M~_%kxrDa+DCmUqxu{?|!P+OumXOKU}tLuOW1XZrP6sgdGs>tp|h&t2W@27NS3>=K<06j1`B0`Tz zdd-EFXgRaE`Bgj|3})X{wj1{3Itx>7z4^FOrBdG_=b(`u&RIlAVk4vr9)r*;;Nv4Va|E*v`_n? zPuf|$ujBq22eRAlqkrn;9G8$BsxmI{97pW*!Qn{IyjW7VgL~nN?V!VzNO3n0s*-*U0(wYd&`)-(k&fl2ze~8J+9R4H=sLl#IJoL|$+$b-L@!+}%Af}X{6t*i3SjJK@>hPKZO22gLbfSy& zhQ822_bQbKNTN(L!*-NwedI1^zF|_=-NJD=xgDb^_t>9@>nzx~r;Gg2A^30R=R=ZD zEvjX@*ad~C5Vt>_p5TaXa?D4K%JZHFf zy7WjG-~r8-iT&WVaa_iT`=8$`{C3U%D%ejA#`}(PB zy0DnS%28xMQ18{%S>-WOWAy{r3bP%2E*>dj!~|o1(9_j#vZAc3gU2{EujAq${ze)&2$P6mElDhu7j@Q5eSFHoG zE-k%ru@jhTvd7iF{kcHI<>vHkOICJ|vmj<+hNfZt3n5NsS{Yt#1ysT&8}?b(;d1xN z<_8TOBSVh&zl4I7soIWm1t`0ClT7@c8+1VqIb zHM$F8xDbjKhOp4vUy5@kZgCZZrmPypvsGu~V)FEclae0W#XJJOrRb0X>q+W^)ED*@ z5^`B^8b(HGT~DKQM*t75N;vD>2W{`{luDFoRO-Es3-y4)SV2sAwaP-%^%4bnhv^KL06ifo4eFd{SQFMs=O9KftQ2n@`y|i4ddP;wt>W*Ar7Y_P9KJNAV`=Dl=`9^SgQDvpHDTJ@Y%1X!a z^srKP^7t_GgYv2nq_?YsW*v5(tTjONYzK-avMlC*k@P zr9D~1W*=5bjb9u$o*fD_KV&1Br;#0n8_K>eP!rGwJzG_Ne*T`Po?a#^?i7O7|N8D_ zWFw_H;0xoZ&xaK3zsAaZw$F&|HL5P^J!`m`RF{K&icuu}8OhJi-hKWf>gs?Pje@`K z!sl+qH?Kz&!8n&+wyUE4+w?O>3<1?V9|*v?cWXsDGS8Do7#Ye9|JaI3wGS@wn)_n; zx_LsKv95C%wI<0JB33aA7>)hIG_MREk>Ha*9m-86c(Wd-M0Z;eR=4Ys0M4|6r4EhaN?H=;ed>-j%-ohE)c(YIaNJoGrP zC0pi+(2YNIRBZ6pEj<45xE+A_p#E%=%}YYI=e|dxY%2Ln5dS06C2%Zs@_LuU4J z84>8}Olb^=5dYC3UgaDE0-Bqye@8X8{em(7C)x=0SBOr%^~-Xu5|gBE01&j&&p~%* z^d(d#%*TWI7()I3jc=5Go)AbAl+XV=OncmyOuzFaN0@4x@gzzA)njPDf}nD6&ddML zA>I3aj~*s)wdB7dL*N+%M#~-0OnpUPUFO&qRA;VDYnPD_{)SMcIwRo259gJ+I(rem zX+Eqf!R2F{nm2{*HZZ7_RBk@_48zUe?KPprPe!#{EBa z@Y$17zvvBe>EyVB!g(w_pygh~0a*AU^)6rSm(5ikH~EJSx|~kqIXb1-UBXON-x|$w 
zb)6e;Ht0sw)lPW|(3`rFTE1^J)r%gbUb?qclvC1a$028M!?$VL_2Jh`q6>OzwCf=| zYU``D%IVhL`(YKzIhkEVTsbdu8YNz$M%>__`0Xw^>7P?mtP&k!>&>-r{CRtRh#Mpf z>@U<5z3llz*Lq$)zV%>Ya$w5IxM zrz%QkLgH@TEH8(~OE-uzaI??)ALz9iyiL)$=_TUEGzWH;zg%Sz9Y*bfO^G!$jAYkS z2v^w96^hs@|P0TmLoKDis_V^uYQJ)b zhpM^O9#m-CkUv6_-@*y%MgAo9JDinwD3yP%(X-R{o#A6~o%J9txe#iDe6p|pGcJs1 zSR7l+^uea~xeZ&4_SHx3Kp{ts^uhTdJM8s1H42xzLB#r9r-ZD> zNgZOVdrcRcil3m{?GL4MWr7D619unwdl5T6rjGU)XWo>5-kSKhmD-i`ww6bnoR+xF z)D#%R^Gw#B6S07xLet7KRp`;WI6tZ{6a+3sQRZ3FzcxVJZzNtv23&*z%Ufj<_v4S# z^k&q9#lcU5J0DZmL~jty_9)gAp2N)uU0#oy@2gtfr7ZE=6TQTrp~Wf%23>)Udj|fx zEy!OrS))=?QfR(DONaBdZq(Z%m&9-G)*6#tZyhU!BO_f}FS6VmkQt5esd2IXsd16Q z2IsMb+A5pjOeug%ikjgqwQgoHHx zR%a#Mtrj{|xwq}yx8L1b{wd$95&dJ}b5{f+EklSUX-faz(h@&5DZJE+rsSI>6;03m z1xfruS~C{>+U`{npo2XL=z^Wo-x_PxBURmY>LYrsb+>yrANeKV~{N zm6%zeWOqwTN;|KXNsrRA{}opLn2?t5s@Yn5EzBz0i)cdn-X0b+I;;wsmoBx7+#NXY zDL2qB*+kGu1O-WsX@`93qPscS*Y?}haUQ@aRfkh^wE55J`)D2J3bzr;`5mn-dqymd z4TAlV+t!sO68qCUV_nevhyCMy%ZME>Gw#&%xTCYnKZ++mH+HaPyLx6PSH5q>-+cb; zbLF+Dd%lKrTG+PtvBa2ya>C~yx&-#JLP@ zH$#V-4qgnI=^6f;-@CP0{}hlvhSKg^sdelS6*6NZhdxUlHDs1cIS6S`IU<)wS79QZqK&Jk^tv&V zqf9q~-~7ZQMNdCvRraUd`g=px(5bKe1d!>q(?ZC(><;s{Ef2aqwsFt)i?<$}R)v<^ zrtE(Nwkp}K77Dx2y?09RL-}~BNiwPEQekOGA%;E*E-Po3VK5|M2A~ds$tmH1%$!M< z@N>t(i4*r<>9BaZj=DM`4z^;IIfMWk2qG)RcWmZ z?)IkF0;s@Yiac;nUcJqanabA-8!eLDpLOGpoHsyoo8KpL(N4=>>E6s81=l!%C*Cn6w;vCMu?CAbg(~8 z_KRGGweQD{S)lnUy+dljU3oW{K71aTFRnswkkfQ!+_R~d&8AVZM7fT*2S7XLKkPK7 zacy++S323@uUOE?710N-O3-}S`qUByHIeEz?+gnAyvsz4D|s^gY0g0Zk3SOcXJ>x= zQK4d1Vly~0dDreDFUHjYq4z#fI&t5dB-knMEG;VQ%Ap^2yk963oTI4mLiipm*5dFK z$U`aw^UC|LCx8A79VdFzl~OqNPDiHEg1P^fqq%Gr5AwV|qsB=k_}1?k!=O zia4`w)oS0tJ%Kw)39AkOp2URTk?lYL1X_3&EgKjxQ4cNh<%54{^i!%Lq}h|DK*IdM{&Z5b(}SZDd!?3a^5UdU3u-i_-v{Sjyn2^B?V7x@7lUD8 z6?X@6HJ=D%00*&R<{e9IL8GOic-|HuihLc;_g4R{M@;kfc%x8L*j*%&_VykNZ1u zHADT2vj|lu%eSejPV_NJv+PX?cVuENFULe|(cW6z@VSosec@SBVy!C2;^mWd@V3Qs zHLarU#ggk$Uv)g=YkGREHk^|a$HhyTm&TDwpuB^CScwAzC9c)1n2bMr-Q%;;Evq(WT)|eSG2`s}5$g~r6h0XX!G2LK 
zI2LLz*TbY-MVZ7jGPoN-CBRR*c2}Kb^;Fn3R8rpX^;syYOeXAeQ#^a;bV7owwliII zoXH3|NXn@`Pok&4x7ZeR3WdQ+X0P!I=>se+nTv{wx1e=+iEF>?b zcd8jbYu60jOC^JdE9q&4^#nenLl>6Oa{WB1q*|XgeO%n(op_%0IP1-u zrC4Y6p_^a#dNMVPjkIc@sBVw)n=1OqE>eIB0U|4Y1#`50tt%p8xRhvl&SMbTnB3g1JnvRRA2ow&6 zGJvA#KmaLIkUB6lqIy<+4W=kiC4sm0pVN!-T~uvGV2}jy0D(S%(qm-O=e_fyak`vk zKkZ};i8Zfo(JRl$uyJa&&I6}>GGX#`mubL$a`kMb}RoUIwuOGxfee<1>KYwJ?Xt9oPJ9=XFOa&801`pJFd!s|`T? zusF}|_9K};VtwDag~3>nN8+Lj3kmk!w^31_WI2EB!577y1N7S5_5Pe5UNZ15q;bJR zyYaa)GF?3M@E)Vif^o;Vc=#QK{v$PqDz|F!acx#!hL3qs<+uSe%T~R5(Vw;*eKi-0?7Fnb0lw? zruz*J$2oeSlxV8_g$CP9&x1`G#Y(E%PC2%;>yUTzYDrDqMaDtAE)%2a&yAwBk+c?-3{IZNY>>9_<@ z!$UFhOW2~rm^*_Jf#{=y{`;=9h*8g}SbuI#tqPwx+~xg*8V03-k78=&Z`yZSCsoZ$u8^utIRSiEND9d&H zKZNs93i#kD<^A$wrU@4qbOiSMF@J$VpTB7RH?#}+(W~AtjMpiF7`acYYG23aya4D; z;IR;>5Fk;IOyAT5ct~lt zsqm4)%iA0wKI(Z6b2jt0EA4R|n9f|L#MZiczaVz{_2Ai@~|G2&)O$!Z>LE(!YpOA~KeRaQ6@3qCuYCD-8keN5cX0dM4eFsSPgMuyG+}6^Pv$8z3cvAC%AEe7<@4#X=qGC z|4dOr21P)Cn(I#az_)@tO`Q;ufTtblaiJ5hg}AIW28-FZo+v8p$vh$)T*Wd$NSn{I`$^L%r2L7$j>#q3hdP!3e)lCyiVSRh;EB7&3VNm=tbE&G}-xmi9SZ z`*8GAU7gs~H=Yks~UN|!9D7S)f)>z^_=8ex3OUYHHs!!eT+HX zj)0aa+`{j!6|FHn;Y;59&%2eT%H?QDalVq=*M%`v-|Rh2VoO}|>x)t z)I{ub9{D88Ipx}ISc-^6B_;g4{RaO1RkpOR+cb#9`7Tu(IYx-X6%trRq zzJr`!>6tnbTs7n0Pt-4L{}jUveOpf@S9iZ)oW8Ru!V5}2$Hu_pX=?K7wHOr_7uO2( zpX|W@7G{M=>9jZ0e8OWGZpaXhOGybEspCx^HG@!Fac$yAT77OkTH`N(_pFQZ2<|P{ zV_j{fC5~w??j(@~s3FY+?NL{MG(+v=+p!k)jcsZ16+B}%uR8dZUZ2?OD9dCAUNmnLq$sswEsfX=Fb4?qr3OLuUgjF^= zLK-E-_!>sj*wEWlk=|{C6)$Uh1TlV8$>^+(}UGw)OVFnWk-JSndhMoM3a)YIt z`~iq{gX?I9oz!E7!B{#;fcAH&*!2aHzdb&Fe`q&0z**gs8vdk%w`?7NB^-UUgp#h zyd-@{6xa=5=#7v^D`Vvimyg=7n@`YM%PGaZb|-DBn{hT6~y;*x>RY!eI>s$9OsvH<{Bg zJ+Ut4zwX{?WsF!Jy*G97g|qlQ!Bn(3c6^MOOudXx@nM>X`WpY; z^_cLE&4p%DWjn`dde+ztMMu*`lKVNwLm?gu7i5UwQH>ReuPAcR&S<@)TYc)a{HS>9 zVbzx%xxe3#-4h-7q^J>tV5Urwmcy%P*%2>*aw*mZk9XSk>qVK!E^U1)F2S(G099$e z0fcN7O+5lJBuYDgm1M&FBs(9B!p4|+DFyC(J%}eHZ4DAbTe<`xvB%KlAUOWSW3S0$f zRq@9+0L@)sbs`gOy)yZeeTV&Iih)a&~(5fM{Opw#3u-3-}Ussh0RozS+FlMYzwJ8 
z9YnxU>cnECE4rvr=nDr;GGL_8*-gBQ!gY2vmhx|CW_Vr_;c)NgyW2XGN@v(e>4jx? zy(u6acUuDmLJ(a=y5T%6xcxHmK~~Sc{>ko9ds4G%2Q&V`(?0Vhc;A zWg2Yp-!p=Ebam#|)9n*Mbln*YY4l|mLyOD-a1lJ_&3^BBHnE+yAomy8JW z$c-}reKzi)JAs;vXyCbSbPkSLQ^M`9Au&r7kG}p0b}#~OGfY4E99;|eW6$Z3Ebp-J zA(ox;k0u#CB8RIj)DbXtw^dACYro|pB7Hp%xbBF@B|R@MMd}BM*%hb$wmoG=M0XpG zc=Y%-hjQbV@)7)czT*4&K<#vXW~7H~7oAobr02yB8_k{_Ap8YZKv+}1&sX=%O0sm8 zgEW#O;Phr$>$lF-huLA=ggfrYF+NHaUEP5G%=?xG`_93sm~KFPbbJs|X*eBfNWxo1 z-L7}b#x8_>5lJl!rq||CNDjvlyZ@_nwxSq$WuwDg0z_k&*hGd? z+qRz+6?*~EQdp2I%y2d7xo*rHqEsAFuc4EJM{y@ORR2u+@?&3ojEE6Q``ryfIENUqzIVOmVRt#eobj*S zy+lQo{QjpVqRO`4PL6~hEU(j|;zenmDTLzaSk1Ej2Vf=TGDIdWGV}5a46cV}LgzMW zI#gmZAL(@En1xnI%VPUIxNUbp#K>**ffaUezv@v6ND8UaJQ}S;t)D+~Nzo_1QD;K9 zK^Vn0hpsJP6W!G^bI#L%jO;sed>y3QpngFWTDRRVUo_jm3ewO+*Z}%7`L>vvmAh+nwdV<= zX8jxhpSo757t^(2bkPirkFT54+77q9wPcNk*lFY8FZ7rCy1p){dv-A}Azk#WRKpCd z;cz($;N6RF0Jc}X@*`ByLvd3Xug=%UY2xR6p57TiORAOWuj$yXHP8w&;|ebN{!uoz z^=n-#{ur&11n8-=2UnG;dIFb~+^-UrxC)RCfT;K&%C#qjjgu^@;+dTOExkAe16Ez9 z5g`AV_zHyhO7GCN*SL5v00x#=f!v45FkZ0N#EvV)SPqZJ4-}!KbDQnVA#+O4W^^d; z1kR(kZ?dH$`2ZycS-1eKVo~$K2w2&9KT2aApj)wWi}UFfMJr+R-y^2*-uQ^q8BXsm zB_`9EW;(4Lv9dY4EuV1``PRMwft5F@ra%#TK0X1>Yhx>3GY#}VPV;7OK4q?eWEaxS z@3S*&+mO|{6Yn3~uVY>>l@uQVsbn7gL8A4fDl#|r znH(N3X!>@7pM&_p*R5i&q*lw8aJ=6OFl0B8erY@X?H-U+XC99> zvK}9yXO-+988>u&rSlqHThUkT|Ir3 z#9dr4y6SZF$?uS*8cq!eFA@D{JE;Q~mqVVn)>@Q%{;Iov%@yz4=DcK3&K556%%q)5 zXDi>;1@2PljR8Js@E^yt$;ukB%4b;*R?0OiLE7&HQTDo~-b(ECvKw9z)0PA0^3JIh!iO?Vpdx`ytfFRVS&L?OUzOc;ePD38z-nC z3Lfz(7M%}A=Y9=$pPj2pNO>~L+*Yklh6t;Zh2|e5igKep%BT`ALhdi_?|GkB2o+do zr3sh**ls0quW6iRU@c*_o{(iV|8vFZ6G;LDWEGFw&g@)54$5hX<=l}FC^sfv>{q_@ z$8+gmRMW3Bx>*YSl$By$ohU-`hC5x!z{Q?CQ>bOpgVjkO6-xkdp*pRNqjvt#T7-Ew z92m$(O zrwNbCCpscModU88$eK~%KVtR=6Y2DzE+9NuR#s(&xtUoBEz1f%XOkX~(X8laFg4OB zMrjE@oYyr{DSVsyb1kxG8#9mbiM`!k)VOc>`;I!ZTf1NsTCYYMX+Lju^iI8HX;0^%LceB1*i`%WTqa%dppVO~=yr-y8~&o`0%&jHMFHKqFkg z)$W#M)y3Zt{Uc3-jH_a8eQDC(i+!C)+&PkzpVGKnEk0E>$-tZ%(d6=$eGs9oL3%Q3 zANl>v1V5U}po>ps^pkvK_o--S_{yk7w4lqwnBAK5uGO+nhA%aC&tP~bbMBil 
zzU*~_e9HaHrMmI)aCmT!Q7fztH9pf|H!KN|M{;~s=*djhj~FiK%XQ;k!eA2mM{;pA zn5=X3#;$8Nz}2>zMsJuc*g98=axQaSzktKdqh4tmq$gZu>LX3^l$UdloqWn1Kom(y+3g#j`bP zaI{XVFWBE%-U8e*7hYK(XydK_TtTEO2P_cqLv(kN_|Em!qhRkbh8#(l7C2V z{+VA_?TG-^9 zxtEsexSl7#7wdicipOZ-Zz2bEZ62HGenP8zz`LNyCqx^wLl52f?7bBVXRcrsTMEEMR)1ShDTVkf2af;l;Mt;~# z?ltp$qlOn|WYu2e<2Fc%pgg5vBaF%MC} zlSBckBsU93QQtmlESGs{3ve1=iNBjFk58`}+pLZ--fD zXi5YIqO>(Od{zbll~+qRSwSa&RDpp_<-BOW+`_3 zRc{L6B+XXn%Sa;9EoUYW93vx+fZU{3)g}iZ!~CNF=Ymeb)Cey}>~R5ZnI|UFxq=>3 z@x%-Tusq#!X&!nqIR?)EsRoTB<0z?tC-(M|OqHVbzqni_+#SC}g-0S?bbM6@;=)QV zHCxWz-5v!(bC22{tALr+$R=xYE{`uybDagIWa;88r8fr#EnE*?K36~3)a1%B*Z_(F zZIhA4C@;{egt`BcToN&AZw0jG(bQIbRWAikktZD1j=j|%05)z@MEO}${Q&Ja)KJsF0aM(eOWx>Qp`ByqjqcAT$00mq$skf4M{^! z#8g@D5jD7R*!C)?^Ou27IOY-IClAy7234Ru;fKws*t>xU=gNtop27td94eC><(RjC7Nc>hok3knMEP2$5NBOD@SaXFZnJ|`-B zf1sfl#WpcD6Ux1>UO1d@d3Z7c#$q6Z0Hp+TL(BdOqz(7%A)ttJI=TyBLzGp0b6-3w z63+~XgkzEQIW5=P`33|d> zlu3LDbkukEW>G2H$4>7W*Uys(Nu8q*b_8Sft%FbZ_0g!Zdw+j7ve61^a^(8`qsLYJY=py7f|DIP=xx zcP8%Y?3)$Za7JgWCv-&Sni_oG>`QzFEd4-6`CrlUbF>Ud7Lv)r(pfy|ANykl{Z;;$ zj<~m^!@dx+wM_`W#zWC6x`#N0HbNDFgL^^o-)if13JyR{FYA?Q5#{==)}Uzth9veUAr@Hz59ND@3v!pAp7dFLl~~t)Mn>Fy?OT;5pj9%0_(Wx z7l2Pt0JY(9X~{qQf_F`<_qTeCIuo{A(SwA2`xF-U6Xy=E^#*P}N{EI4Hd7*ggUy*` z@K#y9l<+!8Qo!ik>&Ut=!pU9T^mznk%W`eq~Td;@!UQaJG~j?!TQ2NM!!Il=Z&KDl{MDN}^S*>ANx@RIR!oV`7{oZqJ2w#LJj$7u z@r_;E4&`!&vHP3aSYI_FyUE*e*zleJc10?@z%u<6^HgkLQm3OOKTWwWf^b}Nq$p|T zCfjZ4lD8#*XJ5qyurr$`pM;el!vp!TLzoRf4a7~o6)C4lMkx!6{p>-#p%G*~4kB1g{0q)hwszMVGRwTGGt2Rb56Gr=t7s-?~ zZ5pd7d8nu!cHYX%yF0ySdvq@2zBca+6IV#{Kd+CQQ&?{L#G+n!&DeTyY+jUB$XXKC zsTCc`(Cq0w?zaR03~dEVQIwsE(M`tE5h~QGlX9phu{A!0_f$@#KXA;*)2;}VD76F$KL2mzkls_&%iv)%~y{FGoNJSyzr;u31}%+Y&+c(%#iz@r#uliO1DDJHI-< z*p`}h(iT}8AesuEBohWe!iynp!&q1u))2SVO;0uF{oZc_A3upn;0=N_@X0(+%ZVzL zENk-sDyu|vHcp2(LRQp8n!cgQHt-o$!H#u!3y93I%Rl&Qt`^THB0BN z!mI6{{sqn5hy#Men45BUtM_br*ZP#1+NvQ4KbF7e_mK=_8@8ixqQ7y0%Ec$wEq?!` zFI?r7dx|V=`f^62lFv+|tb&B>KTuQ%i{aw`vF=#TpckEIBUxbP5-_VmO;U*Nryi5* z0?``9Wg2ugw!CmZ@$wOuaXa?Vz 
zVB2bUp08JvKy4+;!Z3pQ*Uw4a7(J-Zksnq=!iQv%*@zykJpz|R+wI_;o$Fhm1n{pw zHY0+D2Eqlk=aV<3dlRh8XgC>3fm8iGfN;dkQ!<;y#ug)QFBMGJ(_gxOulfmt2t{fu zh11%G!WZI1{WZCYY%&4gp7Y@q^G0%*53ZZgOzjBr+!I$ zNdJ2)jzo4e@I2SI+e^oH6@Vb;%V5AL$=W!wlY@Dc`-_%6o28A-2T2IupY1K0StG~Q z(*1B*StWo<)g#>vsek#bCI{MVFI`Q!6j27K=tX#!0$*!My%>AjMbevSl;_=d4@JHw zLeBo!sNv?*6n|N61Pwe~=%6kR1BFIJK-!4Eidn2LR9ZWpn2ePvK1w=*qUvwTu6!E_ z%?=-GBWuaxtevF1r&ukKLm;)!`KA^YHmWBTvEkAz-=-5x@~j9mRAp*@OHU0>W?+_5 z%qy8p;qj2Okf+9#tARa@V6d0EgqnH2eY&p^0rz^p5g+qm3F_)uCW3)3wDNhCYc>s+G#?dUbCm|>=H|H%(-6k%DeXrr&;DP9y>(QS zYukst5ycG>n{H9MhLkQPgrU1Tm6UE!329{L5CjJ4?v`$lZjcy|mTvg2aX;_#zTaBk zTAzQhmJANJb6@v$o#*j8&Na$gvR{Nz`m%Ji;p#G1W0Zty-pt!6<*f@*lwFWe9a;Sd z+1}tYUO~dq}(qRTdufr)8hI1(gksq$>ga=9N9#Qs&$8jItaSE5;_+B`_}izGV1xc_`{=SkOLg; zjt|*#)?Z&=&y!OEjFlI4i=y0jO@6)aQRH*5_m`EvOsjynDxmHk9(L0h-Y1L|eXMb? z)I6EPFj8lTyF;k1q7u_EI56PcwfN}h_OM&4sh~xt8s>Xz#w5FgNg~Ghh{j2tqy%zw zxBao&U(kQ;=(68x>+*6t!_i92XyuNS7{n=ejxN3AM8&smW2OsR?L|tH+rxUX!JsI z-L4YX#Wa~EjbdiE$5>q>GvvNTr|0eSJKMjq#9p)SczI4QJ5wLwfGy51a0vdrh|N-T zW=SEo@R7#L-3L?K>X(!O+hVDEkRi%t1E#AJ!MCP^)F-yvTGdRy)g8O5=yz=+ zbwQgpIP2b{iw7N*Pryfg6uEH%h7^nx5EAlfY=%af3Hl{HGP`n7dazN_80O8dK5RGZ z^dq=Sk4!`B8?EV>E0aMIY3Pa;=@GXmEyePasDBFrUixn%QSZm`$_%>P73@AusKQ*gFm8;=3xg4gKC%06AlH5-Q z^9n4EB?P!l`jR^E<|S|m@NLYr$fxIfgKtL`9=tjMp6QcKXVT$KuJM50*+9i3<}D}u ziSsQ=s%&@Eu9@aypCc|m9oi$De`^bQdVTz5U3w`Kr})ujl3fq8M~(F|Y0Wilf50ke zu2VErd0(VvKrW+0Nu&;oZt6|^FNPg5c$=C zx~iJf6A^s@tiofwVaQ4pfJ4N6)3Le(s7h@)j4fVq!PZRQ4PVr0qLxUQ+&^4fp)BRI zFzFhmC(sh&uqe}^v!GT#i$2_s1~;_$+xsSyG{@0N{#5QfiL(UMgOa!1pKZB^Q-27e z(_j#gvbb%nduVjKCC(h~w(IhwgE+A&poQn?xN}Dg#@_>aDu0V9#H)WvRVcyMMwR=5 zN^I~Wt^lilOcb&N4dR-1uveEx_HVvx^mKEhXFr`58xI+X4()N2fM{3M;~tdSzl37t z?nIk+lb~9s&WOH5K6EZ=H$(M7sYFX&;iXu5QR>Gx@G}}D5V(C4;BTb=$wEk1kya_D zXaCIxQxszXDC?EF5zKQU%Ri!m*v)F%@{BKJ&ew;y@g^?HYg-Dfc1Uo8M5fAINCY55 z3^wHN+Tb21E0-utoxQdk8J|C~jgX&wveOlAq+jjY30|=n}A09YkHcm#488 zevTrx&)PA6|D^eLob%sMH4&$^1)xFX$)}JP*uv2gmF+1`+!R@v#}mEo5XL6SU18I- 
zr^Ux-3A2Z@(UBKjCl5EtwiyF5y`zpy1@`%qZFajHw%>Kfl#?u_E^Zr)8X)Sp({;2z z$*n)jG_S!|b_uDI9K1z%w7k%?Zs`-aGyK6^4i4)jY5kH#ko5TY>%Es6=Bzyi$7&xa z9ZKGO_gsVUdp#S`noUS&>@jFVH(;8RRHy#dN>C5hecnrYhTkn0UDgEZHS&*s|E>?M zyGhb^{Q`CXEXcpF5z~%p+;4ByGMP+I$Yyj%B8v7Nj?kL!tKDbrG3Ww_l|m=2>3)2C z$I43;C^}h6iZSOvsi|M){87IN_{qM6$TxHgVdSu$a!^Hg=yAb*7Y+Qgp z`DvM^BkHmWFf@30I8p~+6_Rvxt=ge2+?F2)kKfzd1Pd%!8Eg+J|ZVAIEK5{azl84{H z6({>S+9?97cc@>LFi9*v$x0yMp?rzex7@zww89lgZYDe@=qA*St^_-A&@ z;Sp6Y))5HGDK7dsmSwP+*ya)v(u+6SI$Lo5`BxMPmt=8nR4lgyZI6_qjjU32NNE3! zEavX?^yDO4I3qawCW;*1FY1MBKdU`JuI9#FG+9&e=JHCt@CVr?p>W!wTmi{eKdq zUgx(DMvtNSy&+yH>5^O1jbVtSk4(ylq!2;2tgEbx=zxz>?A{+316X?qY*_;&MHo!G zL$<^bRK4C*J|SS2peEyvph ze%ups`}yG$m6U|Uj=hP|j!-rW;E}=jWs65XMuI72S6Qzh1a^ zJT>W?{!gTt=efBS1^?%0IrdgOAnEe>-6FOUntK;4;(eY!@4j$mGdOCmrKxH9^4<@a zu9!TwK<~ALftL0soOJvU1hS)ME7{w9afBnPCJ6MMPvh(E+oUJXaut(50&N$uqoZR% z*5#>Pt@q+xd5?Ts#erttwH4r@a*suOu%-8e_b zmU#VIa0vv(=g!O~%}#no_zxv_7i)2p#E#Rv20b^^B2nh9i zu_@vc3%41*nd^X+g5%lhoil8dJ$pxOK+a}Nn#zB*X;Qp@)`-gY@6zTwqJ&cI%qi%l zBRjb&k>7Qy(TaQPy+mO}y(Ej7`?67Q73*~HbA7_r@YBb&nH!|Bjr#Y6b~a6xrn(B`pD?wmM&$^*OZKJXogBnY7f|e= zo_S+D-U0Q=>X5#Fy&QoO`WZXlM97cr3Ql(bR_I}e;tfL9--LA|?(dWj^v4B!tNhK6 zlrBcjZ(bFu@0FNK!?HU&B{ec-x{#73C_cl_xLBq3TmQRQ*>+({-wE2Lo7E@Ujc3xE zWY1ck8PVyDoz{c+I5Ll|PMzy;zsa&xyYWWtrAPUn=MS^n$D>6qg@ z;2s=;m9z}qbP9XqoXQ->l{`)n#N|Jal_q_+x3?#p``jk>u1cD%+A6AD_)2{4-E|Sw zz43ve4ps2o#EhlfDR)V*ex)KtmXG!nG<>;{UOeW3CEf4qpSlMbxFi{;P z%L&Af_;!af7^iCPgE;x5U1+iPWpklzFLgPkkb4dDb-aeg_7#YS7gk))tSK2I`Xvm4 z3FmO~@V`-hvKVMpn{QZOeWDk?SHZQs_-ix6JBs;JR$1BKVy!N2xdZ#P1Br#Vuyr+= z-H*p3UEhXMC+(h>LYJe2OWY3C6w*d@~1Drb;L^uc?-*+@hCiv4%`GxS9D7c263 z?D~BIFEVXn0O03ZTYU! 
z3MqN^_g7O!1%V1MvU-TY#?hD0lB^(lzgU}XalK`whleGCZW~@&A!KsnQB5@#zomEv zh4F<#+oy>}rk&5fxaZv_YVOo7zlB~jbQtJX(1|fOktPTzUz?O?b%-$ostvT=lMg4S zi%O{}R-3)g-Ik6Nbg}m{^de`ayUY+GI@>3r`>)?Z=e+rH*8q|-;p^bth|u!UiWG5* zxXNX3t+ME2XH;%2LxQY6r>Ei*Q&Z2GIB|g_fn#iZT)j596FxbxSihMTgv_#AlOrkD zZ5RP5DQR9wiE4BV+do^pTg#;xbHns|iH)D_6)!FJq3IG&8Rp~n^d zT;>>iJPLu6EBi-uTF7htYXitjFm^~IJ| zM(tdbxA^9z*M8Fujx`m+Z;Xr4Ab)J`>?+kEf0{XwZ~2OR+P%IIe(I)24xK>;(i<2v zbkEo~=Z25}`?2GDEe~Va>W`TU6tbkxmy~Z{$jO3n@+4AFmhNXq*Cz{ht`-b3HdXPx z-rZ2Ti{N>b`Z+c(D1fJHxEjQv0I!;@J8+^Id|>aRWZ;Eg#A)igZ@YytIZ*x?RS9mp z%mKlTKgH%jQ*B_>i9(fbU>F}C*Kn_jP;#a^o}!#1r#V}%&_d$DpW@9#(7k2D2kJNv zuY#h@>S~sV0iUz&sp35Wk0^~H?r4)7^~`3^K1lD%e`{tgh4_**0Opuj_8`` zOfnT=?4;uth;B6DclH&s?Z}!#e+{h(fwt2>A6$6){|VSvY!@`O*xKc)-o+5ipIy`|8%M0v(=#5zlezVDw^&>mc>{dD`!sa9)5ISw4w+PE@-+%KuD`*jh z3~N1_-EY1w2imgD;wkgy&gQo}jx8SE@>(@CqBrYuJh!j)PHqo+eVaWIk?xKlCR-rq z?&LjO$)1_n)@om?BR!}bANzTI=jQ(AsEn4i5B7dpvANqb+$Uy4OVr9?j@iskWSIDJ zLF{5ZTrl9|08t*=XZDTsM^cw}T{RZ)~ODH1;ngIy%J^IrS-2ya*W|JV|>jGAVs zU3m9Bn7$Sj-Qs(t%MRab_Hs><@{|P`l zQyUR`^HQwZAfJG?#>%LrW>;%xlU=_cYX8Re&UD&_Ut2q3Fv5lP#xl8Kq$k5aDB|SB z&1v7}q$pD4H-F&!5%lq76csx^zXHwUJ}%+Y8Xch=U zYvzZu%hPwaPoqe9AGS8#VN@&C3(U@4X=`})ojw^BHNc{`m&GbHsoFIs=3pT}ewFUu>GVty`^*&Y!+tv{XCp*a}ERs7}WmK@eISA7W9i z-5M!DCKm*0LC5CZ?O3F>VpBn|#3H{_Szn8jccBxj8P!2j@=Q*s!RgWeks@q!$Bo!# zYNDMwv1SbIi$F4BF zoHU4DdB28bFcv8s_AZpp<{(?R)Ntu_G)a{2V~mNKz?NmzZk7Vw79-VRgy*{7l?|v4 zUm88XX1r#*ra8a;xyG)pqcmlByx{pnU0}F>P}yS1Hva?AV-9@d&+3=|KUD|T?He^RqzM%Z3HXl{y4|m*vWi;T>%Udce>04%tfG3lmtR;U zVJ%xB_(PJtr=pj_Q{?pC;0{NL?9GQRmpl#De_O5rr0#U4P7EtMQQ#vjDbLJVZ_AF0 z-(yWU9LbZX7POfmBm%I(p+zDHIj zeHHO*SSF`^;ey(NxEun%VlnI-FpkNaw`>hTx zuW?a&zv;U>Q?I4az{GiJr4L`6&Z3il6eOja6Ccpx>*Tocmb=QXMDsc(SHm1=4y!Pn zo+07ct1GlL|FgHfKbE`;6XH(B4Ufy!=ri;3)aevJgAv@jIR!=L_ZAiNMsKd!2**fs z6}8T1t2`Ue`ccBWFeM^h(Rrp;-zLiC_tN2dT;D3G2Qd@ot6R5JC5D6!2EIrWv;HRA z59fKs%tiXkXZ!Qo#Lvw$7{~KaZBIwi)#`_1TGUg>k8Uk@=*%xs%fTtUQtjhRDxyEv z_j$m%#g;4{{q^(Wm#R-`P4XSyytK6~y28xS?+eV^O!F*Y?!il}U>q1>7f|xu_j01i 
zsE)%`s_<(~_$f~!q#J+fY&A5|GWb0YSHiLVP_4{*&ny0B%ffO#%Ai0$gL_B)rpiJH zn}~4bXqGsSiukDGYf;eyBcEn%xsQG?4eUhhTlSqax)F>MczZ(h%c*U;0?iTKRCLtb@DCS)1tC87ajSo9giElO_1m;=ZC?#eBwJ&Ngp1bdA% z2LbZ;PGw~eIlD>XCHqDHW~zJ~9(H)QFrq=Nr>*`Gea$DMEt4iRDQ@{ml~Yl-2=7!u zHbYLeRs$qQ>U<)zN9bUd*C-T-)`+B(Zv4*^LLNC?11(?O-$BhDJ*JHZRDpO6 zRjx)9a6@cYK%D&g_=}y8($)RCO_Rx#__(+{ypT_KE(`$fztJpTkNeZ%B|N2aoWqNuQVfTQRoAY8cVvN0e6TlyBl}Iposab1v!}8=xKdf>qU$U6gsLp2*seQ->i`=a~%Nfg;{M zQf)C%?yYl5+b2`P9h1PtI1d}CZ9e|GjcCrw%4ur8Kir%XOtxB;P-WB%&3Qfj`=*ynLb|-`~K#7w_tR<*uCPJ8szX6~%Yb8Hb$f`(||ef9(}Vi|Bzm z9acX9I<1bl$j)f}s_P=;lu~kbq34HG%`YDg^PNkQB>$t9VFVrKBn=5;Vd6v)vs)_Z zE-x~yeR^B$QmG$4DdCT$s?z6cZnQ1}o2H4Uto}^IerYl~hLsE_f{xuh zhfu?`WdssNRPZn_cTWx#BO@VZRqM{P{$F1jB#wapLc}ptKEgd- z<<6}=aAGxB+_$h?NK)D6^ztNkHy^0W`_c%fJO3f9myXcx-vbZ(g4@KM(^JSb#6nls zUaBaew@-BB+VpurwNdH5g$U>K%p3}vWlAXX;yDyg$bBr5eR1EKc|2q&&u@#zI=yq25oJO zrNtVPVuWY{dvLt$QNC3*U#877M{%vJ^d89wY9dg|e+UpLYP|_}1w7=K0HjM3=`jX8 z5Z6c$YYRASZ!|o|KodZ5`Ev$PAdtUHFhaVL-P8UrBaX|dZA6s(ml~SD06a6u!h(dC z<76bCNEWDuO+n2VbK9alDE6a2_1eo{`E6J9pKx6fJTgjhYF3z^UkzUcCQ%}aDX93h zLt;7vv|G7Bp8v2d1r6Rs{_fNfeYkq1I2Ub)9E+1tOu1yk&6&OD3SJ@-6mW{H(uOD9 zjkM<(OifM~?o703=;FDZaa$lk2fy14*&D=hg3kMVUsX>H8)%aJ7|R0{ldbvBm%n5O z1}0stjikPyZi?!GV0#wYxsP^as-c5Fm!=>*{~PoRp~Q*#suaN<^*1SbvE5~!pdfW) zP6)ng@Hxw_A3GcOPa_s=#JX?Qeb<434af}{p_!d)(vNv*Xz6{5h$C>VYxV1Gy*9b0 zL=;kJX=-&8h$!jXnpdg1<(AfvygRTeUOo=0ox^o-Tl!mV(I6^kkmtpVXKi?e^rfe) zJ+Qb_bJK1j&}hMp2|@S8rq8Fy(UWH8s%Hx?a`#U*xr3qZk4K^j^WXnS)9S&o@5bK@ z_NOL#vmYO$pC~KaU~e8HE7#zB!Uk=Au)~;Oc5f&A<~z^5fO^#RHxw%U@w58Z*(z28 z<$kT{fUV3@m|-v#H~h%DAD^UdeB~LvQUsFtukbdoO(GocO(~j@SY-zqwp!`N<(uONQze#rVx-{t|tiO46Y&vvU)0nmA!*fKPk{9MM*}(on&3}Ky zAE|KrG-e4Ugmr~9*VukNBs4_x#<2vxmm1o3o znH@)D~<8=u*HO<6yud7pUjGT)F9>9d!b^a3lg!UPXTy8%-!dOV~F#8 zPv(7AtdxWGG;#f&or(T6k8Ruzzg##&e|zMh-eW&w z6Hmc0yU1Cw^(SQCE_AA{Ww^QNQ5MCuM3jO|)x*vD(2vil8|WA7TPa2oiA>meex_eG zVCP0u7@M;-($BBzysBTuR!4L>Ex3;H%z|DLcXy}Y^k+U1m8iy#$c_^=8}4{{#kO$% z_u-6x!gu2&q|;S;T(r8tD?F)+?7#nB0?L36DuOSBgAe7wLI4^RE(%^Se@6YflZi2d 
zCJ|y{w#DjyR7S#QfIAl$XeR((^k1zl0T&={CU_lfX{B)f7ZUSW?yIo;y3o9DEig`b-703s^7!8J)u8$epy@o0bFlC0c zYna5NdJpUl>q$yx0_?QDGa2hwd?)p*m11{U%qy!h4L<#31x}E%QwgeO!^I84EF5Zc zR!~(3ZBGzpA)p0o8geGe|2P>|s5XU7|5xUF*9F9%Yzn+jfj{ryWj)|TY=`v->}}^- zp8?wVO}l4jZxDdbXn~D+CGros`~Nq*Bt|awi?o93g^+SvHhn0gPaPg7Qeq^0vli*f zBRaxE`tLRQ5OEIWHhtr^Lqjig?BPIiP+vD>!f$&OHY~gZ(hfL?Kch|<&67-1JOOyf z4D9XtARH_8V3>wz3Hb$xeM_PD)uL7dV{GlwuMu%c0$ce`>ZIDRo|ShO`YFvt$NDFB zzvgMS(Y-v!iAS8aTB_pAu1$6piAVO3%X=d|P264iHlczE6urR4`X8H`x+(8D)i=_u zCnU5U5hENVy7qc|yq&tNy*rEKp!-yEdPS%5)iYrYa7A%in(d}sjo<~E<`Q1wgC>vD ze<3DH=uCr2P@O6&)IAeUuf*GWkjNx>Q=ji*wp(v-2q+{K4{SBIE(R=X%?>`G{z^ zzcVTFr%Eh=#(!||r@D^K4+`@O)%w()K~=R1?X+r2wM@Y|9%a#ADMH#xU#Z*eU=}b1 z(BbM2*>(o;CQPjr7qYz#4flfJnFq%1%uo}VoP@~{>>!`W#r|5>qdpA>@rRz)I=yAE%CJ4ZL+PO4hPTL zS_)v}f)797g0uriS=dEz2iRF94~mL9m^q;B`wRN3O^A6OU)6V-Zdu1u&&l~5R6~d+$QC>;wzq~6bV2-${!j&TmPw6T!xzWs{@s$@dRW6pOdKv7Si{MVX6Nxqc(o^ z7QKF$=aWrgksNhG>K6oa&r$YA1b-beov}hv{&$X+d4%;aGIaVBheW=frw3>R4G`l_Di6Q>~Rb zeO}UFBkj(k=oC0;?U|i#vCIisk5%3@j6k}x={tG=-}bW`A1?-PWaL8@s=u2gXMP@0 z;$&gi=3)R3Xszw+@ZdEnNYX?QCD z#TN6k4Prqd$NZP)#W>l`{)uE~%7uWki!pNj3c?J?B>je@;E|Rj6&ce3^^=%O4+uSK zL{TzhSY22nGBE*ek^D|`aAf;p^I)J7_ODUR(ZR7fEj}S3tER=w`J-!ahX7Z=aY)eX zrnb2V?M)PteZeaaFp)|)ZtZ3fcSRM3r=p>@(?m#OK}aYwRa3~|Ex8Eo*VZq3q@e*} z$i#{3{zkv|Nmh>3QU@Cgi}%^^Ah=o}?<-Am$!`(!YM4vgD4`-uwv!}MMoGaH;C(<+ zJ#X79uN2T_>WKAj!`K&twwYq*ar%q5P_^o_fULGTc>LOnq$;k>=(#IFb(>HFP0L{B z9xm++1pfsRA{a_|tnJBmzz%zw6*u zA|T1ku(csCD=!JcaRqfJ{c>Vs5riaQQmsJJlv~_JI}K~7wiAUIDl!^D zt*%F?ZEL%05+B*?QV*4I@i$zH8F|rVs;enth@+k9;Xn7NY<0OT)pvwh*nDH` zo@km*d^s_fXYbjIN6>#F?L+_M$NwwbTVQIS0%-o9Jko+HaC715KvY~D<30X7TdP*R zazEz<$6Y2HG3XwyI{0UC5{pJpSsd&44c~4hepwU) zeBnN%B#|e1Q?9=({j+C+S@X@*i>_xpqY)P~66lFf-P!LedTUN3BYGQGcmX`iX9{kQ*ddV zlV7U2fJr~wFp(U$8wb5uLFhqf#}G;Xs~Q6Hoim;C=HBk{wZNF1x%%jT*B9jPn7qQ%z6xtx$FqmV3$oudq!we=qvIttBIE3VBF^I)+dh;`yTYYW7@znobjCCA2X*91JMt<&v;hm z_cU+TQgw@F2RO|qxWlY5t-}Z%pd>+2#X-DHUvC{k_{f16(IY+(S4s#FcC|B_<~=DQ 
z3Nv~e-#bw(DRLl8hc~%$ixLg>yHS>DgO?rRs*csD%eHSNp^>Nzkr5rgmnhm_!oi%U z#e%Z_Of;3!d55m|nE2i2e3Vmm(Dl+*_L8n-v$Y>$0fat`G3c3zdvDsKOv%4t*ZiEx zOw0P%3u0Q^xSHPFycA_C(vatcMGoCDzE$v|4AL#(HulRc;~dn3(vltsH{KT;#u)&I z9(L^0Pd?y%`(6j)Xjv12?}@}F_5Fe*_{Vt7iiX#FT5jp+$=y!tIlGWF;g+JSu`*2tcnu$@85~x~t8U z%lOa@_3rWuN`}Lm&SBGZ$FwT=OSzfzgt^IAR0JMQ@U>4iAnlyUqg^-jS0gIFMckv> z5&2GmwKWc|=^_}%znrdZZ|3j$Fx%|~m&Yu7$V9No9J|w{D;ooK?yn<6_v@sp3|N{F zD+fIFF!w>@RTPhQ>SWpbrYe$2L{eLJYG?!Sd|}7|Ddslub(E=)+Z#e@%?jLrN>dKn zfj~bBEuuqU$~;{EsC%;!CrR7GiGC2)&fG{A%e3juTI`s25Q$kaq7d1}@svYi!VK)e zW@7Q03Om0++^S`AxcCP`UNzGSaND$Q=%`4il=@o-BJ;Yz#U_+F(aj4bx9nxBD#fBiF3d%=ORSA1+)n?H(G9S^3195gO+TRb7Km@oC)#XeS(Za<} zHL!r20+v!5))YIA(N|BI)8(NwyAn#=NFFMV3TjUtQo2D^wG-clz3XcSu}O`DpN)sy z&2hafOyh6#_qYVALQ@3_@{Rnw{u0Zo!~h*P8-xL!;TG4hoLrNt>F4B5Cq;rV@Ec9U zyt;>d-eZ|gP!*z>)B0)@45hPF^*r6^{Q6uDE$-gRLd4HimD)7>CQjf{JXebx9(yYD zz-=|xAQ5{Oa^mA3c9fKO5tqH#u6xcdh+q5B#EaZKwQ3nT@#Pbe_^+pv%(X{j;;u$U z^T=FWV%v;AF66M(E!S8wic}iQR>VRoZ`a7Z8%O;?{Lt2n+FVh*hS1epi>l}Z7lbup zd`3Ggu5RP=lK-EvNZx5;$&A|;^kf{m*M4cjV9n}$xZtGvAZm))>(d#&#pM~jE5?`U z*2hG)D4>E?kUg}p71>Y6Gvsmb$*X8)7Ct3Ffz5`Z9adOnSMA})ROVEAHlV{|)q~BslCXsi5JvvA7h_qL)rqtb zA?^4e>(~cW-7Y2u){)Y(?4}Z(q4Eiap}+-o)RbLp{X?f5UG{weZeXkmSDVh? 
zG1p&y#IBxCbJsb<(7n1pjz4+&jz0JAhF=GhAo{|@exC8=eSUy~?vaH?X?;n)jso5D z#fY(0y77v?1KNIAk51#NT!2Ty#G_NqpD&X&=-xbv=WRNdKy6txaen=w zPG~sX7`KZhP^A#F@#T;n7!uwc?1T@zrhDJ@5xDNq`oZQnDZ0a-!s#nm^=Cc1}}am09Vg;0U%?03=T8| z98hA>k`50`#m{{qCF}u1qjyYoOZrn0xh=vcrRPbRXu{r7Jus%&j=%N3VCf{2%69`_ zA5h@B#-s<%+VbkZKmlQ1IW2fv;8k=bY|`~;#GcHO0T3|^KHJ-Mpb;e z2;;{ZK(QC8UxUU$y0Hs8Hpt|K$EiQ^i}s42A<;vF-*7eC2xvn?6iYogVoJ%{>ETNZTJH865jyQiag*7dpO?t zF>F(dpp3IfWTnFsh3@`&hT#AP*iAr>;$x&5g3Z7BffKmA<&!@HIA4txRGM`*6x3&u zmUDpF9#83G(LrG`5(msc18)gTw_$sCe~UNVYOx`F`9jn0J^&m?Tc+G6&M1X`fA5IS z0)71YZ@wa@Niw)h7M~gCkkYx2?yaRgQtWYS;aHKwJfl!NMkt&%lT6V3@BQm>DqP~} zLvB151gD+jG8J`JIx;G+2rG?<7y)mOo`spkq^8p?&&CPh#e>O4gVA{+a5tDgaR~Hq_4Y;p_PYRNpV`Qgul3@+_r)upn?7My zqjK@Z^y3+$vJ?Mh)49VQ6TmX&y8>XcqzC@(o9R??ZgxqHqTNoq_d(B0xfvL!)|9=^ zGU5n^C$2#D&bo-xqVN5FKi_XTb`w5t4<`}xqcFUkwkfWmi6;NiO6BK0UwXLJG~&4C zI=u?;yMP^0oTYOH;lqh#{cD!%+=E0~kQrGx?GdZKzCu!n%Xb@E#WnX6X+0e0Yf6Ws zJsCec@xQI4(#cKUA&IrmF-!IP8&PfwntYS1m0)Ph(0=^3TLm3117=zT5QP=a@5Pir z(lMekexljVhVsYC>YW1#FCCugVMyT|@ugp6qP;Gm1o z9pV9D;q}#N9czTNEv7em#Lr936 zE+@xhZOFgymQ2JNE}OURN&EcF=eAI@bo7h?t4C1;4M*%9m8SKzaitzEIr-+?T|)8v z#7=yVzR$`dMt%?FJ*A8$;ZrZ6>|sqDX`xoT+JyaXje~ndI+I6}UDmT|j(^Ej%~zyx zo$z;(L597rjmPgiY0!wmXXD=kd{@Q86pT*7`D|{}fl_#!;7)#IH^}+g8>=ga1q4(Lo&D%%LR=lYO*OR5l(SId}HMjff$_9aYqV<4v z_KZDclg}CQWJy=o0U!!?TP(JC&0TtQ%pBZW;ryd6#E}+B-od|j3STC-m*-0 z;wq`Xqfhg0>q-*7WTfD8Es*W~HpDOG+JiuBPy=!e2=5Yf&zBxk2~zp?&d~JKJ*5e+ zs!~YdM`7w?`Zb>$U3jgOZZ%(Hm*-$hDx8s?USdB`gQy4PJ&@(gIg_%d0^Ie8?|lGH zVYX6N(r7mJ-tt1UYJ*$6$gB@5CS2CnbdRdp+L2uhXS7LGJD-xnn18d;Psk^J`CBbI-a%D37DV)24M} zX?JVPN>#w1=MW_PB+PqkZ*}r6-K;Y#EKHr)q9j4x8((&r3Yi^~v`GW=nFR9-Lz85H z47XDb22s&ijz~{_ zt*{*igVm-fScJjl-2r&>FjjQtS2S6_z#z|%hL_!;TV`EzsnV=GqG;i~eAU*Eb`nk0 z1J0LkGczZfOu6cJfT@sLsQ$P$hSn1~ETl~A{uv(r>fwF5hoONI@Z5`JS{XTn~ zk#`FCu5K`9MnjmPF>y}XvzV(y_6FWeJZ->9K0D`Tv5au--H}(4Q*o>>wLNcOu>$%| zleyd4!t>$8vkhGGq$%l0!ubvP7Hi?CuW_`yV0vxMGIck&h}396-of@uo|+ySPeI9~ zru}C&jAp18y(w2v&|nF>oEEHXPsp8yn+9uhP>_(E*G@z26;fMg-9FiTeQXlJx!Rv$ 
zkhjkgnnX+e9|}e`82*5rUptUSIvP3jrXfR7snc9_l1VzU2lTN`cA^YIA|o}T?;Xd9 zCh@9ot|u8GEIt1jKXM4XWS?Pa>t5Jz4Ju}LFS@D|O@U^jpn%w`Rf%l-ooLSA=QWeA8mRm4|Hm{(-txHk#kUxGGi&FR> zP++T*^=_uiL4X>Uc6^d7jP86KpK~_jbndus4>rbis*4bPo6j@{+iq}x>Lkw@N=ec6 zS4&M@j>VuGC{gf}*7xbm=vL@qO-=X3z|Ix#3F#4$ZY6KQStKiD{`gTw9^C4>HA%|c_%f*kj-l4Z|%a3yM^PzNc zu)qsEDym$|=INnp)6DG(ZX4(j@FfNzAq{%Sb3)YX^Dg+IG?2EP+{#SXWHb0wj8qtS z?)QgTR}X(_D0QP!QBeueD@;;M*MoHn?R%04xgThZtnIn>d3pMHe|svT zBC;%3+8o*Bw$u}q!sCM`dUg5a0x7|}tbF?2ov$VLTOr9lg~XK3f7d=Fa{3C)399mH zc5+Q=xShz!w=TfYIly__cL}6-$jQAk?$3;>HKEkhp<`2fLX*Dmfjz(|7M7gv#D*h> zja&e(;`h+K=Cj4*A|GD=7&3f%gD)BE+C{qk!cf!L)OlE3@0x1>+KEOe%+?*)D2vl+ zxSJ6?-mr)7R`}*x{BIrsbnbl{>Q%n4|5?}ck5;9fam+{|r?`d%Fw&U0P?y23T-US) zz#Tq$&5w^;^z){mg~?5po_qW2JpD85o^vkRk76^MdqC?fdrj_V)=6pxlU@e+BxD>yM$m=HjCj6;7LES^I=$A+OXV z@FRp}VT{!8t}y=WVMT+2j`DN0m;Md2e&pfiLm!jcX)bWADI10 z@V6I?t}oI1!8ZnzkH8z`fjsF+U1>?PB?qdw$4U+~pN4}HviZCJmv`n#w_eX_Ad8ro zkwIczEiJ41xlxaYN2jy1Gw4cr;}J0<{UJ*nOKxe$Az3FpB>)f*)MmbA4#8_#pV6|1a Date: Wed, 7 Sep 2022 09:26:25 +0000 Subject: [PATCH 32/34] ran linter --- ..._tabular_regression_model_evaluation.ipynb | 2981 ++++++++--------- 1 file changed, 1481 insertions(+), 1500 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb index 01bd788ab..bbeec9ec8 100644 --- a/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_regression_model_evaluation.ipynb @@ -1,1502 +1,1483 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - 
"#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `AutoML`\n", - "- Vertex AI `TabularDataset` (AutoML)\n", - "- Vertex AI `AutoMLTabularTrainingJob`\n", - "- Vertex AI `BatchPrediction`\n", - "- Vertex AI `Pipeline`\n", - "- Vertex AI `Model Registry`\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI Dataset\n", - "- Configure a `AutoMLTabularTrainingJob`\n", - "- Run the `AutoMLTabularTrainingJob` which returns a model\n", - "- Import a pre-trained `AutoML model resource` into the pipeline\n", - "- Run a `batch prediction` job\n", - "- Evaulate the AutoML model using the `regression evaluation component`\n", - "- Import the Classification Metrics to the AutoML model resource" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting age of the pet. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook. You can skip this step." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! 
gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. 
Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BiVlyW5OUnjK" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bViYfWfpVAiF" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "20S9En09X0PY" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset\",\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A-QQkeUnq8Xt" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple regression model on the created dataset, using `Age` as the target column.\n", - "\n", - "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Bxn6ATUXrET6" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2e7664fe3af6" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6cb41277f4f3" - }, - "source": [ - "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", - "\n", - "- `display_name`: The human readable name for the `TrainingJob` resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", - "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", - "- `optimization_objective`: The optimization objective to minimize or maximize.\n", - " - `minimize-rmse`\n", - " - `minimize-mae`\n", - " - `minimize-rmsle`\n", - "\n", - "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3l691PEMZFdA" - }, - "outputs": [], - "source": [ - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"regression\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Gender\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " \"Adopted\": \"categorical\",\n", - " },\n", - " optimization_objective=\"minimize-rmse\",\n", - ")\n", - "\n", - "print(train_job)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c4f338cdea7c" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "de7e24205889" - }, - "source": [ - "Next, you start the training job by invoking the `run` method, with the following parameters:\n", - "\n", - "- `dataset`: The `Dataset` resource to train the model.\n", - "- `target_column`: The name of the column whose values the model predicts.\n", - "- `training_fraction_split`: The fraction of the dataset to use for training.\n", - "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", - "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", - "- `model_display_name`: The human-readable name for the trained model.\n", - "- `disable_early_stopping`: If true, the entire budget is used.\n", - "- `budget_milli_node_hours`: (optional) Maximum training budget in milli node hours (1,000 = 1 node hour).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5caae7fc10d9" - }, - "source": [ - "The training job takes roughly 1.5-2 hours to finish." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IIfvPCGYyFCT" - }, - "outputs": [], - "source": [ - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=\"Age\",\n", - " training_fraction_split=0.8,\n", - " validation_fraction_split=0.1,\n", - " test_fraction_split=0.1,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " disable_early_stopping=False,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rYirKB_9yaa0" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KkgCdQQAyZP1" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3f4d0c17150d" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "581a188f0453" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5adde0951eb5" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ktMsqtibAUzz" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\"\n", - ")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - " dataflow_max_num_workers: int = 5,\n", - " dataflow_use_public_ips: bool = True,\n", - " encryption_spec_key_name: str = \"\",\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = 
ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationRegressionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=\"jsonl\",\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " dataflow_max_workers_num=dataflow_max_num_workers,\n", - " dataflow_use_public_ips=dataflow_use_public_ips,\n", - " encryption_spec_key_name=encryption_spec_key_name,\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a712dfa762ee" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NOvOMTEgCVcW" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_regression_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bdd9e2fd6841" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8f17c5c7b3e3" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "1aa7d7bbb1c9" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "90f424d5dca0" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following parameters:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", - "- `target_column_name`: Name of the column to be used as the target for regression.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", - "- `batch_predict_explanation_data_sample_size`: Number of sampled instances to use for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JeSiA6-TSgV8" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"target_column_name\": \"Age\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "859fa6611d9a" - }, - "source": [ - "Next, you create the pipeline job, with the following parameters:\n", - "\n", - "- `display_name`: The user-defined name of this Pipeline.\n", - "- `template_path`: The path of the PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", - "- `parameter_values`: The mapping from runtime parameter names to their values that control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e8dce0638349" - }, - "source": [ - "Run the pipeline using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pdHib_yUEuEk" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_regression_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "625960707c60" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U2zocUvk2YVs" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evaluation pipeline finishes, run the following cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mtHA8rhGGQv3" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", - " 0\n", - " ] # ['artifacts']\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e69f183f902b" - }, - "source": [ - "### Visualize the metrics\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b7c5e5c35ee9" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " if (\n", - " i[0] == \"meanAbsolutePercentageError\"\n", - " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if the ground truth is 0; in this dataset, Age is 0 for some instances.\n", - " continue\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(10, 5))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "c26ad3958895" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", - "\n", - "To learn more about Feature Attributions click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)\n", - "\n", - "Run the following cell to get the feature attributions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "b09056628b26" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4d9d6a82d826" - }, - "source": [ - "From the obtained Cloud Storage URI for the feature attributions, get the attribution values." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "c26a2091f4fc" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "77151be8d776" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "069bf017e0de" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_regression_model_evaluation.ipynb", - "toc_visible": true - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m94", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific 
language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Regression model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI regression model evaluation component to evaluate an AutoML regression model. Model evaluation helps you determine your model's performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you learn how to evaluate a Vertex AI model resource through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `AutoML`\n", + "- Vertex AI `TabularDataset` (AutoML)\n", + "- Vertex AI `AutoMLTabularTrainingJob`\n", + "- Vertex AI `BatchPrediction`\n", + "- Vertex AI `Pipeline`\n", + "- Vertex AI `Model Registry`\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI Dataset\n", + "- Configure an `AutoMLTabularTrainingJob`\n", + "- Run the `AutoMLTabularTrainingJob`, which returns a model\n", + "- Import a pre-trained `AutoML model resource` into the pipeline\n", + "- Run a `batch prediction` job\n", + "- Evaluate the AutoML model using the `regression evaluation component`\n", + "- Import the regression metrics to the AutoML model resource" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset used in this notebook is part of the PetFinder dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is a subset of the original, used here for the problem of predicting the age of a pet. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket and is accessed from there in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook. You can skip this step." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "# Upgrade kfp only, so the pinned google-cloud-pipeline-components version is kept\n", + "! pip3 install --upgrade kfp {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). \n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! 
gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. 
To avoid name collisions between users on the resources created, you create a UUID for each session and append it to the names of the resources you create in this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. 
Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BiVlyW5OUnjK" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bViYfWfpVAiF" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "20S9En09X0PY" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset\",\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A-QQkeUnq8Xt" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple regression model using the created dataset using `Age` as the target column. \n", + "\n", + "Set a display name and create the `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bxn6ATUXrET6" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2e7664fe3af6" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-pet-agefinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6cb41277f4f3" + }, + "source": [ + "An AutoML training job is created with the `AutoMLTabularTrainingJob` class, with the following parameters:\n", + "\n", + "- `display_name`: The human readable name for the `TrainingJob` resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression,classification\n", + "- `column_transformations`: Transformations to apply to the input columns (i.e. columns other than the targetColumn). Each transformation may produce multiple result values from the column's value, and all are used for training. When creating transformation for BigQuery Struct column, the column should be flattened using \".\" as the delimiter. Only columns with no child should have a transformation. If an input column has no transformations on it, such a column is ignored by the training, except for the targetColumn, which should have no transformations defined on. Only one of column_transformations or column_specs should be passed. Consider using column_specs as column_transformations will be deprecated eventually. If none of column_transformations or column_specs is passed, the local credentials being used will try setting column_transformations to \"auto\". 
To do this, the local credentials require read access to the GCS or BigQuery training data source.\n", + "- `optimization_objective`: The optimization objective to minimize or maximize.\n", + " - `minimize-rmse`\n", + " - `minimize-mae`\n", + " - `minimize-rmsle`\n", + "\n", + "To learn more about `AutoMLTabularTrainingJob` click [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3l691PEMZFdA" + }, + "outputs": [], + "source": [ + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"regression\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Gender\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " \"Adopted\": \"categorical\",\n", + " },\n", + " optimization_objective=\"minimize-rmse\",\n", + ")\n", + "\n", + "print(train_job)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c4f338cdea7c" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-agefinder-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de7e24205889" + }, + "source": [ + "Next, you start the training job by invoking the method `run`, with the following parameters:\n", + "\n", + "- `dataset`: The `Dataset` resource to train the model.\n", + "- `target_column`: The name of the column whose values the model is to predict.\n", + "- `training_fraction_split`: The fraction of the dataset to use for training.\n", + "- `validation_fraction_split`: The fraction of the dataset to use for validation.\n", + "- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).\n", + "- `model_display_name`: The human readable name for the trained model.\n", + "- `disable_early_stopping`: If true, the entire budget is used.\n", + "- `budget_milli_node_hours`: (optional) Maximum training time specified in units of milli node hours (1,000 = 1 node hour).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5caae7fc10d9" + }, + "source": [ + "The training job takes roughly 1.5-2 hours to finish." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IIfvPCGYyFCT" + }, + "outputs": [], + "source": [ + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=\"Age\",\n", + " training_fraction_split=0.8,\n", + " validation_fraction_split=0.1,\n", + " test_fraction_split=0.1,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " disable_early_stopping=False,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYirKB_9yaa0" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KkgCdQQAyZP1" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3f4d0c17150d" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "581a188f0453" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. 
\n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5adde0951eb5" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the regression evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationRegressionOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports regression for tabular data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationRegressionOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktMsqtibAUzz" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-regression-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + " dataflow_max_num_workers: int = 5,\n", + " dataflow_use_public_ips: bool = True,\n", + " encryption_spec_key_name: str = \"\",\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationFeatureAttributionOp, ModelEvaluationRegressionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = 
ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationRegressionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=\"jsonl\",\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " dataflow_max_workers_num=dataflow_max_num_workers,\n", + " dataflow_use_public_ips=dataflow_use_public_ips,\n", + " encryption_spec_key_name=encryption_spec_key_name,\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " 
regression_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a712dfa762ee" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_regression_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NOvOMTEgCVcW" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_regression_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bdd9e2fd6841" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8f17c5c7b3e3" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1aa7d7bbb1c9" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_agefinder_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "90f424d5dca0" + }, + "source": [ + "To pass the required arguments to the pipeline, you define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory will be created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Regression model.\n", + "- `target_column_name`: Name of the column to be used as the target for regression.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of the input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be \"jsonl\", \"csv\" or \"bigquery\".\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JeSiA6-TSgV8" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_agefinder_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Age\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "859fa6611d9a" + }, + "source": [ + "Next, you create the pipeline job, with the following parameters:\n", + "\n", + "- `display_name`: The user-defined name of this Pipeline.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. \"gs://project.name\"), or an Artifact Registry URI (e.g. \"https://us-central1-kfp.pkg.dev/proj/repo/pack/latest\").\n", + "- `parameter_values`: The mapping from runtime parameter names to their values that control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e8dce0638349" + }, + "source": [ + "Run the pipeline using the configured `SERVICE_ACCOUNT`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pdHib_yUEuEk" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_regression_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "625960707c60" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline DAG nodes will expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U2zocUvk2YVs" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline finishes, run the cell below to print the evaluation metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtHA8rhGGQv3" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[\n", + " 0\n", + " ] # ['artifacts']\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e69f183f902b" + }, + "source": [ + "### Visualize the metrics\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b7c5e5c35ee9" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " if (\n", + " i[0] == \"meanAbsolutePercentageError\"\n", + " ): # we are not considering MAPE as it is infinite. 
MAPE is infinite if the ground truth is 0, and in our case Age is 0 for some instances.\n", + " continue\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(10, 5))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c26ad3958895" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance.\n", + "\n", + "To learn more about feature attributions, click [here](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions).\n", + "\n", + "Run the cell below to get the feature attributions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b09056628b26" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature-attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4d9d6a82d826" + }, + "source": [ + "Using the Cloud Storage URI obtained for the feature attributions, get the attribution values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c26a2091f4fc" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "77151be8d776" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "069bf017e0de" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_regression_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 } From 5fbda9e5bf40f4d2d4c0bd3a5af6c0397c638d2f Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: Thu, 8 Sep 2022 18:39:13 +0000 Subject: [PATCH 33/34] addresses the tech-writer's comments + updates the pipeline image with data-sampler task --- ...ular_classification_model_evaluation.ipynb | 2874 +++++++++-------- ...lar_classification_evaluation_pipeline.PNG | Bin 84131 -> 56192 bytes 2 files changed, 1447 insertions(+), 1427 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index c0a64e06f..6fd218428 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1429 +1,1449 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not 
use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Model Registry`\n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`.\n", - "- Import the Classification Metrics to the AutoML model resource." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click *Create*. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! 
gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "import google.cloud.aiplatform as aiplatform\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2011a473ce65" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6da01c2f1d4f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. \n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0614e3fb19da" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce9c9f279674" - }, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns(including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d33629c2aae6" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "21b5a27e8171" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "93ebafd3f347" - }, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is 
to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The train budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9ce44a2ab942" - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bfa52eb3f22f" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d56e2b3cf57d" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bd2e1da7a64e" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "19c434d8b035" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature-attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ab9f273691cc" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports mutliclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "327d8d4e11b2" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name=\"vertex-evaluation-automl-tabular-classification-feature-attribution\"\n", - ")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = 
GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=\"classification\",\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=batch_predict_predictions_format,\n", - " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1abb012ce04b" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e526b588cae9" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "26eef4b83c88" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "63b84f5490d2" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e0a18b803bb7" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a9571ef567de" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "52d622c274d2" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0409b0f330c2" - }, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "894afe1ba396" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline DAG nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ec4ec00ab350" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature-attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification) \n", + "- Vertex AI `Model Registry`\n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`.\n", + "- Import the Classification Metrics to the AutoML model resource." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset has been copied to a public Cloud Storage bucket, from which this notebook reads it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries\n", + "\n", + "Import the Vertex AI Python SDK and other required Python libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import google.cloud.aiplatform as aiplatform\n", + "import json\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. 
\n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce9c9f279674" + }, + "source": [ + "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns (including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d33629c2aae6" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "93ebafd3f347" + }, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is 
to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The training budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9ce44a2ab942" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) Python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ab9f273691cc" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "327d8d4e11b2" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-classification-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = 
GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=\"classification\",\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=batch_predict_predictions_format,\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a9571ef567de" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket URIs of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", + "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "52d622c274d2" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0409b0f330c2" + }, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "894afe1ba396" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the results from the last step, click on the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline directed acyclic graph (DAG) nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ec4ec00ab350" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "environment": { + "kernel": "python3", + "name": "common-cpu.m90", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } diff --git a/notebooks/community/model_evaluation/images/automl_tabular_classification_evaluation_pipeline.PNG b/notebooks/community/model_evaluation/images/automl_tabular_classification_evaluation_pipeline.PNG index 275c9df733e16d23116112d6446ea7cedd57978a..80e59879f43ea03cb9d999293ab880e4ff83a49f 100644 GIT binary patch literal 56192 zcmce;XH-*d*EK4lpwd)SYCr)&KOpOBmAaT`HymzXkhj|hBaK>6*RsPhevPg;pv$MeG^Ug~8uBT32 zyGi^ZQP$*GKXq#Ss`3MQZBOHsLwo{rYZ{hnlGWq*W&M8VijP${3_9Q7+UtV_b;Csz zvZJ#e=xbh{Rf>*%5O$gSQS1YLsi^Bpr#UEpo%?z@{`%bjvf!M+ERtWU+4cC!F`v-2 zAzGup*F8ONItut=l5oB~d+FE$UmhW$LbmC4Ck3mc1T5CkOn{z}Ea;1=0R2(Kv8I6m z@1IYUWX)3>Cf`WY($XHm8;vt3sc{~8Rrl*bORG zadB}=YrDIiUo$6J2F8QDeLfU8IXPwYm2``h?vn&39DR#{DBh5&yZieUq2Azu@vKR; z%570m(SC#x%^dv#xV9#zBum3~X;~*#7V0Qfu)K7c47YRgLUx_Qq^{UHOtN10zei*xI>miUn5t!!Rz4+PNFSQPtVPc4HNh;oGM)*q`nGmkvi}%U{SfV~dS_5RHqsmX zQbCAtaJk+6`}cp0pl=(=$OvA&VNr6X8#yd4`WQF^I2`WTQPYp4y?5>g14;zjR&b+M zOhDi|E<79&LEPS#zqs{4?tnzsLO6jl@() zU~g#|89c_QDKVaZ4~cvrH8>!-O<^O5&6j3~AnZPI`ePD+p$TP)hx_>W)N1SMJ`!f} zVG&&+&NBg+XGFwnA?yO(ce%`rM8KYMKQvPi&FFuO)8d{JBb2h^>eZ{7y1EL$U}WbC zAg!YOzn?Khr4f+-jHD*Jpo*Cs8bXj+Sq#T$pAa2UV 
zcZ>+gFf{fHh~ou1j83YyW@f(XSaWh@-)aQ}<>BF-1&bdEWbv(>8AYZ*gE8Lb8x?X=@nB2ed9!8dl5kK?WSlir({SB_&F=mq^EZ*r52;zP9%B3|8+D_MaAc~_aHIvwQ7TO;$ea(x) zY&c9rt5ofWyCLjb!hKUSr7+Zi6eM!24DVOL6Ogbw&b{ z->$ir*Q(So4mo(9?&6f!u*~MlSGobH-c#+Oj32u$YoZ#mxv-kKT?+ zi3@YEb=39u-&>w~1T)X}b>2=CLe29VYU3+BT$Z*OEF;V`A&tvi>B*(1p|b&B8jiCR zKU7w#RXR=y7B@?oq5R$F*Jg^LJIeP51i-_zrE04nkZqd4H!fMi(>a+#__I*4Db6v8 zY+1|M*>baftIyrg<6r7axbgkTxKclify>J7dG{$tX66L~)(0Qpiy5Da2@0iFAD)eF zO;6OIrj+zk6j&12%PO zpnVG~Dk=+x$F#pbXnva9*}=;*>oe7;Y?-9vAzcu}a@&~j!_A8M7=x9mI+~`|A(M@C zF%32=Y22G&%E?1Mg;Aq+=M4OBIMvrXNns*zFH)Dw2G3712w_e32)GVIOy(JhWh+N^ zXGR;>4vHmsxf={8-_{%{%cnSbY^nh3?p<7MJvzH<^y@ll1}_g^QdxuVs}m=5gFEIt zhPuCgSE;d(Tm4Fg$@l1uA~1Z#2#@B4!C*_DZERJW?a1OKsYHdw1xkR!Jw#fJx>&jh zo}JCDLL<7wx(W=EJg4+Iog1tRX`3pNO8u8>c_<=Ue~?;FbUB7<*w)aR14yN3{zt(u&zjpg^`qyGg{@<5<7w*OcrFTYiqs}mFFB@WSWnlmmbB{TX17KZ+T5}3 zN!QY{5=da7E0$75A;;&CRh6IfNn^aW=<*x{=aHD}ma|Ku1tk)gfSQu0hvh4ipY%Fj zps%`$Vg*)*9jaktvHo`m#}oS*Zrv5j&b+OUExM3CUP7to{}qSmx*!UbyoT%x zh*asF8LcH)mpYWxveeJASg1;%nsJNo&anSH8Q>(cYsTNCK%fSZ25HGD1~Yn%d&SA= zK?Iv1n(P%3?RI*o=@8vM_oR_!y5F3>yIJv3Dlo?F&^<-{9^5Q1aWRVmEB=j3Ddmj^ ziG`d-zeIq@@?&#G+wIYRr74`--s>%Q4H-oB=o245emrlFc5h0h6H{f7&75yc*&5kk zDc|m)cqVL#CV{^uqccke;)seM9VWuQwx@!%6pSfQww4J06HpQ%xw3r!gsm7F?|x*i zFa`_vgudxh*L2@kLBM@f-7>B}hA%QxQHAsHu5?AwzB@nn935>o+;5JKPa%m}+wl{M zE`t-(ihG*PBG|=54A!d2{d#suZm;Px%JL0IDNvFSHl}Y?#&|4~-9M(?1wKiMriEXi&$qJxhjFgUeRy=+f=&PL!@&K( z_wqY8-iJq6z-zZ;*Q>2OJs;?nj`N7RQ8XQ5}Dk9~cBj?^>y0?vd43Bwc)L`wyLEOH7_9TRLBr z+fH2b+Fqp7Cfg6MqE<3F4!CVlFV9neV9|BIQQ);WtLk!`O)-PV%kV|YZ9N~5fVRES zGuBaH8HM;Z)KnTTE-x|7=FgkF^TGPxyLQKby}Q_W`y+Ju z7Z*N0i|Os*p*p?2gX`QBO+hSY+Bn*1S%FZn7oJb!^ff(Jl?yo}EIN_BgB9$5cZ3^L z0+}qG=ti=GBgq9VDd_QZ38gxtc8ME%Z{rKQ@PX`#v#W=S14o|j`;L1nU9a>R@Ro(* z9c7VnTKg3JdIAoXFJ3&gQOl^!n>^kgkj&}Nn5bQQ;?@|EKU+BL6jxRz$iNWBm^l(l zA4>tEX4?Vzn#BE@2{O;fo0WJyQf-}47_ysTPY>NXJ*2G)W-~|0(5k-^{&$-9#Hsb| zBw+QPF1FI91A@mu;<_=m8di{W1_i?m8I|d=t+5d9Chdfvtm3*ALYR_pC0LeaZ3ZK1 zMEUigNP7u-dS=ZGMKP9)SJXdUck6mSg5^=L#R95N<%{8}cgj6mq}-zdgQzTKDc*KI 
zf2z5|FFvx$%6chG zO0FBPfbRXEIPn7aG(%8_jGk+wErlQ0e_ZtmjoQBVHzQUAoJ+~yED)$C-pTwn(|5!w z#DWI!FhJGzff9Jn=L58|YU0v+CamY(`XO*xeApVIfFNNEoSY!4`SZB6-jd1KPkS#0cW-;IT! z65q|xl3!4m*$JQ649~f=*STD;iI!$DxHot2JKP821=M(^=sx~T|GRQu;9wl&AB#@& z1wth#NtCjqt;p2E1StVD^&2z&ng9twrB@brA@?c)>ot2DnjW~qjdm>it;tE_PI*z1EsJVX)Q zkOfAtBWmE0>S&LySqC)szsbmyp|EKXyW8DDEt!iqV+}pi}4qj7t1fUUhL0_Bt58ldFFRT z8zKmVD!w2}CLsr;0F~>6U=%yW@Q)ww$?ymSr|5U#uXGt#MA{SRD!T+$p+y-|FU*7a zzsL6>`YQWSeM5b7xCx8zbe>6^q`zl)+msP%F&JgmT&LUR)N(3fg-I>Vya!AMwU~7? zZL$vLAF7|?$jH#Uvu5IK(sRylK@2VyUKuVHQ5hjtI$xPG>M<_DaQEaQSdgU2_s@Yy zKZ~uk)%=Q;oU48QL{W^IRQmL&q=E#dKF{oUbo0@$C4BwNnBbVmnAn)anABLE%~-v~ zILYr5J-cmM)D0%Hb(n+{FmD+OCbCn627(wQ&yaa5NMwKHWSe@Xy;=jmlztbikfJcI zz@%uPm{OJG5&RdE5bUl^7t2_kxK z)9Dm9&g*@;H!#j_Kqx3EU|1cq->PA~n|OF%IuQa5{KvuORndyoOTJ+5A^`k;o8ZJ+!oVTjWIcLznFzpEn-@&TyogYeoa8{Objzg<=oGjOu+{!~vKxM*<(@)V###ALuv=pWa40!W^CU3+w;J1pfd_YUPC9 zk^yc^dj4j973~8VVJzQ2P|#lhsc|t0Qg#2(9#|Ftt=-@tcaHei_W$2cEY}V-V{KP! zUV2WM7>bC_m>K|^{=aSwU@jbQxiINrTyjbOHMgbJ_4a&^yo~VwA5QSRv|g#Pr2zen zx4u8!x)afrS^wDF>Co}(y`};&3LFzz_Ksp@W_A>7KIw9lm z)7}3f+5bHD7_9=P1i;jXma{?v67*V^fnYpt`fa$6F}wibZR6;m1G}3@tVWct{ikWs zKB#AbUjl(ZnmRf+i~Hn#+}yO^bn?b}aRq-MPsGdxdQ2%kG%p5no^FA)Xr%|XwzdN% z{1)#Ug_L!BxpK#VTape3U+DXFmI#>xHn^8Li~x}etN2)2=4fkcD_QF$NWD@!r_2g~ z#l@r`O=F#6>YtyzN=r&~QJ`OE`41zN5x5Mad@YoW)~uEj*K2Ppw9y|R9`mpi*o(Q! 
zN-@i^l$4YlM1!&S{=PZSJte7gZy<_S6!%~L`ekN0DL7wWUvHEIZL~ml3!joAF?-vo z1Jb}u&geVEuf0-n;wOTw^MFTtUHY9gRt(!#6b^^Ceg7U6tEbccf(V+!&a@*FhP4YD zPQhMES}Nd~MiolMjWZo9^Ol(tm~Y+9tAp*ZdZQ0E?>t|O>I;US%^96t6<3JN# zhVLtuGlF+*FcFjn{{D@zy4u9N&+2p>q_jjAMnX8_fc~7+9bzL&RW8;u_9PMxf8=B` zLv-ep#sMyCq(c#(3M?U8buhdY$N&HmK$TUXv)R$1pH6$2KgO>S!7R z)}@pTBvgtf0kSv7`z_t}tXN+yEG@*?7-YBzR|E9xO+2m4(zmP zbDd_|zKgL!*8?%T%+6hufgf9>t%{XnVOJQP`BN=qa6-cAW|iHZpQq9iqftx;;x%B z67wr=9v)3~#jaS0oSWr?cxHyE$*i9=&Wl1=fnLNH!$wHV0T%E71iIaJ#W=inQX%nY zraVTr%9#-#AAc>ss3TS*Ml@-a zs>Ab_g6x6}1N6%Kf2f16MNnaEj>acyjPa8vQ63oM5fMJ|dh|_ySO3B{$48ziE3)7# z28=2~mgr9;;XaP%$|0qBc~4Lv@*QJ%ZSgUL6hRlb>7Efh#mc9GOMy7qIHb9!s0=XS z@HO@ASk|Rgi+jEB+P_SNy|*~v;o&~8`QU>++@wu2mNIcKCT_HLfpVNMH@?;eb=(c~ z-K$xi9eu^Gq&JY7y1QV$HwSf!eLwv?r?9v%VQp%g)(hVx&k#Y~V>j#+wOn$%1Mf{l zfyi2Ird2WYrIarkc4tdfTNj0`tr>d$K>zu(a{GI$n?xjkje3mraQm7LUnRL^V~u6X zn1XmuT4Uz+(_<4>e@%xBG3=aF<`=2;T6r0_3AZ$Kg(D~R8c4!ZshIqkFImRfjUDo) zo#UmRsdmg(x^CyT6hJfx9Wsu;#~;`2QOs09kL4S-2TMB-X=K=<+XmJqDp|X0?n38{ z@NF4bA5#=Q;XHtsK(ApFYVo}>YmI(YzGjD!15@LZzITUAM($V|iI!w2i3R#ePn{4M8M)>14yYqa zPmvs!cxua4N2B@l9Tht**H88%75an6(>a|T$mU?#s)dK{nIzp*a=Xzyt;&C>2_gaT zWIWu()UB+ZDm;7>lj6@;T4YeF(qBHyJGe{;A+vF?(+b%cou|tO!v_@cp()oaCAhio zXyHa#Mc2N6O$8X2wZZ;dm{pp9)#ILUx!7jK$(Y9p{Do`h$o%o?^_rJNBZ#EVRRQmj zsVel&V@Uurw!~`m)a7PqG+MjLEv*jkUpgU23aUH4o0)W(tfw-|ZA25Og<1^5_cv;U zT_)nkXPpxKxE&7w)bAaDr&s&QCbjJ_Lnl}zbK9Gc3A(y1+UNBX zy?wrx*vUXhtJAstvknv0CQ6>fLY#OO0FbsOCCp*sS=l_8Iewax)k;pw)W>Z5nsn8Z zB+`#|cos{_+L{{6_VQU*FQRKXZLP+ISiTY)G^|ftgX>P0j>pb8wqiz2|l^% zJzaNOmQ2b-6K4A+Wb@CjU8fyp+=p0>+1HQDFkITY4Ry~Sy)7@JE@I+5U;5chw9|cP z5b;poJ8hS;d|INdN&X%b8p%W9zo9>V)MQe1n1_#?G*RuoBP>h44qH03^rOxY#Ix@- z;~V^A3fSH7GV31+ht|o}^mrnGNaUa>4-CuK`~iO?0*F zlG}6YD5}l+w2>$qLpg@TsN5CXp=bvYQ~Rk*@|L0! 
z`dFX%Jdpd+m?@hnC1-YL6cq|>Z82uH8XC3~nEOu4NmQ59Jlt06;2r z!=o&rS%GyiPPN1AA2(y%Lm{mL0*v@|080Y$2JZmnTuAE`2J1@S3B{m}*xDp_kpGyZ zs~;G9my|Z+eAFG(b~tQ}d0PAkzu_8>izzyeRCYIUEJPazm4;ySt5;wKbj@U zFD}+%u%OpTlyQ*wylC*rqrT%ohOCdSrJo?45p{{KyoHyv4!0xgZg#s%6kcs3LE8<% z+oB=s0i>r4+rlK*=a``?bYGZVx*oO)B`(=uSjLZPvDJFr;m0cji^ykO$e(&9>H(xQ zU2hdZ0#oQpxm1NQG;5O$x-6sD1esTf0SLD_saB&kMMIi^!`Ex`(TW*Q8cF5}LdrZS z(AHoY3Vm%XZ;f9LVT_s9u}(5(YeT_o8N2+$>)-TR4EpAz6`8R3a0wzHt<1jlCQsH> zR_NY=|CqGEq6k)CTwpqp$D9#j(U;*{>lAMWozx(cf{`=c?mRQ13)5qu*F&p5m@+!k zbn0v3S2x@338l=|SIKQ->;~woqQMAAsN~_+Aq~$KrHs)2m;{QKbf1~d@jc5^#JpjO z!eAZK8_5X{1`PERG1q8e8Um5*MbbSrd;2K0`J1_6z2<(OvyLqlSPLl4&M zWQyl@z<%s>d}>CNFUDEh8Sy5Eb5;G8UiF2POTjf+5q{!WfV=3w?aIPmVbl7s$SR&i z%OQ1^^Gzlbz{IKtr)!duu3ZI8z|ePv4zLO$#80$}_F%}cAiwQUcRs(s7C;>oBfeh; zJH6~phGw?MLb!gBpr}KbP~ck}c4)IERr4c!VS^;*GsA`om@Tfh>&zrF<0yBeaQAw@ z>@XZ0?&C>B003XvWbp&ZZi*WA7opecA_FL)T2d;YKzSfb50#lX8z{M)xaNmH;<8S3 z3GS|7UuUo$4P)bftb1pXfpF-@o7!o@tH#vc1ak)(>#rYAT@}-?$_n zDm26Quk2->lu1s`!q}u#Ov!y(H%!FgD;V81F~^CUbo?f@cdTmeCf)KWF*PyIAWzeV z@9)Lm)4NZ-mEKmtu9?IcbZ(Ow8lK*Yy#_e_R1Ayk-wOlCKZJ#ZczA~$tr<&auJF(Rdgpy0K%moSsQpAki; zR}|M%l0NdzF0(k3T3x0}CBDdzRWy#h>}Gp3P!Il8&T~XaYb=2Kka0=3zM7&TW<2Fs z$3Wmb0#P#le?oNAhkhg`ebw_qLCU4!tTG1Z?PeqEuXi8n6x&bX;&Z<+lO7MH`Q#u< zZtSQL&b^1oInU(1Q8^ui^gh%DXrQx$*N9Ax@%W*K{i5iFx1AQo0~Hv4xBPbn5JjYF zQdx^3VE0#0GNj*;^ECg23`S`9s%|7>)vTugEp|CwIXT&2Ht8Yn3=NHRet98<@l);z znR+1>kQ=@Z_5=%C;I3Zlvz}Q9Q_tQve>&tic7xa0%UvPUMA6mW~ZXhJD$Q@)I>_z7ediR*ry_XGI5vU|(vFAX^g(k2$fXXX5ZJ9)V%5faN3Atv` zJT$BW!^wViVoD-J%7J-H9jETlJs%$#x)co%X-Y8jPq&S$)egx5N)nPZT0M%U>a>#f ztAzkQxyi`Fp2qA);J92*O9lJa=V%B&zI|!$6Pph4vI02aYp3s|DxC0IGEQbIw8%VT zKk|6O+`%?kRV;uchy>-)EWq~mLO+=n8AkkyC`GZuYWa&%#{81v$mH5^+ayW)6wWK1 z@g-vyMd^cnFo~u<)+;}s{}xNRBc=Fyu+*Kt?>7VTPN12lt&ocux_KeZJa4(_>D zcUi{yi>&<5Q;3frR0cJyzfMj1KZt<2X*!r|C|C37t%cTvAc9f#jsADz!q!fWe_ZZ{ zW|PZt#iF{9%PtjR9*(YBuG_#ee)`}+n(j@|w<=PU*AZgB0?v{@F)$v_Zg{*xzM@_v z9t~tMuVG<9x0uh#5Vk3ns-iCt8E1MIl{9&U36W(YZ{5*>nOt+scq|6&DHkB&NXHzt 
z`@g-T%EE|0_)|)s3JMg`LK%^2QM+QT(1M@IwCu}CD(RA*Sr`P`jG12e{EMuu68x!? zu|Jrt(7}&-&OrThfJX2o-07~lkUOQkv{V%ugNdkTDV>!&uZ7yY)x_Y7d@9*gv@D&+6)h&+IegYUX#cg?7%B|)_5qO3`Bu)Cnfs{rTPKSp z-s@jDojW>^xYjC+eL9S-qRuP&OlKM*NXHT+_gz6JJ6!bpoRAI%i&is@FAcQu8=k$V z&ur0z^xs}&@N(a0vaJ`G;;*mM0IekME0*?G^LINuMcRjJ>kQ|^tdem+{utZn+AAx0 z)XweBNs{EokE$>AzpMHSvOfV}sXo3>i~((gR2HO{^{@LJ6~g*`>C{e&P|6j`N~AVR zR=wRkuR{B#w^}uBaS5vtayiK-JiR=yX@BJS7Og)`Jyazwr%yr0oYBsKkDM&zT{@5J zbWHyeY~U&xv-Md4*2!+9Id)f=zbI-hUQEXiV$-r+!D=y?;p@@vmR8o?W|gg?68$CB zdqB&Hyz)|i2$9Z=!yT1H4!V=xV$yn&{Os9aLqL{Tc&)7S_t_@qQACFX{!#;P{S*nc zZTi!Xo`fD5f!6?$I+$^P0{#p|E!8KLv@+b#J5Pyq~Q+e`B!G2osdscxuPJHUd3_5}7B z9kfLqQ6SviYa^@E+KCE77m0;!uN_-wfp}oO^K%YvqucIM#ipNxTkI*?1Z>KY1 zT07u2*ljK;%CjnHV2ZAZgo1i*TWTXoI};KaiH!uf#JG{QF&d)ZPlx+(rXT!eD!7!q zWtsrivP4~t>a>7mo09psSqbheeDMsa1ft%Q@RrV$O|UqG)Nl8EIizJzB^NSGE;7RJ>tgHsG7oQo)_6_L{|7Wb|H+u94guj0Pj;urzkK=fJ3z!)M+>$* zjB;&Hz5MQiyciZxhyn$=p^qT8@~D+6(KyyfM<)lq-}Lk2fuq(ZUnr!Q`M_dnC)e*T z10KO?3Bae9^1ziOzg}73X^5byHEQB7uq#%d0_hC}v^dql4EWtd27HVq>1NQ?JDUlh z;vZvuw5yRF793-Wdx1X|WEYl}l0{!BS)=>=F%a;|$;0e#oaX{m&%NNLQANfpMgHTqQ>`jr=TxhGK@UF$$eIcEltlC4l!96`wLvbH%V zs~}JE{SZ&m4=l`&d3ipF6Xe?E6i}rdDRd#0{aisVl_BI#iyQ!W{;|m3`=c z;V_0n@3=9yT|E8`VLOs?kSS8C(0^FBxGVi4c!n-R8{Q!&J;4W{p383N#zOJR*si!t zdi+oz^_TkoKDuwUK99;3+&7~A_S6O{GF~S`nH*s;Vrv53d68o4=LRm#jXk!P>*I;6 z8~S;2c6av#R-R_0vzRCERIkuX`=5C1FAmNF04R4=NcC*XO(A;FE#;$iKpbtA=mD~e zph8`-;5*ZT)Dsza!b_!@8@X-Dc7(`Ey`sw8BJ1H2!kr znq#zqt~ipV47to>K;VejqgJV44m2aS)7RWdS4lcJjV zF~*bd_H9lxhM@FOQderpcI?uWUsz+H+r={&<56v{xWU0sjy|uwrgU_4Y^B2JKE3g? 
z?gNK!uJwLyKk8&JgJR5g^sJ6Q?B7bXbohB`p|qo;W8iGTSl9Szx)euQ%^QtV+G{n2 zx>=^tjBY8X7lpa&k1awLhrQ znWuB{egFEZAmPH>4?MODg0PTFJtHo4Er3qzXmwO~xL4ra+mOm@CJHdSPqErMkL0)S z)>k$de}-+7v`II2eMRc(>hfTWL87>^B2@MVFGIfugudgmJ`QV&<>zZ-kR83TD=CMu z*usl>k<<=z4uMAw^L@6}m^i87*&cyTD5FQN4S5fN>!ml??>nGQ{6nawJ@9sY@~!EE z@KY)(s--K{4NgKr4sEi74>}7GzrQZD)}9lx@wYO%<}aiVBJdaOZGl}zCp`G&x%01jd1 zzPZE|WZN-_FbBC}OWjy=_PId<-;o#!YL0ci6fc9!BOkevXz)-Kl25!@zoFXklNx6j z!GLREd?P5`l}KfLWxgo*OaRni{$)+RkB{AGw>wZr{{Wg&j~O$uW2zxJK%8M~xHVhSN&oBIM8y9HS5*5QJ1g}n!aFC%y=NjzaY_yx_4+$1&L5l~ z_|;VM?!37OKJmC&QL0H=B5ZZ^J!8U_Ee4j5pI)YIoynjg2eqQ^9mbS zsY``B%sS2!0V1h{iE(74Ox*yhot=*_8wAn1^mmG2zn!Q`OrA4BbGLZAKLIVvfpev~!_a3l@%f4r1ZWiJpE`A8T#an&i+#5Y z?b2d#u;)o?Z@2xkh)SBw(c$Oku;DCnF9lBwMX!kRRFmla)^jF_4$rDwC$Or`#{Ghv z_=X`~sKavfn@Us5QPr`1&RM3%>}7hvBPXt$CD#jh-WB8>WgRT;s6T#V&_N4&#VlAj z^|$)O^RO%GU`u$le6Rn@Q~pbaHDsK#y!=0qv`$2eIUQv4^U_>EPg~^fMkh$~hx8J# zgdTx$Un2`czfU<_j1!t;LM+P+FIA`3b@qO+F+!)Q0JTGv<%Rv2Je5BZwGa25sAqbb zif2yv=&ISC6D7-3^?CK=qAj8XL{SRol*$zE|D|7Dt~%aQUD5wJI}@S28L_P_ULn#STU3kn{+|GK+5&t-&W%E8218;`6kA+sL%~E>oz6Y+uNer(9 z*?PJ%!Ckja!|6Amr#svs*JT$`Z_X9BnIEZs5!Z#R{dW(TmzjUvfGZGInVE8L=U72@U|M- zFOBAKy`$B=*OcKukTJgnHRmbMQklCfd&^o1!7Fg89nC)8vvf?gvo6~{JzRdsn#ajC zK|cK6e^iKPLLTEVE{B>*(jRa`n6v=rCrpE#0`<{eFUBQE%z(+VgK`XoLvDgV`?q9ahyoS~00Lnsh$HYC^L?9~b;gW^<1_w zV1G&ONVoLBQ#HX?Q=T;Wc2&`GJdSnarQP52!sJ5{w%1xyZH`_@^hAv30~G~!Wgy7L`e7shqH)>YMXdG z(F7_b^uU}|4hy;ppVA{2)R}1N`wID1ZgO1ueP7!3jNwYX7uO8mQE|Sn^=b#wzG0w= zZQ^lV<@E{a@+x&|y8pQ}9Ek=nL`Y|reHA}D!hh{DcQtCml6~`+}pr?Q?6{~zYp?1G365p*!h?S zAP9<-ia5f>O zX4dx(|C2Ndt`3D+MZ21rGI5`rVyMs#Upr?nXdyvdnjU)Zw>ibr(a^NTt9u37@>Jke zJL&+FAup_%X})aioBa41%TmH{?>bcF&z62y<;LuEgUlX{_zqbQ$~HlrSs0S^G8+)^AHIM-sw=fzY@CHZS-I^JPX$W^v>`#}i{rG6uD8kmu zIuoOQcmd(_3#y{T_Io*u=wrr6IF~TD4MTz)c0mI=DQVCQUfWI5Q6#r$ZteP?k#T9X zJlQG;`iv|3M_&SJcDF@4Wk$pNpSt=!8A|{8>MMufZ`@BA6~Aw@o9kJQ z`W$YuzGbt0rovVaI7f@ejh-giMZ4F_AnKoemnQy0Rk^Ff8unmjX|V7-X-#r6;}YgU 
z$#zh7%x3kTgAntfT|)+7*bn(|$Xj@Vi$9(Gme>YT?tgYceR%4Qb7vot2p2rT?+zvIBn)t*3K7ngUJUV3ggXB>1N77L#y zn^a{A-ffP+G63@Iy;YYyQ>AMw%W+P(7o%k6$qeOMooTFV@l0Sfs)zo~hd^HRji~W_ zN0%5>DVeFRqSEzxV*{elG~E`L#lSfXFQ2TYlm9HCR|7jjANJD|;0rFGgl?`Mi1HS; z#$T>u?#I=H_Q&lKPF2!)tH1A&ql$HYX9DbdtoeqwtW~O^JDa8w}XS=)$*;p&zvcL*Mrp=}7gpnn3JMl4Ufhy;q4zB|ILhs! z2*!_q_Oh|H)r8%9U*sc3;9W40LT9lQ#Qin$|AhGf5wM#8_8cY3suE^udUt{);^I45 zKvJikCd1Xz(jr`r1tbtWhw3~82gT*tUov6B*!+0)da(S!zY1FW1+y`*l$4c9jC^H1 z(C~iYap&_#L>=XONNH)Q9x#(dq0xoX%qbt9XmQO=i% zdQ4juGsjJ$ExZW0K9}U~^boCOQX2LTbF+wCwrp z6vmGzf{Rh_)^iUc8($-<7*QV)n{ViwHLadZMiRosOasulYFIVvu{WM$;u20&&qWNUK@RPO084rq z@t;ESc5xWTV((1V4cZ|r$Q^Are zMz#|bZ@&DVj#*=@yn3Y?o$48B$v+_brGCXKTr3Kt_m8{RCU1pC)Nv*7ndo*_K;%3Z zJ_;eoX;`Dn0mRFG^iAASrFR${4Cki4XuG^*IoMg6=fY=PQ#2HLJmqbjQBhHGOZc#M z>R5hkd|YS3>`9XW|jxj8=r9sl( zp1n)T@zQHcfrTwPX|7gv-z$fphXX>)rv`_H+J{!bGsLeSux!)tFLE8-jMLQD$yo|o zbz_9u534&>J9zJF_}jPUbn00-m|skK3sDpU`unyFyK4-tQ>^Jm3l zu!=3MNLN>L4<(EO^3_`p{?!jgjsOJ)N*VzTeml$Y4-CVrhoAa>kkfre=tF@JTJ;f( zcXOjrLUkp@g)zkTef!B8qk019v!yM8W8}Q($Y-BO zq53;_9Qv2-B9=0T^(`EpcQ0P2k99n30Toc^nEz6XzO0JsNHa3@1i_a2t7HY zXx~)Gq-=eC9#e-r=Y&3M?lp3-gy*d{j&ku6_@?G1aZE>n{Qvy?cq^`?Vl{vl7m}E!qbcV$dBa&O=m)Ac~krwW92z5-(jzA6Y8$ICAUI-L7azzVvx72;VLYK2t+s1K$9=7!fNvg`JbQ2&BDo~G6 z^;u;acbczUc6fP7pe3m9aM>^;tUvYnlOGJwwD1ParyquFk`3qQGsT6myJ}fqql8Br;bSVAjeE zJ#%#YmB3mZpK=RmIynOZ>UyQT@1#yOK*u%G)Og=pafq4oV2eV^kO~a1_cBmAIG!K& z87x@3aWu=vkm0AqSUG+nq6(p!v`2xs9#RwZ7ut!5l$I@G@!8}2wB20AN zb6FCb=<*V0_UOXg$0?v!TD6pBZim;T(Ril;S1Hsw$BakJS-A(8^+3djGun{L4#8#( zTP0Jyi5SQDAl&qOO~YLSBdzSE^|oyFJg@DWIhF;oiJwa6(RQC48KJHbKKJiuCv!#i z{@VJzO1wq+>R1KFb(w!9*^l)Y+^GtD!^K~B4M-HkW{_HNVY>zZ$w{UTnibfba{)wR zTQ7%B*0!9#$gKs96I`PvO@+qk*RAB(kirl1*LZ>`B+NX_uoFYpr?=jazPi~NU+6x) zg=C^=;3}e99)`XQtZs2pF_AZ7@!90I3@}^Is+#zJ2z%>*D4X^TR7D9z5KxguKoRMV z1r{toT0lfvT1s+hq(lYjZb9ji?vU>8lw4|Q7Iujhk0~GxuEg6)Mdd z?sW4324%Rz8@XKF8yw&;Od?U^UalWeQLym(gA6P>Un`vSwB1~;!zwWRB_B4Q?5 zQdhfV%`6#fA~E~(rBV4Ofb9Hvy{#VrZLEjWmnsTyyRL;e8&3{&X!utqWCct-idhl? 
z94A^+%ZQbGfLhi2XL%%tT~4isql@S}5UHGA3xJid$A$Dq3-1F5BP~Fyks(9+;Gh?W zuPY4<;r{qtNPzViS0r!BFCr*EI6x4QdQQ*+%B>xu^Fao>%nX&^-b_v*9;WOgs^6vOuF-PWJ17H<1%hu(j(Fz5n^`YiPHBNj%{1o>yW zA0-{6n#$-y01u1I#X1v;Qpd31|Gn)2wpvm4m$8>@@4%vV90*_c{~9VgdmCEIE}CU( z90G_it#x&szOvVi9=F{Ia-~?@vaxI6@g5cp<{G3Dz4V{DMkbjXg}1> zu>0|vB0F=Ri>-T3k$QL9bW?cx3N=3m6MW`ebBo8FPw>S0A*`5 z$;+{b5|{3l)St{#CypNW_UY-?2ztCpYaA$ zG38|kl!S7QRn1>NnjK|)tH#yYR5t%cSwnx-uy^Xsiczda$2K`eq6gr(J&Z^X@0QbS z6K$N_Ji)u30{}9`8+6WtQp2K>2`p|z3 zXYVme-N59SsHufG^og4=jQz zOHU`}1}e<}p@wg&LxWwTOY-vZN@MzWUYb?rLQ;K&xh%hCVy*gZ_^LIVRugjY8n0yT zTd${tW{N6p!X7l#=&r4`bEHf%CTc!x5j{0?qM;w4$~!&8_jfZ=Ewz0jq!J$+qi%G^ zRKM!SGzxz4#_ES`A<=8|n^nWKB+XO<@e1tvlRvs)gsw>k@q9a4QodFr9)2%5-Ca*V zsXWwy2ZzH3{w~1z{41Aw!-;s$K*uJkMQbKnquDx_hp+U(fDng3_|ynC*g?`V) z13aGAo3YU^znp5QJ3k?Nej=!^Gg6HjFZ$YSdo^nE2{1~*|E`tB7sQd$Di&+CRh`nl zFz*h(vCMWez31W`ce92}JM}O%>`}HT!)o*K)>U)s+>s;E_LhPMnm7gQmr5@ca|&iS zW&n3kP?N3w{dCUhD8B}ee`vSdEM&UOTvd(> zF$ShRV%y)_Bw!Da!o4=uo)7W(#Vwo7_gxM+W5m4t;x+@eRvPewfJPfWStS#0M5>}a z*6V>XEPccYm`_(Xo64|`Uv5KFTRNRBqHo^r5uU*Cf%)q%j!nY|nzS4NH39*4EqT9p zE^PkJxgL&QYSKI^*f2)5N)z9hhbgkm(=c+k31u?lAS%AXfP1_w4% z(GJAH=LU^#=P2xE%X0g5^2!KyrXJNC3GU7f!U(DuN%7}#Y;YU=nMhk-(o+OZK<6F} zN*aw6>3%Vq+}2$2gUnBb+*lf@61a7rD$x&Sm8r$ZQHmGUe6r%xvH0nVw6J#Es^R4P z5H!K6h0HIK-%Lr#qaxX$!gaW#DPy!}bf)e|b7ew2hWyJwnO-HeyWx)tg|Vs&f@H>% z%LJemc=ddT4C*?-Zk91Ut4kWU(AN(3k>%$IFMBW)O+-j1_gMbLzdv_Ta@Z)o3|YOG zZIIoN5+?@D9{^;MzjmcAVE4R0{AZJk+5P_wV#p?DFs2bdngsH}TU%R2 zR<+8n!a_a9^xyH6uJst*zn?}YgTHu)yvZ;>P0EXaV-;CR%wPZpAlW?!7^ngl6P@tS z^6`OoQ(itWEG)94ZdOMoUQhKk zaPVBo#xG3y+kr8%Jb$Ob!(_m?IYy6XGuqI2JrkhoN;oox$7?oJf;}(O+C4mj)h4_g z;9sVr=$tX))FI_83;^;4f=Blc?tuZ8-n~;m0b9bXEa~Sd&DG^OtOuXgx1b5eFy7Z^ zyBP;jp88kg#@gPDARC7brTnI@Aq;NpLeU|V=L8TK21)re?f zlkrb3S7(%~b-}6uH;VLHu1t$h#}%7^0aCV(ZhF#%aw?-m{to%JvjZ`|tTvQoU{RS{ zLU7ID+QIV?*TBQIfQJBE6{*?N@LD^d)pCZ4)c#8PFc$HTSw-JLmkF0!8ON=EO+>yX zlU*xZLTdrn2DaW`!(T8rH}CK7r|atTsLsfcBB!8G3Xo?XX`3>YUKn0%+Hn_sr)Ht8 zNXK5S$ut4Y4w10utBlERft8j;G)Zb}JTwifb+$dIIjd}Wz4#&nZFEe{4!li8PFf1D 
z??7o-g0yp4B9cEljr1%%PGU`wO)*dTl9Gcl`dBlo)J|;!J_&QTI8|D_5aoah^miHT8DgAG*C zuYFKj0-6WwqaA~C+m#UVS)$hNCO;)3_SVO-Y#o~Hq^+chY@sx@%1nMU8KLsIMP;j2 z3Q1Ub6fuwmTd7K(lFnheIMfqhrxAx@gpEj~=tju-DkVNRPJ$|pvxV;~oyDSTzO(Ku zcku!q$gP5nSM%9Dg8LR7@kE!aEm}CB9P3d782b^zfdDUC%6#XxlC<$NhO8O;BG6C! z?-qu>bfAYiVY3B>6T;b3TvMa<`Sa&NqoCl&I!XK~*||!5H0I$m=IN1VyTR7w+djl< z<#{ijzl9i?#4BoQ<`58U0XsSF^wJiMefO!cq@NiBoS^0Zlx>ej>usbZduBwe>TtV9 zp*h}u39hGBsa}*_oTis9#+XDRl`55r%rVUbNO#9p^MIoV0B3+%!(E0V0F#Z!;`Four%YH`t5^JSc#H_ z#xSk>^e_WyRw^Q!c(`|i%$&lU(wxei+I*(7D7IN7vdY>;48z3$4jhN)6WEtn$Cj#> z=CiUS?im%@T0)e;6xweiHrXvhNDm|y=0mi2x43^2Wlj5~zf+H3|MAN>(&cmoTg^J@ zu@h>NSNwXB9(R%2$cL%80R-s)frHh<-xI+eyk3pM0e^D5E|TYZM+eP8YoKar$8~m0 zhh)n7hv7RXKy_i>(xjh^SX3&q4{Rl|e6~>8F}@JLTb&z2iz8D(P-J61T5Q6Pz6=F3 z{5eG1cCFK+qQX^UI1wJHK5XA!8d2Oa_t@1s*U)&pllvz{U{&AupXO?!&1s8XefGU~ zk+swJZspK}<9*;;NmYXxjuE+m+33+)@{%3Y*}!?*JX;g(IwL>r^M(=J(yW z?p4-R+CCEnE&*R?kR4FbsfT%P#NDoXF?LWgd!S_ceTzj!2XgHJN{@HeR4W&={OEVq zJ=_G<1os6Tmc;-Ernh;@T&+ezwx#UY#3=CC$tF@Gox*PDvdA%5>;J6xPq1^_5rqvq zpl7r5{V9>-TkS*_4N-$(^RnwZ{##AsBwi0EBREs%x~?_sH2G}o?EJVmvMByJ>7NIJ zcV(0GODm22UMq%k5?DLLC?QP}YZu~fJTJEn+HEP%3REpLXYMVVw!GO4kpTXC^ zWn;czuHj<2n@L)G{$?c)iyp}jAMY};`2@;ME%IP?*3}^1ma-l$JaDO>{~d?LOL03} zBr11S4|EZEJF4yEF7|M|Sx@?Bc{LyKXBlMVLfEiEiNpu#(Wku>)jL_TQ#(?M67bRy z26ZPk{WUCbeXtYF z2`80Cz1&4cIylNR^W*Uw^U%rSa?KcL;vVS^Ti2h-GcLk(?&EF#l_aZ14T2bDsNy&m z8#^T;^6b^kU^HyMom0#BdH<7Sf5>(-DiC`SN@~~oOTef`(i(ig=Wn)=UuWy*8I<<465?`=a?J){r0FswapfP}VSMa;@R;JmpgF;QA%W5}= zZ!U+I1Q*U1XD2es>W6^zk~VIk#7|hH1Ztz-W0&Ol>*k?Q87P14;{?Z2?W?i5Q)C~4 z!23-g<8$SDFv7YLCjU?%hZQ*b{62%);sCrZpe0iq^tjiOu`_LSxJtZ(Iv>v`2nK}9 zP3^iauY4VaVCv=HHFI`8=u02J(JKWMVfATYD5Y~`(E6kK1hq`a{bchh=?KbSRY-k+ zz?`=~^usibJ}mfVRW9V3i&R1I+XQ(K(MI~mnUkeis$_jxs(O~h|2lLLH=mVCiHp#K z84W%=NJ}9v&w3*bi_l=6r+&gR;Xy}yY~^>qfnUU+Cv!enr6?S9`m+-8+edCoEc5}= z!AH_K;>3D?XZw0{P4}JWk%PO#xi4=&Beb@d>Ai&^wfs8+16XvMPFqIiv2TR6B!8$b z;;qp=Z;La|?0@n{Yjnsz5na_DpP`-67{6~2;U*sX1$C%ci&Sm|z9s*YRsBoZur}e4MmaN|gh8e@d*0 
zPB72(f4KLI+Itex5{t`g6${;}kcGO8-wphDml5uH(NMG1nQ{R-ZS_eL-c1{(nQLx; z7t}9q@&~phYHx-o=4^xz`Z@M7;oSzEiZ+%_4ax|BZ& zX7!oy)!lmUW)bAN2$W&QEko%oeU*;NKV2{>+yecXSLDIVgnQDPH!M44Jgr|7-UCoE+PM!n4&KhtivqVf&tCq7o{?> zP?Qyt28@5k9z@S(+E}ijC~fWZB-KB}rGY9(w^ub=X1-=Hnz;H!3mC&Qdj5tj%2 zn9j$JqHtxcyoJ`pJs;V)4!NJ0TQNaIhvl&bv`QEpIr^z#YBW$9@5G_H@yh3iEF9m` z;9Mtm14ruE?pHh-mX7ZiiE@Xbb@HvF42#&rP0{!D*^ z6KaPRzQ-n}p_Z$mtRjhVt6(Pt4)B-8Ln`w0Bw6vuH*I3EIyrb0qSQFh;)Nj%wUvkl zTftUIfVx6?kH2P+1+*$!rhkM!KXMz?bFRLFJf9nBSEK*9fZ=(bPD#_R;-$BiL8K&( zz$^?%XyxF)6F<@HJ3})fH){rO1O6f3YUp(!B1Hg?Dc~=`&@<}p_^M{-&6BIK&3ch(;RWsx(zT_Za953D~Y-_V*zYl&oaI?(Q{@uaY|~-w8?VEyMBZ2ei8JIXSY# zDspE=jOESwFB$8y&pHaJtP8lHmZi^d%S_e5J9ZDUn{gC^CNTYZ$CA_>99&HNIfyVA z8$rwTT?VLbvxq=kE;VK(f3CjDe+|q58x%(i%IDmlo%fELdthoKH`)j)@) zwIE^Y0q5^`EZGX!@5u!l-Z#jRNk35V_t}T}zMXWPe`%qs~tSirujcjqjU995CCoett?sCC2;x`~VknwW?!Y2p6QO z9>meuYZ_jAt5Y6i?nzo)#OOo&RZdP2w+jykk*-)YOh zyx+h0KA}BM%VP7C-|pYP|1C4~t{^Uh+KE}` z46RJkscXyzotA+6nQ6d5#=BKpfqmR+YHu}C??_baLa3IzHVVRxOeyc_dN*-)So6}o z7xFl*K{k+M#HV+LbTuSn<4VTr|DNwus-uJZy}nR;1OJilrFrsC%{Ak1-oBMlD^gu; zHNWNa@z_c_)To-RHn&FWy648mMj^I{cc|G0pf=)FP9JalD*f1*iMhO=8~k71ot~;v zTd4dRT;bTSFl*duZDmpVD73ixt9);e24R8#|V9y7@x!=C%6g1#<4Lb&~eN2okB0_eq zW@K0fp;DK!pISZjQGNyq?Ca}Odv--!l5b0?GCe&bya0aju@MZ3&z-*3YR~CY4&8%& zF!m!qoMUk}dPO=s*3W0xJG;E;Z}WRaZsS~{HHrzns=u9l|E?-NK0$`KC9t#|Y7vAJ zg-9KbvNfmJy$6yXbG_Ji;+r@p)hnCXxjz)iRy4lKR!zV@Y+>b5(ge+zt$eDcV-Ur_ zJa?AGfHvqhm2nKP^W){%TpSw8l;e5G5PMjzc{3thO4}Ve;f=Kn`*1Q4kin!l5NE10 z9lVcRgEfOX1_M6{HVXFE$LTmdw^DPI33ldy+avd6Vqq~iLrjjR*`6iaWn@RkJYVAR zJo>x~X>hAQWy(ocHhOq`p_{hC)yVB|`cJXZcGm)niY>o=(ZqnU^&KA{e^y1k*pYc1 z{V#gljpY0yhf?Z zB#qMY@UJ4e6_m8uL8e3v<(6a_?(@nzpUkqypDjpI9eaQwDEB%IR3@}<7Re2qJ6D&4 zXWM!Ib)11Dcs5<9l?1~aWL`^CH4R%A-+_mTA@uTh&AKx5&}G9|hnX?pMw(Z{J@~!0 zb|z8^_s`s5?M)5u#J}Y)rtVz1>F;bH`z4bCyUK;1iJ|cPj~k*;eU}X>s=E%mi>z8a zL}dzQ{GGLnpf(9i+YK;~-+)<$Bm?TibjfeVe|>9iD&3(2D?Tv~-Y%cHCnjCeb*fOG+)1@sf^VoaBFjLXrzEdR1)zs_b$sfJ}^dKfS zO>VxVsHEF9-;X?Wd;U!2e#a@XGrgV8By(n#%9KoKG7d!8gM+4q{rbNBQs&6UBO?pl 
z$&*pGA}dQXvoN6TG5IpwQqJJEI?dv;Lec2uy}+rZ z5#}u}D|@wHss$vh2Z7gf0BZOjcmOD9obt2$$uHbN_V>RqP>4}~y=11KBhw5$tW7T) zw-Y|FcngeoR8^Qlletw4?TN5-?;iwyaDwO#tl57B=B3uZO6@Ps6X;zf!JQAVUvDad zl=2ga_pWVrnoK%7IW3uCw9PW^Mxpy1dYh@(ZcuM=H%c|^;R^1uNQ&?lfeZtlLp`q`qOMDybk^k&61jQ(B;)4f1 zKx)&~lvMXIy`J^0%*_T~n{MSXe`irJwgf!~*`7aqXgsN><19O$o5tVO1iYk3Y6`dI zPbcz}P&I|6emE;SMDE2)=dq9bD8{9NGp0+9i; zMB7dAx4;^7r0yVfqD(#xHMxi6K8b&Nrn>36Q`@49Ndox^2#R+FX-U9C4&qE7qYIY- zT9>xl|N6TD;B3QCQa2clF$~oO)npK55T4hi-C&LvWu#pH@zZ(qyOV@Y$zPC^zkVCS z1TMJ*_eEurRV{W2e>^Z4JRB>tJhGKUOb=81DG%E&ns7d7;2p4mPQAhIy?yffez&`9 zlm5x)QBNS12sI=6W-mVPcHL|f_`6rw_;(Y5%**d>$!}`7 z_@3Fp>t#|bzkXo6Ay?hUib@zB*i-{geT+_W!xOkik4}R&@}yMrara)l0T1et-sFsC z%1pz|6L75@vwuPNKqinXULUHo-MR;ilEd&n?;@@`SUTl5oMdvQub@>;=c+d^`J)4n z((~jnu;KnQrhycZdkTE>;G^AWo>~i48gdJy7KnypF@32(iXzq!7dD;CpE9^XQEeFx zFN#U$6V3-TsKa3YZrRy}Q*EN~keNB$ik*o`s!}|9_zu~_D{_JoGcbZM3*fPDzkl!O zSAkHuk+dI+1B5Pe?L(^Ub6Hv0abzHH&HMR{f8yzVph6w@;X?qDV|!T&6%a8m9XQ>^ zbuzPj>F*r3`7B=K6EJs~McM!$(O?n;768M$(v9NaOt*sRd@{#vk!bf=KAb6IEkq=(Vqpv>v?!4^4&a3%P+1BYob(Y0+J6;ILk|Fw zzT0r@23=?~RcEm#$Gn?t;6@i9Hz*l(s*A5~PIRadAG<{XM^#L31PGmjD0R3WKmtY> z3>4;k3{krG6+@E|NkqjT~nEyHXfyol{b+#~j&RSUkPW^RO54Y!V)M0ao03@aPmu>6as4&`}})t`Sl zxgl6_*xzCAluUtD9M%<1%F_b;-wL+0 z9XaWs>J#^*XDEW?A3ifMqq|SnC|2qR9LJO;zialEX=6=5ROb_T1nPDig1-Oj;|Cp+ zi{;2-i*gIAN*4D#iOko&1#c8y+yyoa3)T%$*=NQIh{=9dpuuCqDA!~Y@t^CGqzwfz zZK*#3){|+MfPOl^T8$sU1YoiA$w%!Cmk^uIDKc=xwk?m(u7kM+bxBu2W^HA{Xq55fj`vST*8LHJ%r zQ(1uot*Y8_6%M<$`=cygo);Ms{VPK|_UCuHn}{6OyF`yrAs>(7i)?BR`Pu;y{;hSA zZqZ3D_y2t0TA&a2&nD=@gSX^?*EfS_ELMU=mVJRsB4&QZU_lJB>w^3&{CQ30H!AgKzW*An+_vI=ASA7#p~W`ecd>~uOhrYS z;cX0P+-Y{PsFdJT1)NH$<=4tnyJrCluv!Dn$ZVl@GnZGmk1FB*!`}-k47*9qe6O6u z>!N?q=zRX6trH^M&aAKV?m=oN6@A|fF#+ANQ~pyhQUBfFlw;d(O+B+v4|F>0b**$# zF&s5L0#jtD67$1{-+*7fQ@*M3+nuNOL}Ld#=~libo{pZMtbs)wO2bW?bgd=1nQH8i z&2FL%$G)UviIC!3Dii;{6>z6c(<;4YJ}FBA_qF%H^7Wj1!@GD1r_f=JGr^^kF-?mb zArMp#`+ya@V?1LoSW#;&tGIU|nRk378PU{-0(4PO#jT7&`ThqRGoBVsj}?jmWsgRd 
zg%-Yw^wqaHyhx9ozAiHQi)a!PPxoWRDc_4p>{(uZwuZsyg6o}lOJ27nMgj>MJ6{NX zsT|oD(tk12&&`xaX%RGIZWiL6Fq6F0Yzc71=jwAa)?Cp$7cmteEJn_QF{)ex^tmnHkdhan*$#CG|MO=Xqld0&(7Mm=`z^Z|QjhZDWp+}`=2)E} z3+lWb;&MF3#8md;0oX(jL_b|Vwig+7oYIml*$b9Tb9v2a*U?6kiwN0d*$gv6B0Hov z30vWZ#o|KeC|=rh*X5XvOZSh~%WbUhzs2sdCt~j((qg-*5!}CXbqM4Lh{$3awJR{j z2@aUkCE#7QnSy?5_0h8AE}0Q*%1BLp#->^EjhLjVPk@wnXUWa(Sw#(dV>ulC%E+e3 z@~La^`*gsz$5Dq{DvmtCQUCpx$zWjdybh3*IgjyH?+TH-h<63+)fY|H;?ctDrS;v> zn%-6)?$*%`+~nOBV4VTg<|Qlzp+}Lv!1wVEQQSdGMW3TizKvX{RM__&dX){5f{ntv zKyz7otheXfc%?Bfp8UlO>o%dAF=++M6q{}Ady|%nyacpnIE6DC2aFni$HYbvtqnv%m3i6Ub@95rCABcWgsosSp?ks0g(FrDY`nWGVsSK3kO6E_ zr5UnIYrQzHUzMJAD=vaBH zGFP!F?p(QpjW5PQc*lQlHZq`h@vB)j*4j@oqnLwyim;!lR%*^~#N5g=)N9J}AB#L# z+wf~x{iVi$KK!8>W&YFfLc<_ih`1%>0IP<3`J!qripw?0M<{BBY!#li>Hi1j#G3qi zIdubN5D!Gp;$*R{X1*2#_Hv=gg(@6ZoqD|-#dO4;8^NuA5SYtx^hT$OXyadH6K>uR zrWu=HJ|h!(*RUa?4af(P6j5J#Zi5%q&G)q{aiA#0!mzScb>%xQdh*~z zj@ThCpR{A6gK`Nr!8r4 z`eM#{p)QYZo(=FW3hJX*71d5p-@CZ0L^`G4W_vE$#l%b261Z_`t*IVsb=Giv_q%Lk z!vWZ2+%JcB?7@~)2MB)e-len(*E$FF_N_*O`MdAA8NayS>*Z@x=f%l^BNBye{;IQX zHte}Thj>);6AF8qqz;&u=QRy0=e7wxTMXR2Fot!qBlYyaQaXxoAz4$PLkzxSzCo8b z;T>A;VuV||*>u~&;PUoAtA(SK)NYw@S*2=CZKl#vbv(NJ1tHzi`(?^KTOHbVWw;$& zyqgkktYG+JMDVVEAT@zbojHzVQ zUtZD07`+@Jmmxsih+a5JPPb~^;w3(Me>BlyQ2DL|uTV26y37@GA6@3PJ@+~b_EDXi zUD>gTe>x$K7cI_8K{CEY-asaKg2rZGR##Vl;e3oERAAFkw{Un64dZ z!iIv^wuh(gJ_SRD-YG$Rty1ZXb;ZwewnA!LVw&X|YlL*^NqoS&+!i%LXu%TDYNCc3 z<>l^yFw&Orab0)fa8OcvrnWqW1?jf9=nEh5Of5pBBBupM;dSQSTRe{@cRB^d%CFcC z4Gq1lRzEuQ8~G$Ta?niQic0t@HFj#{du;eP^q3~Fv>hyyLsX_tzVkDNT%r3H#YZ6S ziQVr`ORA4gyLwR$i)+35PRxQP`mq^zVOQa}eW}a)2b(O@r-7t=)f5){KY+0Qe{foB z*Q_E5g(a7}Sw#v~LdSMTF)~*SBHk*wC$OVapYLQ?2NfA$E4#%+z=M{^P5tfyP}lP` zhQA0)#N_R-TU7XVffOi=iH-rcu2GNR7^~8Gi1*H+^H%#u|A3aB9pXnnuK=wfrt0y) zQMwMiVpRh(XPmey3FYY+;_>}8RH9)rDfK=%fi+`{G!g!d5#TTW^nUSpAo7H5k=k$t z`NDN-fA@NtyW%OPIxnULAm1!Fj}}SkR{2uXG93~f9{mL3DGWY`$H17?JUGC->6)YZ zIVM-r;ko1;;93yGnn=)|(>HSri-5K7Ac4fhk);DPx(0KQByM;sN$)aEcFb7Z!ii3hyWY( 
z)1pENU{F91O~boul$@GygBbrd^?Kw4KftLmy#W>hEpB(t31^&wbvRY>cD~uzxe-#srkQ@JJ8EZoj(>x7hgoJAOVs z*-2!7GDya(zleAD6jQc0g))jocC

V z$I1*lxnj?w!|}O~EN41mZ-oq@B)V*i&gYpYczy~0R289dJL20PUG!GMOlZ9qKEwpz zIj)>sv};~t2W!PAuIb>n3LE#_6B;~P;$=Fa`#t z-z}2R?1JUv{Nsj!jaBx$jb}V|^6437m<@!)!Cw1O`d5%Q5h=2~eW7@Il=;ydXfzfy z+u)!UBkVMRMF{$mz@v|~VE5q&T8zg!c9F*xrll^gell5jf4gS*dk8nzGaUdiUT_)+ zfm7-Bzss;i$r2TVa|Xt*OD5-ClRphf{jk7xiJ|z1B?*-N)sJ!UuF*(L@Kdku0cuLbKwxiQBWGKWW)ybjE!Pv86d zrfiqda{L3_HX6(zJE?S9bY`xI@?d1Z!#t}0Ycz`w0ON#2%r?@$(PBReu@nEt$xZ61 z`<89>v4u@#1?lmrXYeR7?A)LPwCK!M{u5l$9SW4y2zO5T;5rTAh9o+m9R3`oEtQ&3 zwG#9vkGhtq1=SYaD5R7&&+Xe<^8Od%X@RQm%Q|P3{QPw~3K^07(-s6_Gv6M&C+R!^ zxBydN^%%b?N<*l(eIx@UKbR-g8I9SbhR{=f#Jl^+WIX{h?}x{Z*mb1cs+zxf_sbR{TTNy-!wCMuqCTs>os4T!F^8e1@1qKwSksDK^fsX;!8E1chEvzeA>I&vlO7n zEEFKkBMu?_^8&-_9=`;{mM$5xyZYU<=+hxm`U1q|#C(|@lHbUE2#*nnw3$iR3|b`+ z)%bKHgc{>@1IQ%{HJa7^yaHsPS$+KHx)_>z;VMVJR^Q7=00q-04!PMw4UhD}apZh>m%(P%Bn#xk-le(8EiSf$_4^ldWRn#emCQm0 zAp_%hcgdDyx}uKJyHo&8M8Kkn9ifgB~hNuRtD2h z9(w8PNtQJ0POXjbOgNQ(}oa5F!NgM*OFHM6G@T3CJ)5&7(&oJRwbKh zl9k_MajgHpNxMBugb=4ivHC%#HJLEWKj}-1A5l|AzJp}Ypieup=QEP>{_a6Z2Rq+E z#&nVy z3?`?@Z_`hDTx&o$6id5Rr7B@RyRXUX|IU}jDvObR(yGAvi70=qW(4JmDlFD&029yy z(yTGFIE*i$txf7#tDdnf*W2N9ac+EoyF-O$w{l?FD%r1VK(de3EmS@fOH?DE%)xNB z5nMuc(ig8@p;_#|Ek|!5;PEBQ>~0w=pvez@`NyS3RsjYp^{t z^Lt^X#50MOH_wS*p1KEi3$vOxe|K^_voUz*A}QqS>_Qg&)#B40O!}#2jxFfjQ5OA(hCZ?qNV^VItebAM0(z^)`e$N=GimpG?n5a!mHm9XC7*gRzC7T^cL}OmA z0l-?XixEaRC1vMksYbJgK$U@s7qmN3B+qu!QPfSC3b^T&K1sYX6LPc8#JXsmt z+=g{Og}}L;&rzQjt^q`>d(|&joj5q*Zi2ZKjyMP6V?+XcX1IiGX=6F4Poe+D=u zLD=_O;hb067=u<`1Hes$b3a(5+PWsfNTdI$f9~UZY(B`PK%B&ZOzvx5~A_AD^5AhMKJ;P7ge&ogAbBu19}hoJw+7 zym^bFIShvJPB}LUsQICYjg6gJsF7+V4|`TWMp|4{B&ntr4chIJYJ-h{`V*pco)5H= z4e4%O0de$So_wpUwGPSrUBPtW&*yS$fDnfClB}#)3{(lieYDiI*-uVRfY7P4A}!Xz zm1tP|wIZl9ftf&v6?<0Z3*Z#sZMG*d=80*fhgr1*PEoS#^F-Iz*Qc*-0)GdFE?`SA zGB_lqtv$21Vq%jdv#evX=9hm;tPutJv68R@&bR0!GT7b~rwTBl+W3K}A|L=`YhZH!V{>&{ z8e>{B)mj=j{RudI3Om>j9$|*Wa#*ss4mBD0-3nuCvt|!-SCJL$0Kll;`X6Y?VS9Md zh`<62$@2N5`RVED-!~d#m6w?jj6fjb$YI&Fe+M6d1#5xX3bYIT9!Gs~RxR2-1F`dq z(%L5#Eb2eF4lLX-!hZ;CBO+$}_0W;X;Z{lkFh2P4J;P?j+M{Z}TXY{nw~d!+#lN`_ 
z#5$jgrXxo#WTa<@ZB8CO29614>r}Ab19vwx`}e{u<^#E<*2ao=ti!__c^S21Vw)rU z2Q_5QjIL3x++q-i-3{0R1jv}bsMMmmEYl*t`z3*(SER%kK%E`#F*aAj;F&CfT(oB= z+kDYT`$VVsYZ#rcT@A>e7g!W$h)?BLnOuE;v58NBkB6@vXxe21s?O@;O6toK_9^(= zHr>u}?1rGgtEUoDQu9Emxw^*Yk%d#aG=}lDE=rM`R`6k{Y`*ZpV~S;v5cC$Z%})Vx z2Vz~h%WRJdFm?zAE7Hus&PTr8!^^k-RHUzLnHsDE;n@^OKvx{RaKw)+mBYb51{6-c z&+8r+FMqhz%&X!FIt2Z|#F)R|d@^t(J{ap(i^{A!zicMa%lQEBi^MpF+TQO!v723O zAxm-LM2y`J1wKi`9(KC%(R5SIuI@)aR!-qA(qFUg4`Bv@+^|2otzD6qRtQImyQYRA z*>>zg8g(ru-3SM}1h_kZAaJjC3tA8Wjn%p^X|=5R9F1{apmE>Ha4NWkO{j@|dNvF1 zTiJO(R=;*eT9f}O_pJN+bFY&ec`vs=Axr5@!WOfxEWUChdF9{jUQ*o_t~Ev#=U&6^ zfNWMsoZ8M{wSObMbxVFkngWT5+HiA5(hRPQqG_C-jE|3};YgaV7T^A{)zp-`=AlK{ zzT}1;>#Yw}!GvyP?)&k}NRux-(!h;koDBvr!R~ z)$UravAdo$Q^~W)jD-H0L?x%DrouOiB}J$>HJe?5?P@s3yER{1Ok`Ac{Pic5M?>+o zp)JcIIpd`Wz4QDf)a22%=WFZSy)P+J(6~A?cj>bx20`FYu%!#ydEg{?jCxzuD4itQ zBrLE7%hfAa{jBc&Bsvo3Jmp?fBp)%9m#wlmSMS;@W~v@J5SE-!!7lD-VE}i#xp}vj zR6Gre(}_B_hdA*$Nk!-Ru(K(muXiX~sMvi$#gUYXlInSm#=-ULfTcfj!y60Ec*OjsH^7j@E^}gWX8GZunv!YJ2V3QRVtXv%wdFZnu`-29?uT36YOYR7(&=uV+VD?g)j@FPun#0nC5cdr z?j*#Jd$?UNv;a|1nKNa})3xb!tXn#d>=Hc=c9zjJ$~cS0Xk+z5N>qA#dc8Qh|3vXm zG*#LJ{psn>NNxB*`Y+Cj9Fj1<9bMZ#WKvg3PO=spdhS8N4&PYwTXkbM9nMuroM_fajL2zQ__6?ykoFt9AsVHfhbTP5S3iSqoh34 zp%gn&q^?maG#ezH_=v^1*8$YUCx(+WE|$TjGxPWiWJ^xZdQB!lj|wbo9@Pk_*(i#J zzu^17!3Ux#z^sF{w7%C=->I(gf~Op1{9SBne6iS43)PN%C`Z1?F?_zCbcztp&?kNzk?X~dMC!e=-4x|YreWZUM+F`CMlcNi`QX3wTpe-cvuhEG%lqR3RzB- z$J$-!?xyNMIM3?tS6^D5ojq`*)Cr3S&#>O($^rK4rTg%a(Z~lW--}gP2e2U6QZ_V~ zmffyFY&a3XD#4oSrn$#@B@m#~;gi8c0;h5kw~0B>(ZUoT?NcVg+{{aur@Dily!YTo za6;meuObA0;UWfRSx(G$JaaAxd>drYWwgt2OQ+SIga z^MGQ_VQs@T)8&qmaXw}9ZTBBkQ&S?VkjsWp-P~)NEo(St2Itf1(sBS5^s9eiE<0+2 zF(D=owSzFGd3#`I=$_j1D4FuO$>lsEZD6f#b4OvX?&^dl@doMF`hqq%dK+cvy5;fh zO3718l5wKW8qo)0RCZ*BGx%F0lM)U&$j22?epYAW5fhr|I0p~!28~V-$8wqt-ACv@ zN>rQ=sNa!?IBVkkd=mV{F%@35uag0h7SUO$vtGy1FXwo$g0z#!gt&>^h{D3n% zCgyuIT^8!vhwms7&Q4nDGOEl6TKTz%b6d$xK45lv`${PP=M2>g01Fu?fU<8$Y 
zq1!lDvsO_SZ_uewyZ3CKdmD98oeb9o8kN}K{5q`6H7b)`F4%5ugmg4(+t8Go&uD7{OmhAvRu46tKF3G`~?cF&&{>T|iAzE(apNqml*RT~#?l6V>p zn-SIfLL2_lTT%O;k&f$unX7MAB&uah?bIyR`tst(pupxSIMYg=ukuPEK_TI8!m19L z?L1-~s`K*Y(2u^HFmL(!d#o2gdBgUYrWd-VjtJD=P0Uia$t_Mu;Kb4!=W z%mCdR2#hsHV1$&)ySwgZEzk_R3o=J_PQXSg!zt#(yPPeIDvu^V+@tQyq31|YW$8Lp z!RY4u5&z0*>MztJdqNjdxo_XTC9PIuHh8^*c)zOx+ERa^hF^bsZH;0O{VHJQD7tW2 z`3)|P*xzGQ5|F&)D%BH7*n4rlvR44>k5%5&ityM601mDRC7LHl2~g^g!wNY1zRjg| zN0KusW_CAP_^CNTguq6X^*;HKBOSe`U=s7o+U%vM1VNbAb^oZekAmOtT(*_?96*kB zX^B5s3B5SQoedOe*BXQT&pS_t`N@V6jwhc`cJ#uWz&da=fTq79F!! zg}AR!Mt!{^j*GiMykGrouRGYcZPYyk1!hH91||a5iH3d5ZV<5LS378GyvQt&IE-WK zFc)!Bkm6~XqV(_d;HSadl~$zGF=1=*b~Moq+n zlzE{RpPk-$0v<^*{Yyq@o(Uq3t{C$G$dYnHQLfX`t)a6hn@Kz8MrY%|r~)L$3dXIa zT$Dt_h;pqSrK7PexQBCLO}sokG$Fy$?cxT7l5tWkn$SIlZAK$f6kaE60sPKPZC%X` z;HNj7I*CANS;k7w3Tk|*EQLU(MjYiF*qAy+(SmMBS#}=RoXH?S2>9XkT#q zQLMd7cid{^b2=biTY!*pBb+YS$~o3wfOk6R3Vn%g^PKMNHEJmH%1|l@a%7W+vHz0RC+Gph)hj-p{N($5? zv6Q|PD3rD&L@}1mY{#y2#;W1qGnCzUo>(zt^Qz077Y9egp)0mlr&-g$c9OD1pDv1~Ygnv%nu;WFWo@3=Q`C#UMwERu1(tVUe%a$`Bs z41#Uf2@sm$sOG3W>%8{B9&3v{4aOcGv3#YMqP#GzhH zEO63`HS~KD9LY`zrH+xeik$bN(@6?umn!lt;ugILI)aOvwFgIDVwaHEwQw=8ik|K* z_@6dpvx}ybDk1ICv3m2mTrJ-WUqlaVbhD>RgJpWmEt>*OzTbc@bHhstfXmyBN#48% zX`v!~`_tn>`%iA3ck#Eo+jUNbFRdXT0YjX1OEaRLB6?eBxD3FMT^Q$^HGOx7-`?Q} z-hOBs;j=_464AB#&XheWbeH%6Zdv%S0nd^5GIR_gA8_A=Fp#?2r(e{@ujMi8nisxrAZKl$#EGEb9azLu20sfi7pdfq-|wcnKsb+BG^`cynK~1eW4>^k5;5;_^bqH46hPp1Q5rMJJ&D z4%yTnDqG`N&e039H7t$;Tg{)%8r%rfLBMZg9YYY5m))$1vs7zJY+`~5MFXZ{+Vebc zj6F~IHRXo4labX>+EPnf_d~k7_XRnDabb{FG3YIrfg6q|^@?7iBF&vU1DR}U&V@4U zME3Imy^S43YhW9s%b+$Nz{nMQ@_C@jnR5eIF1#H3cS8s8BQY6fw8o;Zm@wIBA-|HX zYHx#s`hT;w%Cw0|FqtO$Bm?<5=zJY>3%zAMMC*B0fu)mIFY(%4FL0Uy;7Wocb5FVF zjsjbIBJ-i^Tq(K!3r>&(J%6*vX3Oi`G)SOn<7EjeF&V5DD#)AY?|f(cLtArB<-*+J z7;G$t`2mSR&#iUS_1&b<&sH$*rE$zfa-3Q$-oKi5PM8E3>_;bZq+r2fvB=K0(X@r` z;jd5fi-p{zZ}Qgt0|MHXs4OL&eM{MaKXm}53K3x~1u{5TQLWs*5TrjhO-s=TT$tEB z5HCl5OfmS1Euqzv%!K1TKK)ar}}Nr=;pOZG(QAj 
z|GJW6H&D6%^q+)Zv3ZNT(IvrVXD<|9(V+=kZ#ajG5|i10Q({bk!oN%mS@KQ|4)M2R z?1PY#XWmt1Z*pwy_ZuTLu*|R6Ijoi&0M{2(725t`4U|?!=<19zojRA zGZksveJ8H3s`CNXo$Al9!@qvjRYR_t_790v?%Jg!XTB`ibYj+J3}X@JL4e$|0*6JR zeg4p!5n*lpDr@1Kmgc6BPdkPsm3DcRaC97A=1kB{Xt^t)QYw@|B|snIxb=xZzclCv zjG0~WvdDY@x}b8O)zBEcIbj<*e1?JA*}I|w3bHL@T#VoZpHRKF@6maAz!28&Lzq=7{xi0(R3b6=fm^A-g+Gwaq`A`=N>$+qVau;v zCUE~BG(jd$n{Ralc|5&;n@R+KqCs91sz=L-bDR6I=T40J#31?t)*?ueVpvz^Nv9~# z;HYX%HXq%oXx|b&JS?;ZxlB7hrSsDFb;t;=#_dmI9D$N<3~E{~1KXk`{|su|NiHGa zhY}%<#VYF6qxI1OnKkkMH1-xyQAgeWw~7){0xI1I3P>s4ozh5`l(gi~A*di73P^WI zcXxwy2?NNG5<_>rXV53_UGM+8mM+nmU!Jq~KKtzb`9i)s8NKWBMO*hcgNj!ur0O=B zMjad;4zj>kn#~&L0ZXdLYMRp9<(G5?DNr&~-=4o-yK{Q!8P>NrSWmz{Mmp+d_VHo| zr7OGXJA!U>2sC5KDe(f2gJc;@trh{mr3a-;AWn(-mnqQ>Ka+NY@E1<}l{bJo7sa~m zz`7NXGH8dgW@eS8gQws}C%{YIA1(jR=cGUF(faY{`K|Z(uk(=9s>*~;whBTjqaGF2 z^HBF$RA#zZi>9CBGgB~xZSrAhIn9h*4o<&~pZ7iy9p^OUUuD))O13+~e`2D7jV+Sw zUs{|vrDsLg>o_BSfd!qWn1dV*a|KMyVD-}HEdxGMn&+U;I(n*Tif4L}NZivIsM#&= zUAMPy+bhW1%!cqO6~8i}R9nC3Rlhv`mlT~_LV-B>=f;QSPGYLp%JWlv)S9UqnN5~H zTN+KS*{fyM9LntQ0OKF$K*tZH4?!%%x>zWDbg3R+Yli910m62?kCn zRBd^f&w~k(vqGg-56-*W*Keb5$F2F^^Lxgl<#;p|mYl?czvaUz4=K0Z7KFK`kPwh4 zlkx{M3WT)~`cd~@o-7f%((uHlP#h8lfJGQHraVj1}q&E-1wuR2!Gw| z(dN&4P6C)Ui^1e)d2&x4yks$`4N1oqe*CBja?v^NYjyznK6poMa^idAy}o|W?M&-i zW!}4Yy!N?@@4dy&AwI(+__>4{jjt#XXDqeo#3r!&x;~gV;V`em}||(&`nYp zqXUoi=j#-{_%+@=6pweb^jjN!=M(1nUWnZPr~te1_SuXPd7|o!hN^O5@A&!c%R9Yd zKUV~hKsC$dB~sxf0n0g>ZYam_fVGW9=#^KR=j5E{Csk6w(ioJGAXSMUf*9yDN6UJ| zOkqzYnFl8B`;gV8uIK@T`&)I(h~Zly19QFGGtY4B)eFKPoC~`H+Jqo~4$~Yt&xXsUz;0D5_=|m&t9vJ=ju1)dOpz zz1aloO%@pn-gNLS0t^xl18<>is3a}H63)}r@h#gN-^qOZcjqKujdnhv`J4)Ud}Y=b zNl6O4yv;o{zffPiIG2mC;}wf3C;y&mBu|c^gn?|<5ZRC?KI!llO9g7|!N*QhlcZse5`j$x0;7X{F6tD)Ft>H5Z|4|m z*3IWOe_kinnM;+m#Jtb;NKKgI{N0_O?8A*j^PE|~Qnxw0k(#$0{wW5?9wpwxqP|#& zN`c#M`3zgh0vX$cM+gpX2@E zo6B6FljnS=%mZ^eUYk6`52O5dm;nC^PZq5G%aK0f@s`jU?*pF1^ARX=l@~_r^@cq# z=1Ky%6kUScnFBNin=ChvOX}_dWA@B8ttJqzF9RA0H1j>Rb|~F0ztQD)y|WZV>lRG% 
zFyx-gTfS)?CVeUN7$uwAmrNrZO*H#=0=-{NB*96)ud+AUIZ5KngR}ByS>ICHBwZ=y zn$vjSHF!lQ_p1_O7^BBD@+B6uTs51&uDX54j6`p)_!Z{Y93y~h`^Fr4X0+3M)q3ka zkI{pc+?lMeuPd%u9vFJAI2-s#d_+TfuD2M9dJ?q9s{iP)hJ>GDXx{_ptht2|#VL$a z4SD{919_^BXUco)R!72w^-0J?^Kmd+mi(fA9oP6pqoHxJ=gL)ki!7+{s7vn?{Tasr zCmPK#8!e8H9c6^Q`XZcs7w@yP#Sxmu$3a(f;(@DsYmF$r$0gTFv{z(q+p+YrMn`@^ zRnL9E14lvYBBSaXuP!^|nKQpsQRO&Z+DDk^$D?(Je{U>UfDGNB+)XIwjA zs_(aS|Lox(K}F9eSrlpc$6=oNS+r?Z;Ent$y3tY0sk1IiqF;nco)jKIt+vMvt8gF(<4HQW=g&iyFImD_JC@Io!q7 zx(WAH0#trXi`}_Atz2sqUk%L~Q?7&nnj)uG(iJNV4n_gW={NQ+8WX*Ms@59J7;QV`@`|Ngk z!}J;QxIjOzVcaDJ%ei2_`Z1}gx1nCb=gNB5E;7*4Vp_-6H~H&0^_(XfsVyH{Y)GF% zr4YMwSAZ$y1J;axbDXdP?>mZ&B2D4MVKd4F*hJWEAoCdJHOAOw9bdu>4R-gw{^gjA z9p3CaAh#2{h}J89D&(zy3x$y@c66iNOc}@sA0ui@mI~~;DknH!=C%l%$q!Rz6JtaZ z+^WBY6u8M(T1x|3-Rq1tf?BPhhP~ktot5c>L-;INQO4)^vE9aSe{ldy`1|&>7ej@I z>tAW)#i@vZgU^y`>pwJI4Bcd8%Ub!#EK+*!A0OPvW%FMG%sz;fr&27~C#*rB8U9qc z;6);}8bdFymQ+Oo5&;7JPczs19K<#w3NYA{Vlk=d!Y;y=vj;YwO6FtvZPoY>AA;{j z-hOZO|HvXIA=>VmgX`+%3Yg!3zxTHWyL*&o0x|^vv>@Q^SieXR*Mu{n0Dd?FijGjg zLb-v1ECI3=;(x%!#vvZtCLcee_f*7%immh&VgnN^#SYRnfsYU!1avzGY{Wv(E(Gx0 z-t|{k$y!ryKT8-aKalTjH{QG_jV{av-{MaM9V?e2l_7QTAkapO=PT{S#^U}G4B^g z9KeU0nxhJN^(v3KI}_v`lpxFsbgRVai|WfDGpnZWe^Rx|YrK8BFs00<4tCCqqNDQ~ zZ+2J9f_ce;Lb5-n2?K^)4zjY2ULg^354j#M`%8YQZ@a&UU%o;}1vFHriOh0K5JL{I z*XY=+p^f^d$0Y+L=Q9<8XDP8}`Xj=M!S34LVnC!p74YLI|FP{X{TAx( z?w$$uc0hyXIEs|lyIEMkhEU+I-1Qo~hl_gi!;I_|PU{n@Yv}go&=RjPPz0e&1t|9l z;l&OmHvJXkBGB4Y-YhByO4QB+H?}I$-b)fC`dlObju-(^`Z9jeg0T5Uu-K(! 
zE@*O*KNerZW1A&6xv9l6f%bJQHrtIvS-e+UW= z?`UFv`;PP}j-CyO3GrpaN|Tdd^n2d;VNafm-aNZ1T4Wq)ajz9KECi z(@3f9&7d9Q_R>-|HO?`lL*WMK8I>65cSfiO;+-WI*f?*6IVZh+Hy$4mcoqwwXOZJz z{&t%sd(x^G>L}fyyJmmlwN zJtyM~m2mt$@?_j)NRcHrO?bg>>8*}w0*{jgPeLE`y_@@z%=L(`>8bv{7TEf}_Q+8q zn;L#I-UV)N0iS=GZTjG6nc4Ei>Aw9!>%DQ;HD0b1=6Af$L;g`kf=@6P1{j!MBF6t*=V-H_ods@9MfFW zwPnmBqsDo8dC_)7xw%T05>&W{g7W{hx#cxojpR`LvJC!Kj%E8T0T-+z;`vG(xEm<~ zU*xQGfX2hTI1Ok3<`4THz{b)DpoYuVNTVNQ8S|!l-4K zg}Q$C%f02jg%74c3-I0DY1TW|L^sDGwVoq+`f9W=#GO{Eg87a2L-^F8T{^V9e8h>6 znq;AZ*`D_t-xyI-*)d+-Ma*L7cHg0I$miWsP^W*o!AnM<>v#5HTv_LZlhOm~1Yc*w zz0ODwBY1M0C^8P=V#fC!NP7;%H^W*<>F!T2SB_WQNF^3|MvRtIzwe)j^0PnYn462H zF$INJu}E2DFYRPxuoF=GnMJ8bd?h+i%{(lnqf|K*elQsOGxviN%uH;NSkH0u)5aqi z(|h8yTJsXNgbnrrQ8}V)tPVTxdXd&}rHG=`LoHCs%u z5WLN3`Q=+>uC^^}E3s+(#FkFgxeiw3W)K07ygEjA(L_}ZlaW7NKZo^3&;vsRpAA&k zp_ncT84afrhyOVQpuKg+fE0*cTTMJD3(VSBuzv(hV$`iEL?nG7KnyL7?@}rRg zq-ODoG(ze+t)8Nm;jp%n&+E=n9!BJz{s_K6`&Xgh9GChFDu~slqi!dvE!T`E0V3uC zfFAaN&NiT52>%@L(lN9ST$kWQfHptjLfC#qIs){DsB$E~DNKCwY!TjGU^Sb?sFXDeBD1kk-qf7G&;MlCWNocIXT_W{C$DkvceY2_NmBCE3pFIB{02Sm5p(~j z`1BZQtDdi~eP*A$bxwR@9Ga~Xm9FhAAm}$0CGfP_;0E8|LHJ|75OT?TAe_^i;{*D zaPsJzrldd?8|6QWYUw83%sTl&7tGCr%&EdN=4XS0=Lf25Pt*i$ZR;-VRbYN2e?_PM zsmDY2)-BskAp`P{8rb9m7C2{-O|QnO4}!@gt=6>lwjujJ0P6qiBRy-#rULMve`!5R zqkO9n@c({6?qv3d z>*E=)ItHqdA#Qjrquv9N0DS0Kn2TMbvnrq!m{d}a4ZSbE=C^qQHT$T%m@o|CApngp|aPwj6p(LZ~vdcKnR zec|?bB`;qq;pBAaSpEDOzK4|?7W}G7K7$i!L!95yg>rl=bM8-7xvpCoGuwH(@mZ7u zJM~`K7tEwX)G`kc91RYn8n%*>pNL+@?b_eS6zP46p=>3gIHU;7$3xC9&7=RlwGp+u=Kk}I z!8@sMGt(IvY09KrYb_%rEa1@X1%gC`gn>um<9Yf**Wu5Hwo|Ve3i6|CJ3D_{@tsq{ z@c$S%(f2bHh&t)RSFc9M0^3z<42>OMTFf!1dU9#t;(EajCB;pa^8~w}tYI6@{UpZ` zEIC|Cx$tsV*nC8}3)J9myEP9ZqDB;C6t%2bY>F8*?RgfX;+Gq*P|;9K>p0G-X>eGa z2Cymi=O*U|uL4ex|1`#k`hfHD@Z^=su=IPSX4H#}_gNRy`1||0Tq@a#ivxy}DphK# z?pcUO?pSp=-Fo~t6M(2557T54Y5i0nC~|&`7i$~l zRw77+F!i7-g#TA8QfAyN5dlE}mYfOe7s2M{+4*@v>b?%dc?>qf5Q17@-Sr<|Srf+n zNyV>76?E56M?;H97r^Y869y9We1AU&1*)dmoa 
z($c879Tr67;$ka@(VlIRzv9{9E;9H9-`aY`rAw=OPDl2~%fi<l0gYGxC7~3>)Ij$1)ne#qBjA`Q zYiK+rPvj$1ndY8-!6;Zil%NcCu2IYT)cNMg&z9!d(a{&4Zahg~>#|`tWez}DfUQ$2 znt2B*z1iTPyMJ`l_+2x6yqY!XZLsb)X~K`9SE@%|TzeB(2i@%s~%4``Otla3M=js(Z)iJ}%YGmf#U!S3qn_$V?WB2wU6q@ASOGhd%rLgS<;{&ITxo2xOS&#{5vbY@T&&d1%ROWw!eZhV&Vn!bx9x5v;{yOc+^?CvS zJuUcm06`?H4EbEkOG}ltw6yM+LqR*)O{ntUzCt?YW_akSs%8YA2Pz1LJlo z@8aDP$W}0-1FbpXJCQkpdlt|zXRMy>`!hP{#M-}JoaBR6*ucMC+SuTw$P|HYD)6Ln z^}sOD3vowe2fPrCJ-}==(-1v12eH#nDkX0*kk9fsN+SP?0y+MpgnfJ~>TL>utN0rS ziCrRcV7sm_ETF4RGvt4`Q%G`YzL5yc(C6UD;C00WeFoL!wo?cVP%`j6paFVw0bmSD zRZ5MKgF%Oz`~&px%5J;Yi4ynNzJ($MgT5O+zD?~#J(YPDYp9X}81DX`!0D!&k^zV7 zoI28SB9S1Py$g&uAlr!)Y@M?C`*#Gal#P(!z{|Z{PDMv4b^XT7MidL?feOXn}F;-*3r_}_C z3&w=FSy)om1JBat27KB6%aVq@R#TG(xfF}lx7%-z#eujM`3}-nM8Md@c-@iYp9nl7 z@A{O%+J1YW9rC5;zqM&cGAEHTN2~Gi@nsFi%Jpj`7OGG4m4X)7~#LEZ)Ysg~MzlA#DMJO_()nUQGQbtDU4<9|sZ7JjC(SSw2v(>jD+w>8} z^*fB)UsdxwOFJYWAyGXHHiM2QG_yN5Yo>mx0f)UnbepAETq)^pt-7o9Zu+OV&~}$=OGooSf+2 z8OJ>JDn0p}Kc~J@+mE43+uP+DSCN8D1z{AhrYey;#>u;b@4%O6+Kbz->v zE^*Ls5l#Nzu)KAUL@qEx+?!cZWQ>Cj90?PbTRYUbmfyGBkIVY{^)b5-J3D*sGj=`L zq0#tWCS6fAlLInH!E0bod;%LTXK>7%80&8slr+`GwSsPY751mUN8U`D*tA>)-2LF~ zS}2MvaZ*Q(m^!ytgv?EA4xq{y%H1pr2k#L4nC5WT5ntiwN*AswAL`4CI-Lkx8L^j* z_%+BUf(|!oGEANONS=atjw0%Ecv_TwZ|aDC(gZ}Jh){)YMDW?<^IZ;pLw12OidUl#Bkt7`_vnpf6(H3`qm?Q}zjOwv5^Q8wD#O1vDx zjh%yl+3C4j`%txE{JYU@(qzEETkL)7hg#Ot@hmuKd)^$X{&b6fs-2k8j)%a3o7I4F z)V~E?lkP6^i=vWKjUZ@sIk;?*n83XfvV81~jx%dNn)eTIvHG18;a6|x;K-WiUS%6> zX2pV3!iIx2hH=+6Ht-9H_W%+u#eyQUY7oes;MOdbh2|GS#uXK^I%o5xPT$&I_v4U> z5x<*H`&pu22Dc*M4~S=-rGx*8u9BD3iK)4FPK|pvX?C8uC|TMA8zOQm#6930F(WHP z1v_>n#T*&A+!z+2YrXet!}`UX!`ECCNq(Vk4pQ7wAgpB@CciWgFY|aku>ppRS=?q= zKynShH|gZ97ElCozJ-c@B=b4&32c$Zjs?k}RFqOH{Vrl)R+wOA{+ z0xA6ciOU+Y(*@5iE@*SNW52Dyp7ie_!sHP_wrAI8I+7qv4g{$gJ3DN6P*9T|$cSVe zR#V6do?`n&VS%#hAtXY0txs3fVv*P2W3B7vP}3TS@g4-xb-JxXlB6?E!x18(Wq$E+c&#h(in%Mff^c^lu8{}4M8v?;;8h_98%u+(~ zZTDVzf^ZE5tdBxP99=*k{?C!N`gp%6?zeHbRSMdG287%tI?DOdgj3?k$cw`rTr-1) 
zoKINH8MO^#bvK>88L}YzXzFczL}xm`5DXS!_Qw-UgKnf*bDAieuyMhgm~=qDlD!vX znh6<_-TsLDicB30wDc01ew+j6NH|0I33a{Aa0LblGjp7w59Z#q?-A(<0LLxx-Wj>J zy0%h$OuBiPKe>=q6#?IJ5@|^E7ztAd@Gunqp1@qVFYbBk&>5k%eazQRGB)x^5@>)E z%YxqNbm7Bq4@ByJZOXzr@N5rzlJ?sQ`*813dfjbWMR7*>J{Y0^kAtX|7J;Jao}F>a zs4-(ZVxsi#Peecjf}HC2kQ(4TYZwRb(M&u1Pq8qM&cB{4DGY6I_f5e97yS|ih*{0ROzJ1uXZ(_)Z}Oi ztu0gSYNrEx^4uzIR=m=nktaqV%WfDtsZj2lc%I6Yl+wG||g- z^IZQ7K(p)N8YjiuL#Pofvg_KXS-nJDw##BfgXbRniF^Zf%c|5u0JOSP#By@}wM^hzdwP>J)rbyQOolq+u3SLm_s{s33NBs$Sr#14b$3T-tQpe1hi&ky zD67JQ6Vag!7Wo<4*OLvNFG{0-GtJ-R1sBJ>U$f^8ur%2X@Q(@rSrWh#r&tFd#fNEa z=wK#zaUAO6t;?Jb{;-XqgK+kKfYp|N{bvvOtXnMEzAxw|{==lGS<~m`r_t9Th`^SG z%}2M4Az&1zDIgaPC!^xU~`P?UAQRHb2TVPR1nFXN(wYH&+)=rXX{9T3a~v z{n?0SBk|2NbOh{|WoF*#WzZRwS^7iZqe;5JSQjX>1Z?s~8eo+z6>gCKV+AyWlY8_Y zB72x}$?m5F$+v~yxAEeC0yIa*?p|yZ040dj1HoMpA!!pm0;&SQQ@NWREqC-H6r>v8 z0b027Q1^@w?zPGg3vJW#1>EMH1OhJ<$ zT>+CPnUiP6i)2{mp);Q2^|Ze%mH+Aa6%s-j2lU|Yh61vb>y<&%1)!bpO*S;x*M9qn zDbflwpODsHm-wGV05HC*n||^|S8Kqxi5|MtNGrtIWVd%Rz-DPalU|$ZH-K%CIi}!M z5NW+NebC`X7LDg^S7-{%<0yl^E}zp%m2IpMRu2#1 z43qCfg&@P+1*Do7_0WAAHkzu3>>+(=qDXt>K~(nFa6N!h$JG{*f`e!{^&Wmz0S{1V z&>ntPRk$5#os20hd0l~0^FZXH$4Qn-dA0?~Ceo7%FzXJN!a*id)~h&A<<+Maq&sN4 zfV@P*%%#cB{skEyceboSa@H~i9}fp6xD(woZ?hE&nAQ6Lkn@Fe5W5dpk|o(7o-`xz zWTF?SyVU=VPNzvHkFTb8mlt0;F5>99cFTK3=8>)*`Y^ORqod8NRNy8zixz(c1dR43 zKAa3%##){t47bLC)APi2i1W5(!}@fN3KB-@|tdlv%Gf)5hU68tOdu(S!m#))(}KAU5wHNXuD{^a0peY3c7 z7&a2v-A6eyl7+^}o+6yu(18oQ>zk3E#lRVYq*YYMYrMMaW^RtYdc*@R<-s9X`UD{H zMLfz@yxwyw0%{6%P-6Cd_w+1*R&;H&o~QG=!@0qaDRbFgI;$mecZ|vzj(MNS#2^?l z1FHLvTWCnYis#Mksu;1AOCY4NEDK(G*Z0}IbvjYyp96Im4@16B9vXrlohJJ92<%2Qz2>X7qN^dNe-vai3ZJ`7KHZ ziYU<;DGn@Mt|Er zWGNx;G?x1Xxhb0&EP*?!$Jnk@ns4Ca))cFALPuy7S19XLi$VXJJEkvUb$wWQrwsbB zIvG1RDITXcR50gMl-Ft(kh)}hse6j&s(RMd{{A|aH}5duPUF<%WNCg_kUih%^u0J$ zVIpBYe7?vw>N;VTv1luxyYPa^Jh-@5REpk!pslS>bhinWus1VLPt%Q&c5$*i*H+N( zlNE+xzO9kB?8Utfzyb9k=H05BAT7F8(QEWwWWBHepI$SBj)t(vCahaB23O&>^aYAd z^9SiW6l)I-cP&Lq5B6hqD(@ye5EKQe2y~rXxLTFf^%lzI_=$Su2a}l(^0^ji 
z)h*ALS~})TbtsHtMmhz4+Lg$@8a`=mee9IFCmY|EN72fwhi8xfnU_*Ci``jU_ZN?+ zf+||C=meVi%A7N|@(}fcd{THoQ(Q5B8WRo8RoaPZvQ>%RaUd_4+V8I{KW1&4$25G+ znf$2ni-HE{Aw=Y3+=xN4mB3_fR{X_Lf+=D0WL6dfFX>u@fE(W~4nBkX_=h>;O^5VG zb&iDJifw6)oJHcgqb!#Csz>_UuQ?^C{9XKD^4+L@tnuIF+sE>W=40nFykHV9N&2Ka zc?NAUzN6RnOf6cxD!MS271Ns|`g)x-${1FN(UX9*WwDZC5dFwWWTYdi+H7$^!8z9U z0kQ_VL<0UU+qzb%cX%vb!$f?8dvSvC8e?rNcJzn&a_kPOe9rwf*;+U=i2hiIy znuU!0SpBMTiE)|0*e#v={%YOR|oFC!y^<7t53h-b=0d2 zZsV~oa)}>HGO&wKb2TlGbhcmzQO099!O+)Lnat>Wk7LrmD0W$MzlNrNtX;D`yQ-#S zWKd{V^&WL+D9HgiD0DEwS&`vupSc(5wm)}^%8IOi{T17?;>XN8&NU{N=(r87X*0+m zQ4tL6Ej41DTqU5!t3F@5+R$Zl_pYlAcvz-;X)ddARnE8wYKoUdCDc^}dJcr1U$gade}4&?kS2mWYsXA&6f6=~o7_?368m1T z2-k4+yfg3CDB1{!5Q+vKms>Mcgcy^t6`VXv-YeJ}qJ=fVV@C58@~7v4eqqc41Ww0u zMA49jENzaYz#hw)ndl6`@@nZU_|~VC1P{)O&mv)lfAvmyNJ4|k1BRiDzCO7h>(kIj&b3!f6rs>HN8OJRS{E}1 zw3CbHU6_=bPJC3%HdbG*zs)`kW3-Kg998zrF|g)ZZ`oOIOcZGQQAM=liAX#KCp1Ro z5{|j&ZZi0VDb9g{9-Hy4^P{`%m-|E+T#5q*7@yTS#t`Sw)6(Cl#b|T4;?-Psv%;B= zouJUydmbfw&qP^qrO_kpDLs15TPq7Pn%!uKhy)5S7~K>HJ(}t>?_hI|JWKSb0Vw$H z@hhKC&~;E2j;`OkIH_8L;xxOb8zcRMsO`D3mL|7i2;}uV z8YMMSN8*FDAv)vdTU?q5`AkW(Y!BB8sM>=E+hi7DRYKjm@61%IuJPMjifR`%ukOO; z9$409=LEj&nVfeNY-F}YuVsp|c-f(q*&lhFltea;UwkIt-3=_P|5IoNDc$B?p2db) zQ9eKW0Zg@@P(g}{cAG@YDfz(eIwa;oj<%gN0^+vjL?@^*^t9s#{Pee&_5Ld~Nh-AE z#0PBkF1W{5(9<3)U?>pQ!sW;0$lTZ%jjxc9&0PI6U0q#eMP;JF2Q`X68Em&gdl|H; z_ZH#?PSX8pkX1tE&z@4@6-+@27_M@fdZMeTL%*>1Y}CEq>+eiq$$R; z_V=?p-R>*|Hv$YcHb*oF8NiqQoM8Uf#Quk|nwR6)b2`L;g+{TpHIz(kPM* zQu!Odm;3BqNj)`8pZPl3r!l3jPWupZjhddG9{CItPw$MJ_vU#2W{`CHtAcpxEf8{( zV6&h^qo2oIy}b;TGL9mqy`cy97*}>;7(Zn_)D}juT?6W;5aT9#LFbqrVazp zUfd9gXMaDRHljR^pQ0c%-yEchM#v3J#0d$fU6(rS#DWhu5KP$~mKs|4hbK?+n1|X- z5Eg_43_y=iX=vL53&Iaa1;IiaeSrHxy!H0gsSe#R+pZo$BQ3-kl5`#cw#pju0T%?! 
zhc;|5+>(uOk1H!Vz>kEqlkE*DK*-`AIHCj@_!oh$SJ1;?Ht-{x0#i_(rzzn-zx6^H z=C$+PCfB^-(Y58~&hUWlUb~8dtmV^(-o{UJwB$WJOYdJKy(+CJ{61OON0$s&x_%Ky zH8D)7U9}zFOK;1`(4z*fKHTx&NvMq{_H?jz1LG6JUD!`bF^kU|vRd9Ds3Am%s7H_N?_!kqSWs$}cp{HvO)<7Ct>eRE?27gly7U%^#T@15JOOS^Q& zKWY#6c8a_TraN|9?Ht}iQM`|2pA5LwHPlP+K&FE=_cr&Lb63MhM@GE&UNm@3g_$L} zD-|GF_s4O1cQ=}TlM6U+99i5esQCmNArW+5l5uTBq*nOXP@V*=mmA?QLf5)FeUg-89^>Y_D_oI1bOO+8`SL2W*}@J2*9NTi@K| z(YFBaX35)oFPwzSQA0*XcHAN8Ykkp5Tjv>$4NT9P^xQ5?pzv;0W2*3_$Jz0qOzso8 zMelvyCzZFhuJ(qzGAb+OF?D?j@PRPXR@V`%M75n?i8y={06!Qgi*oW+_g!XU7?_I` zYuwx^e@|^Qs25>&bKNCrN>ALr+`0Yamprt6w$CynEc*q15za?v8n?@x={JHDXD>{D ze8r<{QXD1YIc6V)ey>*GW1CbPn}z9rz0y!yo}J#s&98nje|P_fF?F&L@Y>w+c7tOC z$K(2r)iO1c)!xFg6HB^5ycw8N%M(oHCv|zXT4bIc_4)I~()?Hy{eqVrZT9KKih3TW z%rY7k^=pQY4>38%_a4YX^O3FQcS|BW<2n71HY|(x!flIo(lGP)O^wpE3d7A2Vv@fAz~xv%Y&ohZ~aD?7Vi_A~1FQg9iS5B9KanSZF(@uy|v6hch(9(|x$} z#b850yN?BLc>Enifm?PETcaDBOsidlIzzCTqz#{bGH7h)QCq>Pi`is{6y4=~J`YU}95B z-foU*`kB{7yL81|?U@@p-(689M}g^*?RaTu30%zjR(Ah%OxyMm$05&dwKOy$`i z4Et#uE<|;1c%o#H4!MAQZ4!G126g@gos^3ztTu89j*pFpnQ14h2(spP7W^<4wxct4 z1b?)z8jC`etgHO&HjaN-I6rc7zi{37MaiatH7a5HO|@ND9jeqa?vh(=^zk9zCrSb> zd7SVAtc??%Ni#|HVx;9|zpo$0swzH=PuZwWmyoH}?4FEOndS7dPEVfuc@}TS7mmzK z29cDH5@66;A;q?n&YYi|%wq&UJJzy;z@pvG&$iAD=bbOMXyF5&>t;5x8Q@18EpHzk zpT7PMJ5yP<1y5v{~&|gs=Yd^(7%WJ&SI;)r zSQo1UaE&^Z_`QtwRL7C^zMC+p=UP;gee?@r)=%aW*SqSo1o)iipRmZ4X8b+h!Dl|J zxgwpvcV=ZGnVB)DNX)_9>4EcGFbdLNJaFJEJRa>MI9_-F+3Z`sV6eF0%ldM_(>Xgw z`wniIB`pbM9qV|uk~I_=y3C@A95#Espdvc`>M_Ej;UgTSd}aGA)AAAVbKzdfMHehJ z7c+6v!Jo`;NzLuL<^xT&Zx3jitT5Lylxu_~sUhPFy*a`SNS`@Npimebx$|3$_bs-N ze97}z6N&Ww60-jA_ZEVAtCW_!n;Rl(B>11xbe9&1Z#OgqPB&kRs;ERwd-0*LcJ2=M z{hSPQw)6e&`8sX^_jvwl;ivDG!prZ$`-f#s14OI9S~|9qj`Hy=^C?wJ*M;_R+G3c` zJp8wd?$GTjpB)x%Gf05E`p}#%v#rB4vp6daUyQyOmgj+8T{Ko9FP)>13p_y|EF`8` z;*lE(ZRT<{o5@P4#$&4mf503*%0h8+GFG`en~Va2f8TlinOFKFdAqK@F_2`Is@3g{ zlIDF6lb5o&i5oDot2c(&u*@Oub7T)8kr;vjb)GrFo^F!QxAihe!N{hmy7Al+%22!A zJkZ0PQXN$y!2^Tm7tSo}hG{F%Ze7_UMAs^uQ3N=Sn(`|Ss+=K=v 
zl*x}E_<aprAeNNvt6UxYct%jl85>Tmkp(qqqC;~SN1>3_Vg|xZXe`=b$dduR?ifGG(5`DLJ#%zPQ(t#RhrIl?x#Ut z3%-Ec%;`$sVs$-k_@YddQ#?o4Q0lcWX@77MvtFZGjc+F@G`FqIKcF^}7-5sCivnWO zKPEitx!q*-kI^J4s8Xi>R_rRj_QS_9Jxjm0d;6m1Ucad}UPh6bg65Tc8LudH>EMZD zh3rxg`VSS%7VBPy@+}#0GwQ42Du(%KSJ^yE%`dPF5QJe(G+7-YR|u5z+wRhj)8^HI zaH}?Ga($|2bZ>nr&SFMz&Qs_t|7eS=>fZU?&GV0+9Yu{43-CbN;$3|GUZ-Om*D@*h z>9XsSobJ($@vdKqr?TUV-vy>%wfx$GSE{qQFUPmK>~xc=l&s13?CS44Qe!jRV{}~@ zf5v9A!uMo5$Hj(Mq4a>qJdLE5SbtlRgyb%fqU($#`l->3ymtaeFDXwbp7zv z2lq!1(pYokamVcZbgD3PrBT9BIL1G1u4J-H*%LSsl>Aa+$WTQ>YGUpZ0xEn_aI{VluoEGw#b3)t5^HytRYF?-b zo=V3-VJ;s?d^w29&ToqnNip#f)X3P6nNXQngrB366*q+8QuO0fU50c9TNW zPHLnu0o=tYN`vqT4aw{??LO_mC(t}LE3_v*3Bo*hJ&BJYln zZ}+_V2yQ&qEm4btGCfCVX0bWKNG5WZmwxap{*KS&qjOC)6*^gUZ?0ln7mN`dCAAmD z^1qbML-ZQGT6Tx$x$SeQ+ks$ge8`X|t<^Y5FWaNLt10`vQv- zCMTYZ5l1`e+GoMW#AeHSXQOMBD%I}tYO@!J722Pvx$yXS_?FlVQ;F7G0<2J)2J(O> zAW<56sIiu{vQ&0|>cb;zI$Zs_r@X%5(;d2gy~l5HfVS5$75vuq_Y=}KqpTl#G@SRh zy=DyPv|%)L&SN_i|&7~pRHDsxPW96F4W!Yl;x zIlhSO1_(j?F>oRL1G55UaR2gMu=fd1{jRz4ZhHy>tc=c7pHL_&=rpOFG-Pe(&flb(G4(tXe0h0! 
zyx224)VtD;Q;SQOC0$#CgPL?+q@XlO+V6Db@A>uzwPB&hFXv`l!WJ7s zkL#0OA$#8b*pp%_M6RTyv>**_I1v1^i*OM>__+@KPcH3SQAa~=u5@C8-5tpfufm+V z&`8EPJ)A;G5}-<77?_y8%EmBB0~Jp#T1|>=&^Feu@SvX)W&cNnMWfd>jBak!#deWD z^q6OE-_45m><&hhFcxAp6BSW#C36$yfm`$bbQjsL#ejtX1KE%DTe{2@d1OwfsJmaL R|1I!KT3lYNP{iQ<{|9)UHoE`- literal 84131 zcmbrmc|26_|Nk%9cUeZ(vL?$^mh2Q`iLBW*Lb8o)(b%JzEQ3Op?2#nh)eezuWEi{e8c;?;kj3&MfDg>v}Ga$Nf4{hWc9cG@LXfBqa1Y+8V|r zB&5UO=ROq$c;tB$Y!duATK8|b(%bcwJNar{_!_5vUMc84KRQvw;7L|iRxwL#J0C{AWusRIk3iR507Bv;|-DFJ32ag z(e%6v<2J!%Ee0JgTDMoiKYi+keGkPFDjYiF=$q0Q?Vr`IeR~JxO4N+>AMgZR>o+N3_tG^9;{zZ;QS?jUbNUtJ3Qj5OP{W~;3Jb9 zUW(Wb=O}99QKNUmXRmlBUz5aP`f3RLXKA6Na^&M8*lLv6X-cU3GiqrBK3^WXMGcjR zvht6oE1HHqvh>%%i9D(iyoi0HicU>Q_FOkXtaG`Czf!Hp?@h~z@^{f;g^#*o-;30; z#zGRc*svTX_mD#zJQ!8^G3N* z_1(@pc&fn?dJ63OP&VujXs;IvvO0tu&@zKp8zEh?f0w^3aopsOV#T8yk{}Lt^Z7Sl z^=^C#@>4>(sJP9yE0b=r2^Q5Gx%6gmm`_ZS`3w1b#NDjPp;0he z=(sr>bj_9=sw>Wp7n1M5>3Y*)rL)^#)4&rPim~PIHY(>hkUDWXFD>C|{c)7gUZ)bQ zV5bFmsAcjD#tiHwjum7t!8$)qG9Tb(*}OsH7SW3Ll;dWIXtjGx`|p49$ljOx{dp8T zdE?Rl@e^vs7FAk&fhQI3a%;2C1u|6r$_9V8%1`*WTo0%#7X=PjjjNBy>iXc}$wvy9 z4O}e`(k1ZD7fpo3fhw*60spO-;rtj4sfGEwIJnJ|M-y

8Ui3uXK&*1~0@PMIlv zv?pS?BJH)}Xo&B-gcL`F13cLR{8PKytjlFRenF*ZJ;6de5pu*Jg6a3>`Id{VuBh1{ zg(2;%WuZ~+{MhJS#0#u_wl6%-UkEKzFy?9VYMzW+|LRPt`RZN1rFfRQ(PBk%&K@24 z(XYF#+S3`2)D=0Nu{!5vtmOG#5m~y`j>%oexu77wbi-o5lwjq)M^-q-nDk+)JW1o{ zM-BIH6DMXG-cjDMryJvj!}R_zId@*FmK`2bkTI@tuAn&>TzC_F(ha@K(0vK=HEQU% z?%^x7gs>(wB!nMZ-4$ap=`5yNgX(FBYJZDd52V5Mcn74;=_8Yp6r;e%eMu&INt1ws z&%$mGT80K#@V?Drk%qGnXP#?{d<(8;5w)GhQjB#j>!xvtC_8lmI<+F2*aIu*a0d7sKB?C>TDGE#v~N!Dn38*wQqD_-qL^{Zv%u& zSrUX=LpOJ~iSQ-i!ewVX`TKWhNFx+mEl87+i;d&~Gj76C1}gE;*qjafHiJbRYo#p0 ziXSTUx&8wCHjM?9;;Wwn@5wlat)8$%nGgAdcNDz^b95*g{F34I3E#g!rQg#3T%D9O zGVF6kSb(SG(hzB;KAPk78zHYHT-=A6mYmp$Y+1#!s4iVvcymLAL?Sqb)i+Lo2>EY_$p#oIN`~lRX$KQ#D z_k2zQ83r@%&&1QB_Ox65zj)%&tMYq%sp8hL_ zIHuZz@#v?P$N{tMWWCj}vHRHSqbqm^Jo$9{XmS^3x7Dd)#J=Jph$zzVU8>br^j6|D zWx4XNzk?7anhx0`g(q_hV5_;+l_%#WV+dxQ<<~s64ksSJbsMbe zPHY}4C%F0w9t+XGH{b3n1`RL10M)<#CE*Pr_m~w0L9c6{Qbj}7)$Fyyj#=SWh{sMX7Q&l`}mlo=+mqia-P*@V` zoEy`&E3Mj?`H%#uFn$iHNS$ey!RNQzoBCW=8rjSET<3G$*y1b6WOKSqI8)yF@jHf$ zJemzwc`g?+V;Y2$3u(3Tc*dItG}rp?xM4jtNKIzZ4z$o!b&}P`5Ex6693lzJ;Xqo( zJPq@fOoxo%Vc*lV?MT_+%PasF}U_;hBbfdTfvu=NV1c{FT z7X-YFe->tk0Oxo*!gM2HPoVz^Db)To0&Wul@pZ#~e2l~Ri}Y*K$g;z2W+z8y6up#& z9yh)^sqom54|6kJ(5Efyk`)m|%fx9$47?d^K7}ZsR~9v7q<==1MLI&wNu9ykmi(cn z)~tor!e58)0Vi~MQ~|08MJUFm+ymh*#Pv-=O1l}-=-~@&+i*ASxBXi0n=~VD5~(x! 
z9R}VQ@+H#O4rV(EkXw^6vBcfO(AR!pjHTJ~^gz_qdS})e7uRMs>mn4xATXrmL5>O1 zPM7OVqFnUsMT$@d_4FESi|vKQ#{Eaa@bY~RMI*&9A z&JTEetCM=*F?mMT{rgfo2BQ9R@;Vkld-@2y)3{9)^fKbtC9o%lV&+ z@#Wv^BP6)v`dX$d9@}+>eV33>rrbd(KW?zo>i_((+{Q*R8i%d{vFpLP4Go!-FUf4SS-D5cjwZPqfJEK?E!sU zT4iIAB1VjP1bjUEpJF49fs6%PY$mn0*1FZHH=yHxYYB6p zWHK8XHSt}2LJvNZ2jj;wy!@fQ>OFsBq?B!qpT#rQ;mX3RT+e3pL_fZs#EOtF*-5#} z4&7_&X5HPP<$7c-d#*rIcB>YlCYc`4ic!-!mCg^UJr(?GwS&>>Ec%Z!a58B>)AfW{ z0#}$DLc{hsWL5tq6?FR?8lv)geWp3Q(*OLeX>_OyDEE+EE##uC^Dt`}xEcdP&N1uT z8Ca;X2v$0tk`LK+jYZVLe;{R-&HVzgu5;FUf&C%EP+z`*b8@EL+TdJ$BC=ys71BWK zrq!9tgO>4v1u;9dae0aOWcE|f_fzAFR^+cCU3^Hf9SXz|?eWY(y{2y(C4ulapc!oS zO5#4TpbzJLNREqG@+(*4w$mz}_QBEi*#yzuGnVNB!V{LF8 zODyohoVJRyV&jh-UaOs>fsSXAVoR`|p2Rx+1A$*YKm1Ie8JsLF2@-MW3mejQ4kwc$~OMRoG7+hhZ2&$zqcrJCZnB zkrCu&i&2o~G{Hd#Vm)p4`Gtz>KwNW|fsPlihX)$L)5=9&%v4pIv)m$dFa**^UB=A1A`1~lPT@VuD29N_OW^hl@Zj2u*-qdU@#(#i)DW4~XKJ@Dp7;>30 zeyqc7Z0oQLX6rVdx=ag1g#5Miy4^vaVzmb*Ew!`fY9tNREu?#g6cqW|c- z#vJ-4Gu(7L>`89zDy}H9gaVP7vdB@_#F`+_!p(9earkYG+fY;wajF!R3pd)^@hp-` z+)u9=dQ{R+z1sD!pg^JXL$X+eci3Z;sZU`CTVD3=;cn8vv~Q`y)q&6VacQgDoYZa+ zdCWEzirfrZ5if7pH8@U15i%(QT8E&kjHe@P)MBpMSS(^D%vd4Aargc;xBPKwi-#Tj z#mobcPux_Yen$P;dyM95H2p!z$M@Tj_AR)6+*AD%9rb}RI^@DeLQg4Gzsd=YAXb8= zT@H#bN~&r2Znre!g!A3}#U(0b{;8{`h8-p2HIExNBKsShS`BN<8{H@G2HCK*e1Yl-Z1jDF7?$v*2frZ}lKowZ z;#d2^@9ZO`pw;C9G$n7)g^bPR&8n$ip`JHnB_rG%d>55>?v({^TAy$%U}dk)-neak z+voN!bV^R*a4yF7F4|AlwTX6lfSma#d+p#Ce(L-BNWt@_jQ#JI9NJeVc3j%gHF;-gpGX< z$wuj7n;O@_XqhH~C(Pd>^~^Fnnl{%EDI}=H;gWB-L6@hfZ);ewcT%XW+)#l zO(+=uk7GNFOVd)dC_LVrl7SbNLPNb;vo1aK4-o6+zhf<7R>&OB@5%fI!w&5-^K}%G zXD^H8B_FHy*A>>c4f>hnIotd0p0(Z4c6X8H(6u8)KRRj;=GPVFuc4chkLim#i?yeG z`-Aa!meN!aS$CEq_ho0nZu3XSy8c%M^jy8J|J*!o>8HZIbU zjT4>?waQS&#fub=3G9w9dwk;MdvS71Ip!9cc16Z|-BHzQt^MFa9?G}mM6!OSJDzPB zyYqRfUuh>uZU;+S*{N+-Y?CzBQBJj-D>j_GZ6L_XJX|%>kS>p0F zF@B{VOQ2<7d`LfjRW3ZxC2D&pCQi;kfhXf8b_JfJlp^f$Hpb2II&U&u^LX^bXMHM} zecjRnzG|*{%(r(ba6-dl1HamW=NI2MKVt~F4dYK% zFGD-z(f5$?bSBvM6i~#Aa^pL=dyc(0%=%rMiV>ujt^UIq+j>r@>@EZzqeEi`pGfrk 
z@`NAD#2^Z~qtndOXpi-;-L#?F=NDjb?NgEjoQDj2SJn3mLo8(W?mSF2UQq_VRhJlm zACDi4;#d|YCIIj?S%{YIofTz`Lw!B9hwP8`d5=0-3Z!^;q`N| z*51oKVzDS0pWBMP)Q8C1_4)!lF-qhPld zK2tL5DT(lUaI^Re$UF2`c9LghOzwPPfm1bu<|PS&Ge^3_N$TIjqot_IZW+#u4(bdb z@hF_`!8uY7kf;<|A|?7tgWu<<;FD;&Qz2UgYG0V(yOuP9ObCfn;(X)N`9i{o@*(8q zA-5r%$w*+?p1;_n89ZF&e1uT3cTNQCNUk z6HYI^ewxla%%8E`l=?A`S7{>*39kr#BZ=Bt-DHRfYN^n(YY1xZ7Js)Md0Om!@eM^` zk!Y&r0Os4apeUUG<&=?RZ3}l1ES{C0P#CdXKp{(u?DgNcmWui4NBa9N5l@exq5_$d zHa&OFsb^n>xtCf7-k2}n$w+O<7m$9bef!y~`|g*hqI0p*6>*!O235mVwb?VncU_F& z)p=}KLOee<5*E#lfgxKc;RLb7lsOK>UV24yXdNSW8uPF*XYgg_m$y&IJ1bE#+`Go- zir!ToMC2dEMNt=(FL*I9cPq_=J<`e|6FK^YVo`$#0C04LjjR|pPl!{@fESD$bFHC#At$v`--h+`w&Gf#oQ>CNi9k^&AJl?% zGGM25WxvN~lIrX0ewsxaq~rS^(OkIkEv~AjvWC4i1k8W(+pniqVU2U$T&UcmA59|tk%{dzGp?PAgxOtPMsh%EL1OVqV<9lJ1Bdp+-SrGLf1b( z^d{e$QV9vz9@U;asQ!~C5^!n3d)kt6i3{2Z(>2O8g%N_94QUb#;vOFvn%n8M zD9YC=M$3;MkGburN-M97)k6>J?yHYP-o|$K{(ATQz;L(MqpSyEU!Hpup~(BO?J#B$ zZghk*;!R^4HXdc3`>?m;h-Py=?yIUw*2JY5JI0A7ToqsMYrrgUEDjwKXEowjLWJ6t zesS=UK4U92(EIfT^5C*V`eD8A(8=MUez?7OF?O_Cx&4pJ zv1D)!IRI3h*(jc1GZZB31?lArSmPEr3bMlrZ!@6b{?hs$r>l62O0{Am{3#mp2vYH8 z9jE0i;$FUy%O?$=6I&7-!Lx`qN2H2MJ?y~#aM;I;e5BnK`o)JSr%;+{>6V_BU$Lb9 z8hpjsDGj{$!JR`Hf`_0WZt>pqth>r7Il*&M^Mjpd8gaTTuA`u(0kSPojs~3=<4rx@K^-M$EFlzemX;QZ=(?Onl3(&W z47fur$++BMyMPan55Ac|AS zY85%ixPu>B$#^zH?(8-|JTV=_Q6|U8!h#bKGA4<|3y-7Jm8FvdWa8UUD3h7@<9Dy! 
ze!$`D#&XuZdIR@*2wN`Jt?Tpg9@6EMkr1>4*a$OKoKT|Jp0hrqfGsDMDod|;IHC6O zwHqdzV_-)WYawV*5dywzY6Kr2t!06N93eZqjMeAKyii=$w3W3o_5&Hgzi<2DP&kltXFC{$jEE*+h(hO3S7$DU_UWw zS>l@?s@J*}?RM;GF1M3A6(Ram<#v&3gW>E2#!UX=y`Mhe{nW9@y%Y#hoM~-Fsq>4h zKm1T0XP!u23|Bo`=~-%9O11dGYPTBlQyav3q9DT|Tv@&_faxjSSKp3|aPa<<0-FEz zTr_ZW?1_2fueTG&0Q{a#4HWU$UxVkLtn%NxsArWgr2qbR@0tVz2WIfR9CsXp`V9*{ zAG_5bhJW@a%4TiwkmMt>3OgV~t6p2OE3;b~+Ge@yp|MTbR#;=oL6qSo0x$Z}KugWL zFqieOA?=FR`SGmt=wKkR3XqFIKkHzpfo5F62FxoUFBl3#_Y)Odw2J&K?JG=>Mo$U2 zWZ6rS6Q>!d9@Q*gMLn+B=U&AetBbUX=`u!KHD8i5e`)<+!&IRuSdN(L#}0P7DMdGbi2A6usi<_lf}m{A3} zGOZX~&Z}l16zjXs9B0_PTE{)9K$~~_d$A+!daCY;GEwysaH1waXTD}P-!7wXR!QJ@ z2G&h7X$TE;X&8(vF=$Xyt(E@R8Tfcq!bD<4egh{oA%-xs!%7H6TI!Dx!_o94u|h#i zW|P`*j9m10sWU;k)Fq~}!>iH9Gcq%AwVUH{R!Xar>7|_m3fP8y?cLDB`{uh7*?P_*b`1y9jpcr=PsB?jp))Z^fwh^s`~1Dsb27^rpXtiFt`GxBZ2sj) zR(J1aX6G2qX8}wm(pFq@D8X83D}hMEJpBy%0)z7l`P7C>RRr8AXR$5U(^?1R`Oa8R zxEM6!-B}=oygre@dWwU1!E^?`8@nBLYH|LUEUklRCzMh{mkfX;(ouC5Zgs{{50Bmj zZVwhzprdi6>UmB*!y8uO$V9O=aA%4IU zC6R`Y-fE!*m2fF!zTM(^JMG6&EB^tC#r#cTP^@ADMo!JfnQRGM&rFr+2LD5LJeuz_ z#rUURAR_2p65Oldl6_L5=yy*moN;C9S3doXk-jh!PzIm)d@Z#(=&}f)Jitx1OeQBO z9|X4c1=qJeDP8+)!V=TQ@>jS^Z@WFf{kS3dYLNbJw7Ez>_3=CmxI;fKT{1G%UJ!hE zMv*#;jrG|2@%wx2jt)1^6% zuipP;rWVCFqit3cMH~)^qdsz+|HO%N8WvS=aZ3zhSBfXwX3|2x@|Vv~)gNNM>MnQl z_-N8Dhd$IXNNu<@HQcky8bv>YLPo)4)-{N_+Y2hU2m$iHC53CyOa3Z=x=4QXSh4sy|!p79@kULg zYbaft9r|w(XefPVM2RK((w%ntcr`onKHb3Qii>MBWfDJo8x|Z7F>O@q*C&`%CExP& z{-a)BT57MMLio|DaYugM{-QFUb#)uc$vMpMX8b>bEjxXmIE#4cUN`J6fb|yodJWF>*m*h}Ph|d zI;8O;*0}4Lo05SO9+oT7O&jPQYP1F@fEHzAP#yFVzSuKM0^_Ry4GD4r= z?*Z_lKuHGOyU?#0odi+g#}XoT$LY&{N1UZ?43JnrXSdgGp_fh9XQnT8RmBo5`7Cn~ zx~oTUw8xH{ICC__y}#$?0*T7_JJR&hJ2ZNNB(rQGp= zyVu`Y06PlTnuP_k!ymjQ=1OxNzl+Gz9CV1NBbPP_{hYAoK1sN$Qb!&;vmo(7m z4$QI{T#!=?T6?pSFq1~~`lvL3!S?D=HV%V4htEHUb*@Uzz{g2oITAG=K|&ny?aIIg zOeEFOCwlI?QNXn3s3W%yais=IhD1mfsfsZF~H|iM{66W!5Pvp%?mdGWR5bp9t8WCuFL+itcNdZjXjjh_W7}(hcC7E zk3U_ECAj?wz7n0cltq|qOR`wH=VZnfP_N2gpTdS+bt5}_iQ9sUZt47Jp}Cv%W4*;0 
zxipJ=cf6z(EPS-bEr4zWBCwp1w9yzHG{a&R9ZTRRaDp@t{JzB>-BO`(QB3Rt69Vzq5$FL)fi!S4r->ohW7nDS9sv+luX zVXU({Fv#8TXoFWwi+zoTP+Ln z6k-!OM7w+sY4`0ucIiA;yw)Z~#S<O*FZ2r_o3=t08Uh7q-@qJgn_9rR=KMfRGe;oe~=0^Mfu^S$^WU24} zfdD$?h~eq77x6&=1;lv~U*<3a!iyt|UytT@#8R)0%s9o|VC(D2!6Ns;-s4AzFTEX; zE2gR82;6=7xks`zQ1=sce7?IH19Z(D4?CEK1=grD%-q~8!HP$4H{V9)Z@bQ=aYusk zY7F(=n?ubAUTKF-O7Xiu9(W6+wkG?s^@PXr;t{7jolmPjT{Y19vj**p-RX?a9PpEItD^j8HiT&V5&=iU{lMkhA_9$gqlt z-imxnH-I)y!ILL``dpVM8q>3M-iSBDXSvxxpwbBokRweH5IDQS>MUJ5c+>Ee+9Nw% z5jP&K0ziOW?uviG38jcTFp-8=8zAvYqR&`SDambV^GP%}xUG@+RMBT@fV$E@HE9Ny zlXDjuttNS`g7DhDW!M2&G4k-9Zd%p$LrOF0L`Yt!$6Wefjpl+!4bAQ#aiy>t+W}yj`<>;yx)gJZiZlF?;O) z20VyyH{J25acA=lH~VI0B;rhI4HQexp`W(!2Sg^#fg4ZU5|;e`nt1;^GwN>@Ajdr$ z$+bxL$$;Z za757k`F71+LT>MP7!?k5RPu(OFtOiEnBd8YRVYY$y&KmG1mB(tV3gR{?|{^(C{$go zCfF;THK&EH_59Rt1Mk;LPDIR(WH1|*85u%Y4qNxfN2w`(=SYJHCBzDV^@pw)^u2{Y z^T?o%qBJ}~Q35OY&=vd9l)%sab|!zGBb%lB1q2?x$98X%M}Y2dI0 zvXx+cjz1bki+WyNnCwTn)3rovy+uF4hel4kR@%mgDudRV+d>yWaPH6Q02kIZ+Kt(! zhW4J<2I-vkj#nih_W<#!Z7({F8l9mXe|6deusZ;(WTD=&(`CvJH%1zPDV}iN4-8__ z+G=&$J1;&`*-PN za5)r+HCM(zs#zoL9Ohy6<%GY}dJb*Sf~mNG6O)mrf)X=N0?)`<^Od6Mc2wdF`P=Ds zq-?7$2DUqG4NwsuI`<1i9edmew98RZ;H)Qef2Awqulw3vv;t_k5oAC07VZP+%Xt5U zUZ5-F5OLAyjmm{oe(YP~-2W}&JG{v8@5Dwm#n$%!o09%BfzlV~o>*q*XVd!1ac5Hh zL$<{T1hxNf<+ZWar@{NXg{vYTCqq|+-mD&IDX$1u@nWo49M%f!~Qb;x>cD!fQsNQ^7j?Jv!>AE zbWjVFghQO$ig?#YUESSZGm2xUrHuUT+~$X(xpf$ISS&mnT3}PoKk%Mk*1><eW?{~f|cABI(CgH(%=RCks5yn6{%8G}DKykFHD1ewbskdZv zt-pm{8L>oq@xcGkZK#yF)LkoOz@Jvv^(~^xyq%cg!}6!T`K7ZY;5Z|m_vP4$ex`b% zZG7&6(C6xF$$~e&TRBl(Q?T-Zk2}w!zp~Ef(^m=_(Lxh0fE-MgN>{mn9gL80$haXZ zCrgwN;x$~vw#VBTJrEMtEXj#l!>T&VDHC$&niy94<&#)2M*zk*1)x7>Rnay+6nHEf z1g*y)ujiP-cO#Qh^cdQ5Wv_L=ig@b6=RFI+Z^y37a~H>Nh$bik{y1VLyh+DS@5RS* zN5rh#7WzP7@W*59vhkaUgRhir8sJR5zoYd%#vN=NKCAsg8RG-Cz*h&L9uYs&ZDX82 ze#n4o!g?9Vt@btG+uI{5M~GJ5EV=|pLR8S1WFnzP3PqfC2$Jo863&OC{g!GDZ(#(- zRjmlHG;9SRMZmj0*TgObLvCFbxg|4V3MkkY%j0(X+Uuk^P&{2DmQz6CpmRY5UDQ7n z@p7|TKaZCJ1;kGc&)Mg!H%Uw&4EkEmEH6M(FoSy)%!dWMzslo24+|{8{;1+7)(=l= 
z^#OkKmRKF6^?MtP1CAI-Ttr^C*C`#J9}D6;5fG0^N)hTb%oggT0z#ZDbpP#Fpo@xC zXe<6WJ+6xnVTI=zGVUy%CfV9hfxF#$lh_p>uvx16l6OD4E8YUMo8c#4%~|(l!GZan z7yJR8Od&R!6HQdm$;5}ZiPYP^F7o!)j2cnM(oC|XWsCu~G7w`JK^F=>GA49>OuOYt zt!!(N5$F+T@@A@LT3MrDOM0iQgV?Oy#Jxku6N&aomh9{1;Gfp};eoH1Hgi`+0hNsu z1#W?bKWH+Gy1{jZ*$!=PK%ol6%0VOY(zQs#v5{|(;rpW;!-!EsvBTeA5SA799d`Ma zfn?Kk>=jtrkV5Jh+^g#52#7aVMRHa4Y(KdUmOm(pLqZ<>R+Vd&duLy{lkL83Cxc=D zW0n%^mPzAw=WhWu21N=+ZmRFrwfByfXT}FE{uN7gVw8-&kEk;Ft&J-q@E2&Oj!b@u z;`X16{roj17maUjQMWtjokhb8oyfnEu%7Y98@#y3L&|;}(_Cvo%%AW-ZXrgiMK>MBZnpt=?%!{U zj&wEK8hsLNs#5#Hs^R_-k@oXfvzYDNl?quc8jf2BMboCsQ{{befcF354xNu%w*sLkjq$%ody^H$WFa&xk->MV2te*x7mvT40aGXAUodOSg}G$tGWcxz3>12MrW*7 zp7#DKw3xG!@(EAA`aX`w7Z?b$rlxD+-MU=gG>KhD)hVR{F0V6jey8=P;%5y_?vThG zJz+`KUJ(6U;cdkrXmaPA+^GmZLL)%^lOQ5X&6q~*muKOu6hi1+?0eM$E|)&Aa6%s1 zM+)7?L|r8(o#+Mue&&Hd#uf4e@J(f_hXT5j(U5CmSdOQx1R@MChOKTpC(>$ZFosJG z(FJzZBtKT$P?QQP-!2Js@8*Ve%UFoYb-vH=g!?p}EewIN5HATVVT*P8G<4S`F^=%X zTcobLn|8_pQDHEze~4aRT3YH}%J(P?oKLH1WoOV@JY^Tl$MUtUr6ZQB%A$Iq&oA|F zD7FV>8Rt{tG8`B2q(nkZS6E*TK26=2F*ZlNIxO!c)r}RIa4H$#fQ-M=;=-ff45Dh1 zAPk1&K=1_>3J^tz{4WZ!<;tMb+>A*``*iJ-?m_dTP0%TV^2*H=)RTCaeKk>=chg~A z!|Ceg0#4ykr!qQa`&uORDMec7KH#S(u3PUF3K=2szM9RbK8UN zYHl)_+XMx1AcL-$Ur>!3RfYLxM=B%l_GSWc!@1}W;t2DX`2)@I zJTf6rP>7HtJTKkcHDn}>50J4|NILa6Lss4h;s;jcXact1=}k0bMC1x|#kz&IIt{WT zfujYBquj)SN?<+yay_A9zTlqqbI6tzC?cMlzaXx*S+~5=ovh#@z(SVDr+V8$QGtiQ zI7)2YMOOv)WcQ1`(M(bFgi9|MSos84A&%13%W>Ok;>GK3hg_;7sI*oi zihpJ_>?iJn@EZg1{Q$Y+;gg?(8yRLz4a9* z-=v~?kRWX{6|&Pn==C^q*xr|T_nnu4TPd~S_}_S5pz5yI?(n-4QI@_4x1;p}Q4^_e zzm7K(6x@u*``ASpq-)0~&u=^rV*`|Hn--5L zaYWWoi^lhUS(G1Sc$M+LuhetlVoP)>8?FBRxw*nT5=KzCafA8^FUk3y%Tx{_zScQ0Yv7z}t$W&7Loi0d3#Rge5z4Kf?lliELc$ z2wQO04w#$lcl)gznpVCM&(iNI;bQz-Wrw4F)x#9R>aXC?9}P7sSy#63E54JPI9;gI zUOU?NK!AO?4Z}^B5DhzQCMpcYbu7|ppj1rL=ov^5ynXZV<4?_JdO#AAOh{~WM$$ZB z_sb|&2bxfUU5}`omEmViq}Q3Je@t(%08I&bGqkXA;ftsm|nWSM0R?Jgo6;IK-a>1h5$3+_6lx>i1;s z(uRDD@Bi{{@b=H>EyCugV`$g_o~aF&CO2Vt&4hs%XDi)2&C_{*qnptrR=c4)-}^vt1CpFraB 
z$T@>O=FxeyWzWkO0CjyN2^{-VFn>I>+VbEW*+Y4@B#B|C;2Kp<$fv(%mzT33tL>H7x`Zs!ZMd4&<;dyF(`Di-v;62Jmla zD#|srjl?g*rP&Kjsw5JwMvn%pA#CA)&9u~4cdp#=zGHvKm*V`Fm~x8BZMVQz)`4XYgkApJ>rd|6+;K|(BUatA zyWRRRj*lKy=_HD@z~Uoa8i290=j4KD7T0c(@Fx6$l1p6kQ{g2 zDqsNDz~8CNyQejT@* z#<#c_JUI^g-FKPF1|=CjLb_54_@@=+4abTtuB&LV(uN_RKo1E1C;xB`w#kT%_~*N} z*{{u#V{q2LLQ;?7v~Lsk5-x33oW<{4o#5@O|DAT}*XrKc6UQENv~DT(vY41z2|X=D zg5IOy>Ti{n`d^{klJ5`43H8gz9QPZRTl5UO50FQzdwj8P#X{VBYI*Q)xjX<;c|07b zIsUO9P-d3Fe-N|79v`h$)T8EL?_rE3kC<6muUYY4Wy2jq!B3rvNPC?!S_Pn)7qXV(UnYPpgwvP^Im`D!V%; zcaU>YPPo3T?{?h+2Iy6p%XLm39`6qfo}x-*PDNf?8LLr5OpUSdhVIuP4vIO7*)jC< zaR!^KJsxx5T9$o~y}`M2ovCW$Q|+CvS*bENEILQBQunUkgWh3F_1oFr zib%QF-_s^DQkb%DUSi3Y75K9zK7gYW*{;x^y!_QkDVy3;MdM@iW<6g}0=2_?mHjy> ze4c@7|9B*omb1y#)j1Wf2JkVC8TGR6{~Q&l)_pul(mkwYm+sAa^iW3)3Zw_Parq`9 z?y_tTo~L{iY%%sG{lhYggEFA03RaUMW9LzA{8(pOu(8>}a=i)81+UOX+Cf zgqJ1B+^3vAEPmcSW%8M{##|nDxZ|Qek{Ls&1gbd%M4iq6H9fl8FI-zeTb}ZD z+c-PS5)(##f4R^JP6NhQ583&mvUP_uHV40}+VL|}-{m%JWQ!WNo?72I`GoPQ^kT(- zs1)Y)Z))Sep1`)gKl0(KM{mg5_p1*$%cxvKDzC(8rvz4A{1p9=&TNRedV*!3$9H8; z85@dU)YX#f#3}S|e~izw^IARLQQCE05ExxolEHDb9PQ0OD+6zEW&7Lt{aBhz80w8+ zGOP6OvtARy9afDW7_V}AfSA9#@IdKUzdaW$#E=moRGAPX{Q^BVM8~1J?^&LZiy_V6 z<3tl@JsN3F#iiGYCg+^X9o4Gi@?wE-QDg?E%OKcq=!R`u`ZFtmQd!Rn2ti==NQWHT zbGcgLqFUk-<7v>}t9>>@MYA^#ieQbeo!umQoRVTtl{a^EPh53k{$qKt&kGeKI27KJQ{M-!wyPmkXxfKF5lIMSxC>ir{q~Y7;suMus6gI(XIsR=0oL65;h@Qy>1831D|6`$W2G1Z=2S#-?9_)GadV1SS016-aHj8Z68sO znM7e0@2F+{+!+j154=01LiBPkFm-A@Tl+v{^ILBt_WttWTk}7W<-6ZsjXS!GT)Bpb4DO>=Oz8MS`%4WEG-m;p&-xw zJ7ELk=rS|-HE`n}-G&Av)+nm~&aR-(YQ(RBUAl-02(fke3(uv#0&kld34S>XCE%HdfZTEp%_KK2sx-@QCMl-KC zWxI!Ex9%k(^mAHX08Cf*_d*-LEQvvwK^nIW(SYh$AY&?6tEhNiTX@xd?ifK@3ivk@ zq%{12{)!2)o=XJmz?3YI`j z3axG2=D&XSIxq^qsE~$gxt(zmX>K4_ZP--iUZ(B0Ag2;M8%ua z%o6)Q9VoB0YQjauhhQ1@!!wk>MRp>%R$uchyW-}YPsi2oWYbLr=UhD;gDG3?`K|6` zow_Y8$L-K^SfkZyJ5lb`_DeG0wHL2N!~Nie&N(;xz}b#iS8@G8U`wlr#QY+nr89S> zwYlzQoU$gam-+1S#Y=R$e00FGcYvCIt?Z_t8pY$duM{r=t&x|1QbV~_h{C^g1YiJ9 z#e4-3@Y_&1yWKX=wbBMnv 
zxxE#D1WiDe6ov27&Y}s;SyNHH;tQ_7dRw6dJ5M9IM4eQ!;%Flrig#@Vj3gYp0}J|i z0-5jCaI`#qVNGy5Ot(plZU+qrShO-+$6%e!(&0p$9YRLsP?t~m2`6;nZ1Y8c&r2Lq5V zP(ZpBbM6b$s{u5Xf_{H3>6}L3=2@`9ciq8l7XaiLVg(KY+*+a%ASl32n7Zh1=`8{{ zCjA@TPA*(3?ME(9_w$m|b^blWpKqtfFbvAI9q)xU*^h{lE>78I!I7SF9pU-bPcvtH zp7R+T-4OKRT&PXEU_CXIvq9a9N9;~7E~9i{p|o)N!hy0CrSOZ?^wx3D)`#&=G1*f*z{q)xd&r5)N-j^SIBXj zvd3UuzWXF(b>H(l3oTB{Ly{p|(vq$AK3Bj5$w~Xub*Dqlla79wmkXZHfkyM^v%)?z zdG#*QiCrpYb)WjqcMrNky{P)(a?nX>4Su4~tu?spK9$L0DdW4Q4@azFo1NyhPA9Hg z{|{Yn9uH+3{(ra0ntk6RBs8I9r;yx=%5AR^DtnYLV@>wPRUfprea$Pg$c^=2{`Mf{l=^vL@f^^r)rEKtiBm03 zfzU$jPWZG|{>dl3(7kOX;+5ve2!}T*Bbl^I?iy%QJc@1jhKv2!!iOGWsPuBw-TYJZ znz+^xa_TTIqcbYI!hEi}IAAcU|RO>$Pnx{huCG_8?YKaqah8 z(rm_dEsAfHjn~tP-xI&CD(ghLv3~7LhtRU4OS0h4GEJ%@4LA^-j7M!|C@$hgmxq_E zxM=U~?R49C)as+wg5s&^0!EJho{r%Om-_276{e)+RpMZM!b440>2_|%!riBIb

m zWzE}s_r?Ny{Rpd@fBNOjNW?$??!9g={OI|IvQxlp<}Lp{JjO;NWS1V^#**F+OI5wk zk|nkTe3M6sl`ygK!Iq+iWj)T}i)XH~D~^i|gxmi2gy%NtHbuB10?S7g>D#?Nstn&? zVs);RWprKlDRvj>Na@4DLaOm0|46X)+kAmv_1X%|Kzv|QsW+R|DBhHjHQruY3QPk5 z(DB-hjMYHXDcV3b)(p3aN3#DJ=&z}I_IcUgi>8&_ooY@qAG_gRUSM<^SN*_{tnN95 z0}4Hq8SFToA@E^E-9;&!myQ^sOB*E=?cbKlrW#e3dg;So~9q zxKK+smyDOpsrm!-1nq}B1QfT3Z%GrG+XM;cErQjZtumIP7c$PH%+S1FHwkBWSQPK^ z?k-ViYy_GAjy>bi`1^uE&E%!(wfPiA1L5R(aH{4%Qc_0LK4N0IX>hi2d|BujcqJT{ zZ=pJ_q^tVL?FL@XPgjk6^pSRy{=FIqFbUdR^kS&v)dyk&MlCAR`09CTRI*ax^2=@! zj)_-DgWI_#1X?G^6alImio7jLwUt9|AhvttXphxFkztm?p3E~VJWVmdJ~H!!-tWKp zptn$habmJMh`@n9p*c~{7J&#XPF$$k?wOq@{C*>77Cp#Uvhe3QoN+V7rUK~d10Ee~J zTi&66DmLwr@#HrHD&<4~V=NIq-$Ni6->wif_&{@L6Ub(X{5wc08U&F!BMGz(JKv!3 zHMg+P-oT#HR`_--$N>FQRScC|i~GVc4Y!YglW9c8kj(K+CHCJLlHDEcN-Pn-qGvQa)MqH^ou_Univ~`31el0_OTQ9iC-#7zk3WY@1oeDf%*M|_#HY#kbHLI&vOnm>LO))PinR z{8!1PV)wgh6+^chFARy|4d!Ke&w4!c*&G4tM)QuJ`EG9BIH0axmFYByk)<#RA9!Hc zyC|LsPDUOIA_|*i+@uRjM}K{^|E&jDElfv4uBq?u6jVBOC+!l`)_zZu@z82|m}~Bb*jpv@ zDfcAqX4s5HQqMKf{1wR`aeCU)mUp7>n2QhQlMIKW9Z5qysJ4@1>}kBFvLdthx%7*Q z6=RWniOO1tYz>Yxl$sGf91l}3!sXyo8D>k~ox35s zkP^D)r=sl}yRHjsMJs#`>p7x7e?#H)V8@CBbs2#P{dzWq*weZHx zbz1$<{r1W5)b`O^c_kbC9u%Ka_%*(JoEs_p7yg_|pU*V%kDc>{$vGc0@&2G*0f5ZV z4nBIhrZCqH!+K~;UBVczI}|TJ->B}v=vREs0GM2b66_$J2AIz9GawWIHG#IcOs!pg zj3062Gu^#soaPmFV?7(i32ZKuc>;1*xZscDK!ked^|}t{VOWn-76Zrb;75$HSIjKw z^KxoC0MK9}XxM3VyHAP!p=a^M+p>i!o<(mT{Xigg2~Dp7-?#Ml89{6sQhx;xAM20;jYWXrubi=<~0G}b8<0Cc<=O_(Z zhgsX>O2qJ`zwXLD#uuQvGFZ_+&tN)_OSp}DAW*v=Ouh3nVQ(0Vpo?I5s(iY+MII4^ z6hGE^8hbx?MegmE95JofkGMQdPT&Lm)roRlo`^W$6}~>0iu$Qeo;T4>eRBLn|B*0v z>tlXzPjH_Q!rskeICVNRs^*(8IcQf96}_w~3%Gmif18D|`L*t9g_Mr z7lKaR%`Ya6yR+bdV2D88w61|TM<~fUPbR(*I4u#|pf!FBwU1U275dtf0fc-Me2E&x zk3YW@0Jud5sR*o1l-#2eD2t5FpZV6^1x`P*{D{N9%WJdgT;m?oYzQ}kYY^0^miH!E zrga>;W)aP7FMw-YO5A~BcSZ_%)AePA*!5CCF!hwPVt zPSU{_jgi69XtxBPyZV9;HO<-$z;^jZAD<0eH-HQ_5)O_*39%bN+9^B~EB(B`uYwn7 z{uH`L-`_B;AV>2i`oOUJm`4GIFdcnqV3BfASh5Jc0ABQ*dMmD_A)n2AB^rD|suiDI 
zufAd-vNWb17ejHtN#^wZaKqLX`I7mD{{Su5Ehg6|hz*0=we=k|_kzkn>Z-|*J>V_A zvk(A|>wfY=l!ClGGOYwhAb-i6!i)KZM+;HA#2o$KKBmBI2?~sj;iet+$iImHa(>q+ zAZ_>WiV`a8IoEP-cS4QWsOM3O!+9M68SSJm-Q2}C4x0Wdd3Kq8Z?`AH2kT<6<#HKB!2H@C`jU8 z9frpIgMzciP)y)3?52nPv~e zPu&_gW-S?m3pvIe#@{1e7E6?mBe zY>$j&@?;x}o*5=t`m=Rh+#|gmA+67J7KH3mqa;|NV>&Z&?ugyOyA$`PbY>2;mDy0D z8$Z6r0BXyR&9Cn-KS$h%If+almzUJQL|+hD<=v6mwwT~?qzOzbHs6qYYk~%D>0U~s z7|1owX=*>V2xF~IQm^lPiD+htrB}m!-w{Q>84ujNc=*+k!3NCc ze*v2@{KWlp*?1s2fpFvQ{~3dZf~@u=8K5X-m`4V>s)mjU;bl)~B|gb#Kxc*=L#Rz4 zTxo){@HU)E4;6yE{MJ}90k^3I>h|mS18yUG<#WKi`D-iuhW=-zt|bm&adZpfWT%G+S8J`%c>GjWa<}nRWv<$sGQYnIGFNcL!?h?~K^XEmZ@#Bax zeiF*@*!$q^(3tuXA}dmzl`Z#{B2T~>L_slfzTKQC>;rz!%7*DuO|gf8 z980%g?9VpwydsAEx2_oKF||W6@KuS08i@1ZEXkzuk#6Z~eC4@$zlL2HVjkRTI1Fzp zBY@S|`_RW|KLc}K1YtQ=zY9v}@#tnrkWrqN5;1~C2Z$t@9_e7b>R*u3M9XVUwcV)@ zQSqCH60j`#y2dksIr#!I7UvAAH*&uH^n^~D%dw}Y)@(~;kCnW7HE{^sjJSFANuiRW z!69;em?Zb&H#1>;pCHj+@Y))jEz$V=f;x+v=$6mClqx>iurP=#Em5Pi`HGI_rB3vo zL04n-Yqjld3XADJjs-YmVVCsOKrdsOYiTNz7jWmoBc+3cAK@Y{U%c3d9vbZS|e#-A2QBLU zK9S;*!N-KST3j_cs+*jaq>olwUgbKFyv05FHRLR?^YcX|>Gu`0OmG;#4%<^R3lu{S z@I4TBdh~<#U^NkDq2=9`x-DzRuaG(nhb7@7$0IyBay>iiT@(&fZE^pWPUjpo-)dC; z1J4r@P$MW(7(F}9`bWC|uf{{thSUjFd6!l9b~{J{H8WBd$G-XeoI1O)|FxLv?+dQE z-!V^@;<$_wlZVH^#@J|B+z8D?S?sXB>igYU5*zvy{kGWkOa^0d=hzGK8an&2sV1fEP8>tU&F z_D=yHIvptl%CmvH+I;kCNL`j+Ualb4MIJ`A zQ5U`^fH?W;VBLaWX(L$-lLp~0MJ(WR^F9rl=m`;>K#vhlnv5R>7n8PsMmjN?f=PlX zGB7@Ra0fP_{sr+XtUY{_cVeIMHQ+(l3Z&2R&r+8zoaqtzTN-YHKo}eusmXD%b+>pu zb>g@YvTZMdvGdmXpxK)B^>#*xy#0B!W3J*$rLB?z54}G0&dGqD+lF+HLrGo!65a^) z^MjXlDpbXI5(BY5sDK%CDfRde3FSat3RpTs_=&-)R}W8{KP`R=^zG2_V6K;#aM|PcUAq^ z90Ck&ACGZ{@R~Ywv7k2%TXFL5zKt>u{R7ilK1!poYh^K-XcQz6RP4)Z~|yHu;s+2pyoUW+Rd{!UI9g8n9nO?bCxH z_Cqf~VIxM7t5IL+l$&c;l4`~McFLZEBOt&b9;CKY62v+(j!7cDVMGYClA6Pr2lG!V zx7HuAYrYLQkFNHuva@uXnP6&3*P%u&GvGJl$H~5G>XTjq9$iDh0zD($$63H`U^+Kc z)AtjE6;mb z%U$!^Je2ZapyuFpcckxTs!6k-qrFnSjJHqPtsa@hHTR31G#R&GadM~I>*in;6GOTrZ`#>a<W3bOrP=xM#DNoye*hG 
z8v^SSBLb)&Y#lb%uLb#wx&ME3(a*R)tQ%i)bIqxP(pJ715Vpof@*i3nvKrR0J(`N} zL(b8Sm(a#Dqhucsgt0sSL@!(p7X(D`KMH}aws^oZ-sp~sWb2^&Z)L-ak0mh6a(_n&eXRz zW>&*Nlodec&j)Nb$OoFGQ<(RzxADfz34L*$XfPK;Ug1N6``VW^W3cM4Mfjah1PzpJ z^>M&R-YuS#FVXlvkGqyy!i;mciPo?$-aeY;ps@U9@wFeg7XD=Qt(Am*3&-EeABw4g zW9Xpx>nPiS(}@=|s)YxWRR43Uxr19Ri&2!6r%J2##+vJCp_Y6QLw z6x|a<{HS178!y~Em)H4lryYQfQJvw0gaanRc$ks!0|aGp7>>Mk_-sJ9k_tgYZZ9E_ zv~2+v>aYOAw6cZwa(={Jm$bzir2@;`ZcM?sGr&oEoA{+l4YXv2pG3uu_q~@2IE+SU&YG4J_MWK{W^i5R(`E%9izRr_;n- zCjmN+f9wGS!-g(WTNx}^Za(UzrC?KwAsu)(uF(HHcDm+@>WdwU>ZUW<{r#8}mj$f3 znMA=)U#)S3DN4=iEwj^1&sm|&s<^#FO)ODZ;M~hHsl#g#$ZeS}iKIpO10?4(U54%~ z^)~H0GJKjWvf5nv7e3=fDZ3g9HY?t2ec;@QMN?iJe&Q_qKHI)X8u^tdrVYFoKfX+= zVrcd>Ez;Iw0OqMre3*ujq#59no6Htqcu^szfN=c4>1VR(Vbok)ou%U$7i|;F!agTTXmW>i1a* zBNa+_P;r%i(Kip{S#x%*q`30q^RTPO`cS{_b~)+Sgzmk({a>po{#9E;qYqTd`azN% zglMy+73Q#fgss(-WQNN9wC)4a;^tgA*H3YES7lz_$EvBvmA z=t_(FtD{$U!!V7{Dl6}pes)wzpPvtY5U`f|S_}caQtzKznIru#+H`O+wi0XSDzT_! z@(35pNMD zKhw_-vnnbTla_FW(uV=Axk3Q(cZ`>)W$#RHd}LiUxEj=_3j{Bi?rkkkrC>C z0I+?cqnJ+KNX*umNo{uk)DbjYaOtVon`pxr0EUJM%-=%2+l629Cl$9Sbu&K6pkMT@;# zX&x-hYtKo$d2r)U50e$xS)aJHyLreTMye>mW|M;BbQM15cK$4OW%b=|)Va~;+|O!$ z=Sh76ih?8oy=}FW)G*9Wzw_w(cNPu*wNREqXR;?=evn@C79kl0NG#u zG*Xs<5)HH(e@jdnwBKQ_TZF!*jV~Lw{g)du`&H=vcslseT-d?7$`8= zE!5nW`Yv^sp)bZN8+#?es<>d9fy&@@!E`6%G2z|bt43znwPL{zm&OjGePWEt1DBQ1 z6_wpFP4gdHCj{5K+OKpNnfhPI8XMhOS9!f0T8w8Sg|Y_d+kph;FgNYCRJl8FLaVO=v$XL6MX ziU$#VoZ_v!^^x7Lp{jUSeXnkp0lG!@vIW85QFm;`MI&8!P>nv~Dc@qkEkozzWiYhc zOD{r%$TNr&iM8jX>wn;XA+>A|7eXex^V<{cdJH|nh< z@IU{(PJ1;(WQBL zNMWUMdwL0+!of?zxcor=5?M#89zohnxx7<1DQqukS>4Rgc#4GBGzD&rX_&qHCJuwm zMn6bOu@PCbZ5Pw(A&4b&pP#<}m>x+^sgb~!KA05JdPNxM`s`FMf_1!UE|CuJdC9p4 zYv%OCTuG)hzdcCbXyv{-phu^CgJY^h3qI^^O=LuTd9W*xrf;;JGzJDQ+sUM-6~aC0 zMy3bQF<|@&XG0(hmhX71gA#A9k4vzcV!tu=?JgV)2rUBH0c>CNnEFJ~lE1Wp%=I;Z z8m#kmF1#>B7e4*#_OpvHRpdHgXMHo8>E#Zps~fvn^iK88F<}=wze`--eGhKl9n4ZC zf~}XT1_?oxVH->ji2;kYrwQ5(F(u#DrTmMT4Kn}S5*(Yp_;57QZW?gjWY4MWutk0n 
zYkRInsu}7R(#6erIw(y>pdVJ$a(mkOvQ-)_dNIkF0A&Za5f;7YaLZTyQ>>oxumo0`c(#JJFlhFJb!oV-1o0dV-p* zZ}M+7a6rn_H_|hD9Zn`~!C3=5)j67%hKA_Bi{EXdSg`oOxghF}(Mr-Vle8Jd?$O_; zcDm*+68anz0{1z?e0Tkfc{Ib{rA#Ezuq!&9Q`{a;Lg=YMW`C$W*4G2M9LrHVGvisG zW*gJ2c1bn*c6~t#iz%!!4udfjuD`w!&rNn_NN;;&D8}qMXn;Dl*?BjXwOGpIM|qjc zaP#|@Nqu{ zX%Yz)CE4pk(idKYAM#m9@BbuY1nZLC`?JZ>VsBQcO0X{H&%hQCz65;tWn=_#&LW#& zQeEOc-NDW-l*~nGH}?}Ttxq>A)^f=Ud)pEhG5k!Auwu#hz3+ApDL;=YTwr=8aD01< zoKQKkmmThm+VWagHa>Z$z}*3}u^Ul1x>D_t9-bM|Msac2-O;rJ>sXa?N0T4@crwh` zxWLw7JwY}14yJz1yz}McAO^3rpXO9E+5aHV`L#Pk5^yLz*5y;9?PBk&1$`^F(RSTM zboqw2J?!4P(|;`B$#WbmN$XqqQ@r*50rvuy)S2)4qrs_@<)XjyS-jM&%0g2G;&y0t z?HKVQ7Gxy19FtaMSJ7wcUmLr?(AEoFo?A-~2>F;;P_5nKy&pz;s+5B1utE|RyjvHn_#xRjB? zHFzb2?&xEO{k@twhUXBh5rjZsxq|!#XK#;|cv%S#U{9EP_aE*4G#cw!AZ6$5I-1#C zsCi4qwSQ!v#1y~TmPLNJeN@raS9P>9&8jZEpZ?P-!;Z~#k;sGlmoK6V-oGw1^HG>+ z@R%`NZei@}*}j`Y$FN$I5+$8ueZ@%o(%r%d@@*5n%^&XU*Z$AN`D_!LZV53wZ11mSOq&n=^r{V{? zU&z?u$3|%J`2b3ThEy}S*M&UnVuZ5ExR0y77@bhF{8*ilB0&i#yk>7UhRr??zHq$o z2eu~y*>3ZLT)ZALgZ90g;%3Fdc<`LOZ&bA~mr$A9SmXLqTVGW!$rSWzbkxZMLyYb( z9OX+|zqkHDO>xgG+eiuQbt2<#D}D2W9;)IdlWt{(_#++U{&@E~ospJ*WW-jxl>INJ znrvP5e<+@f<$Z1S&iK#t;PR%i4gF`yz#g)h#!K|ytBJj3j`w$0_B5#r>C_EEHH>2K zoQLFwIH_rXWp^VIccAdJz)buM<}guXcP%*J0;tZ5E=`cJi6M?!ZHrEVwuNMbVd&Br z%eFm?_Fu%u=p}-6IZa3+VQnYt?@swqsLDJ^FVYeVT!1X_ z{DL;OIArWaLA})yN9oJ#Ur_~AvrB;`7GXCY2$i9~*@*QidXx4vZ7#7<+x`))D*V`D zE3IO~l}A0!VrLLKx;qvXDAtyc&wD_Jt;o2kv_?pPpQO|Ejz_YXS9-E?**Aj98WmI!h12izm9RHhTNa2Ir+g*oB#Z)hVf^W#>U#XRZ_80H2sUf-)q(&Pt)6C__h(Q6<4Ig$Y`)dIHqOQewESXK z!P7)_kKcYeY<X5ToD#TW9*DopgM#*J_j1mEN1EY=(km?h74~D~isFfr@l(&YjqfD`u{1ZJ_hABW!J%F4}Bn zyWZBV$Pb_YX*{QBUXf;HxPJSB1sm7QPYyeKxA~E75{Dh&w||6h8@iHVSANia5{TSB zJhzYOcxA8gdZWpQN&MF-bBVIG#7sY-Xe6WfbYUiCjx>_E+c@<3lKg8i!@m_M+|KdAtMtW7*$f=(_YTs_;#Z&`s z;ADo&ITv$Hh4bG(9`IY{z}k2MPJy7-Z7bn8V76PNq8f%>fF`JDc4Sgpl44(*WWepl$s5zuTaI zuF=Z;6ZnD76$L80@n&c=-c{&oh1PahOc}5O(-^9cpQW;#|Nojt#_ZD`D$9= zZ8`aZWVTm&pus2ll-uZH%sYP$cIEyc&3vjQ<~r5K8KAnos0 
zIF)W0Q%^{MRz2h~1sSXm_-@goi!R~n<6`05*xVwiWqv{~JmzN|r^HjavoF(x4nmL^ zBAGdpAp)r#oWXV0Vi{vU<#cJ}A$WndCKEDHX}BF6p4eN53gDgU&Oo*t35X9(aT7|J zY*x?(3_vtO7(J2QEHbiioJ7V+h-<)8l=P?{u@XZu(%?@^#dp#Uo#;p&s&MVyLg!a3 zs&Bf*(uM?;W)CEXpVa&_kUzUAI-&Y|C`xA`igH3Id(^mU`_-e(s@r``G4tGCbQBEo z%wp>m{!4Z$xYlZL0b#+wj3^G7kRt+>_j+};&Uf?rQ>bandLu_-w8sO;?pNeMgi?{yp;+>z~QS#Em(cK7z%ul-_A*z$( z3A@Jjt4nuij)iB!>IPHEqe$bjp9aMk*MO@teg)R!CBku$X{F3l%@8*y$1yKPK-#*L z$5PG!q@tLF3@_C|-gww_NHvkDx=$1|wLM(vWu?phVOMGG$B5psW0Ns*+n;994SB;# zC?D|!ufmJ{-cm6?{I(m?htGNZsye=Qzj;T=d~vTb-=b~C{^hCUV6kzWtl-Gkyt-`U zOa1i06qw;!l7Ra1L$sPpVdgA?oe0V^nUYsckr`*9p2g&fMIlogiwp|Xbx@{WN5H+R zfo$tTK~E9280Rfmj7kr2*%yZ&GB8aFXt;I0@$K2U58o{mZel}n0u5Feq1k>QTyF9B z>2)eboMGWN^&;v!#{#xkVP`L7l32uL_>(Glv}1ri#(Sy$s_`d#W22&bT1NOMHtQaq zIhA+)avo?F3^l_9r;CS07GkrAL4bmpNlJZZE*N5=t8ahvk9H!#C_Mn_I#(wz85~4SdNrWPugNgeBKw`#rGznm58|F zKkMj^Vkx-;oO^k(VZ6BgKMAkRmu4J@nQj+@$fq7s#wwP{c}_315eXTtz2kOnaaCJ) z2P%B5ySjwsf@*Ja?@iVp6bCo*3-{GAZf%{U3^zLemE;9#E5%Q!;Vl!hP0JkdQBVGD zH~RgO@zG1mi|;q8eo?Ay>gjI!pnot`Iy>rO<_ljBRz*~V`e2Fh=;|iByU+`H+QWZy157_(9zO za&XqBS$5rz+2YOe?VO{=ipi$97E?1m8ppozCw6(%4nb+g|LrbOymR!D)3{nUv zt?IOr68(bDv?|79tur69@(XqUYt=h!$r;vuMY?9TR~VUBs3v~KO~ycGf4^9Rm_1U2 z+rwt;j}}aCYxzpyy5SIft)WupTHj zGuyR2mHX$oB;hH`MfD>AEV8H^lp>lmUmjGVyWGXAVc~yp?QtL4k3LfHneP-2_mOty z$Jr^LW|~&m_R{!Hv_585a8MR8@g|jSG&Q7|&uo(yrkVaRloGoG`a8-y<%y%FCc<6} zg91ew_8vAVkLL!KNV;@$?G;n$#Tph@EKY}KJ?Jd9x@tYT(f%z>5Ngh^UmPvfn4{spr;?y|$@txogZUhCR3@s?#CF^(GX=_khhLLL~zA=9I>UV+o& zMmMkehJ6{l6*4_}i{E#-o~6Y`X~#6uF|^QJe*erhUmwv=H)UN4^UT3`u;*!W1-QGD zPlQ0Bl>>R+5aO^e%!0R1PdJu*DQdz+Gs>WMl;#Qy(a!ubMq-wLD<=WBrhQXC~xMKkK*G%8wxKq7Sg z*$~UYfAl!>9k~i^i-Maq8~^hFAsqmf124gJD#!~wS(;v3#xv>kEtJ{MF4@PvXh|96 z+jB87D51set++=vpr1O$T2BX9%lmDVM?y4O zZX2#A9b(sQ=q2E`KY8eY-2SH#8~{u#UdR|>B`b$6odN!rHGa&`af!)HyG9-@c_6|? 
zs20FNwS;@rEJsN_-6`)!Nq6A#yQ@j4$KHB#G(Y;y)x#pSF?YMUpV8LF(1s`o(hYwn z;NA)0fBiftmfD5D>{tEBKbEyxHur!zClLQ$^$mncKo(*_@hPhX(+<0RBC58LE4;UE zr3Rc(a-9TkJev!;m9dnZP(Am8Ta$O|@P6#gy7mdt=R-ZQ#nlNlL zsgVm7!-cwyd?D5g=+zehjha?ZV%nnf;y~Xi6&qe9l#seJ6>7_LS08lP`rwJy13o`{ zh6f;I`9;}TI9(7SbOhL;!-I3TFUC(F!%VhO7Ubt5I$YZaI^U+g;N{*jAj-YJ47#MJ z$MvwbV~5)PCTx9mlxW!3Y@?V$o(Ad_jlDg4JYuW|)FK2)Kvn}0_;D4$pR^{ zhK&5#n93S(Z{F!b%3@x0A~~Tjg@-0qQGwoDvH0S5+K3;D1H8KYc=m~&pHS{gAiMiq zaY$&>EA@ZotSSB3i%3Vn9@CC{SwFxpl41C@=fqxjY z1hUHp>U}n}iTFPY+0SByAmasciSHn>3qK0i#hxoWsR@ys z-l5sekfEIwKP(5C@I%@4sc=EBA(T^FmxVeR6sx6qo03h2P~(%Wq;C}WTPgQKi_>xn zY#WzIbs|9oGU@F?hhpCCUpFu=#`3@AxYzhqrXgQZt#;0&61!Uac0aUW;ww`MH%*ou zwuu-=L9#li95lb1cfPkAtg`Tt^s+7vMxA`fUb_H_Ul@IJ7*)+;Ks|e-bgpcd7~8#F z`|1z*C(lWzCujZ^5#cctHyqIJu(E<8(QmK{orlHxl=zWvUsvG1|8)OZ3gpUyCoy2& zOHCo{_Db{J%~+EW3;k_Ba6+6MZSUy3^xxuWa-`C8bNyr5pmS*UsmF>-wV;UE4eIJR zEYIBrH3pA9h~}Ek&DW|8ZYYzampMh$UQUCm@3;tRxk5-23)K<(#dA$~eWq?iVyu!d z#J1qRqPpn>qu!r!y2|0UP}NAHWgqXfi0A$I)8K30VZZ#tG!In_&1C%>0OX!!G8`+eeK2T%57t)MZ~ zb3-tII(Pt~v0q~fqwHIse>toK8oZC#7VHjZ1-W5budM5M%N)m?N~JRUUfC2bcwYD7 z`_FHjWj_K}YxxCp-<(I2ci7Ng?3-YT?OpS%`I}!|>_zVn>D)sP=+|o(-YI*1)bx0* zF5hqJ{)#Uu+Z)rq$M^T?e@hC!F^V$c?2DrxY`B~2aPW~>Z-R?Yq$)VeAuJi)aTRYSzfae?( z4!FQ&>wN1(V&V#rnjye_=f+XgFuZen?(?iF{?1kI4m)IaA(2VQN)?P13b@cfUw0Ccj~-JTa9nBMkI{>&Ms7=EOb&Gy445!JY}Mwv zhtScb&Vsw58^b^W5@?|PtpRI1gb?EX=*i&-*B$fryPHjdUPK~flU+~}krF`&_s>2f znBWkEKm3j)F!n!!4#)sKQc8i+;y;!T8re_asuDvTqbrFCQCAmo=o7yBmp6O`L!1^J z+a54SjR&1om~PUOBfzX96)J#D0C^7p?UI3{B!*&$jtO}OxKA*7OHb+dBCO%ZI&oF@ z9;N#^BjS0>6|W*}u_e6vZIqIEV2V(D(gkG@m?d5{WwAJ47SGN~0M>X;921QGfJ`;* z|6uwHjCQQCg6)x1=T(ogpO=uFI3&xtW*e^dtk#C(?t${7;Vv@Vfkx`d!+0mEFt**= z3drDsOf6xQJwn*E>whuFTn8sm>Wh8q7dogL-T;I8Hmdo6R6JP-7QAmZeDifx&tb>1 zUeE?0Ekl%vxIq(cNg@hi^a?Zra4G{trtfC(EOw>LAf7v)5)FV(Ap})?HM{_c1P@@_ z8~6tuY8;Pvz%g3kT9+PQyNRyjytC~sQ8jYTBR|G*yEZC?m&(*5NcIL$XQhxl0I~xX z;d(v(R;rFvdKq`&WCVdZN!q;|+@~dn^3e)Rg2QQIA%O0OSBcGD2ug`Le 
z&N<0G9|Kt#AumNIPe*C~#Z)E>hh>PNG-dT8*j4?Jd>J1&?@)DAOA(Ufma~BmP2Sny#ISjFn>=Tea~T zoWN#+{{xA?J~$wA$jmD-Al1rdXV1trqs5Pu_Cc1Zpn`38@embpdtZa`gvDgQJGzP|HYo}_H zYods6|B%;K>C952#N}@@^;BM-kK4Q(lp)P~ebH>j=^AtXjb*}kt#NY7)?SVHq5?PU z83(85-JJpLNU!}OBuC0j!(mZv2I8b3M~dR!;-&TEch=h`x7b{6eJ;_p0BJjXuKHWX z*M_Hq2z2jP12dGBnAiEZZ6Qw&@oRt)RI22DINOJpZ0SF+k2(M?;AMcZ@q!s}NzoDN*qcdw0dn~wKHg!ei6%Kfns@%@Y&uh>cq z4$@9cslmY`)kyts6Rs;9AM;R-)q8s;EzAzz>FSx>a<0Nu&gBlpPbQ}OQ4T)65`uD$~IM8wU+YF&b z$=rnl4`addctXdT87{6Ll)w$e6gSW-vOR}LL92QTxm2^WOKwgNli1=u} z*s!FdPw594I!}W{I*1$aEJ1L^lg^jzApEo5U8;>TjDA`t%rFf>k)CWgA+E*zY`8kX zo?X$7G5w*$>EGf*RF_(Q@&oJrpIc`MIQQZ=%;*7A;6WGFzEm$$2|0%L)6{vO;FY8@ z(R1$~6-tygV=?LFspl1}uiBQoAatGJRt%W@J3P3Q#nc02^+?4JLG7)8X#|WqK1rgoJh5vM9+< z%c*A(sCEjIJd#8r{wc{Yk#6A9yFJ_tDC67ZPO|n%A{XKOWe$WyWO-GPBK(!wkt4qE zK0x}^ISNR3ILF9T=X6aXr2CPqER4`k;;o%o@I_C-IpzprajW8 z<{w%!GSraaD`->=z)@AX+R!Gefg09JuI8UY%8+0p1h;!YJ}(b*(=~pl{FZcR9;0S5G){CxAg@xu&weDqapy73HPL z*Z%&to9q^CG7Ve%3|(Pt*#}pM#X>?^d64oSX=)s&Eh|kH|DRvQBqto$KpK#i{{ZkN zh)y6zj`0Z(0ktzMcP``;h#{WO7UtYe?~rbN{zSu|-=5hwt!GFZpZdYOjR-epHQ)|U z^K;!TOi3w8;g zm|eEA# zQdkiuzb+*JtNya86JBIJvK93FasVZk;t!*OCl@(RAW6>l-F5PmP)pO&SbeLb zkRAa7A99rIWv2(PGN6TXl4LZ+xsXLJg^Ys&1)bD92^~$%J7@gPr2E|CRgV&ICE#^#Etv6&kUG zf;)=ftrKoex?wgJATcZs1;uo95HPQKze=pKd0+j`vX&Ijm7YK4`0Ty&J5_qkM%HI= zbuV{~?F~o=!y9!%&(sY%ZGr!S^ft;ydLAyzWv?evi%sD8fR#2g7ReyHyzS2xr3Q@q zIJVuT*@U<~1pZFNwS#kLrJfF(%|roj+UaTo@bSB74wAb_gp)+XDWU>y4Il~FVG$_B zm|#&DQiT${;j3~;s3t^(K10kf*LxMS7$({(t8PLE5II_Tc9|wB?R5FtX zNk*F~q?f}S+Y%a2J8q%&B99b&t+25Zk6BTz>NXpwm8^91GW~9&b6TpOtn`51%WNtC8;qAKx;Sw=65VVD#3pr6g6Aj6-}g(VhboJ-U2%wGYPol@Vs0 z1`vye*u^=fnMuZ5$bW5$J4A2(bFyCSxYt2u=`+Y=w@f^6h|;jE1Y$@lPfzejlrTTa zRtusH+AThQ14}p_0!kBB@)MIXfI*%}W_0^mDp!CSA zA!IKdscttK_#09&nn_94N!D&plo&{&Cy0ePq}sTCXKHt6%c*n|aU_&u^P2&#;}&0u z313kyrt^wDv3|7RM#u!&*Q7U;$L2x~YkO+h2Rm>}YuasaWnWL|>WhN|lq4#LE?ZfF zcS4^mFGFQ0mP^_>#s44R@Vs3n17~r6#B|qaWu6&1O#&W8<#?Rbzo{5xeVygI{0ll|r;@vu| z+;90HJNM?8OvT^5^1*;4dQT-kKkNru_ 
zg9i^NKBZtVvS}EU#)+?(xr!P4akO&A;AEehn^nhktu8SIVs9hO#!$Y^3OxNFrjS*v zuGO@7;L)pyG|h;B+_uCuW#JoTWjrD&@xWPJO8R+3KG_yYm=7YW-bpocbbt`v)2u$DUw=;1C zNCCuNnBPEoinx@CbG)@IaT^)KA15U2mAj^5yyMb>_lqsFUF6oj1!@!pRyzRlyDeTrq>QY+J*T3R87A6rapi!CYd}(9r zm#{krTz>Q3FD}vbp)8da2lY)@(cq~>U#t8Te#(fp=0Sg+?K{OtOgPX43x6wSQNC); z3!+Q7>KjcigR^q>Sf6P1tD{*fEK}lYwOl-S_)7P+tF8QuaZEoCg)s+{(Ux81`N}Wp zcjzQK&Q6xQ4snzC=7(Dbj>cOBldHQ@{gSc|L0u1a9y?JooWWZFRZ6QyU5reBCpvz1 z6PT+2xescfRt6||oHlPBy9V|}`DqQz5+6CxLZW`jn!F5bbbNl2GQv`*zc=YrVV=u# z#D#Oz{CYlp{;BfZ+}zgBS{TvDxqP*A7pYuBI9@hxt<4sG3Ayn10-IO_(#5q>ZpuJ# zjWc!Q5xuybWqwID@rMKXCvjXlUy%smdnB`D*$2Ta;nCAaGlge1|1ZMcI~>lf?ca|Y zC5dhb(V|41h$zuT5Yd8Qgp43cqDMpekj%^>uwx_t~p8H3p=LglW9Tu}ps^D8@NOw|p<473HSWZpl^PH$3rGFabBu zauoQ4Pt$7@PwsZR%quP5#2wQ|5}Oc4?}Uq&V$XYNg9Ue#5MMtOx5b|f13u|=42<)A zMUAGJwCEUVHFPI5XT)s>om7b}i`t_7>eeRzSq-d5@YDCWv$1Mhowc$q#8y=ksz6jcKq(E~0sdzupsr@38G4YQxF6BE9kgto| zGI(bhKZWi&vmVc2rK3NXmodp)B;$LCf8n+N*1mmr>jPCcx{E_J|OLahl|$Lz6O1oG+XE7P;1j5hT2u6L#^1>udc_Eqfc znJj8=U#g{Od}npxR}09;NS%)5CR&5GvXUKYQi}rEu0m`gfCr1W6g>-4p5OYPkdOYt z3Img*1}xIG7sM<==**T|aoT}4ZQ@UfjJZRq&_S% zp968+Fig1L!1%QQ z!YzT~qvYmo&x0*;$O{km!}W%v_TcMXi(w`yxB8Fvj?gwHa)PH@-)%~LBAwHJ`FJQF zo=0l3y0F4mX3BkM6N+F|nimrTWd}2Fr$DD8ufYrF7l+VU3BgnF60ZVJUP5Dv$_-|! 
zTu$}{LWcCc?Pp6w%9dpE;br*Yo8k?7k*57N&MK4GU+EVd3(XbTR=wTuNS<8_6M+GV zM#inbcy>|m2bCu$b0Z1gr4&cV26uno3n992KQt3^9!_#4m{8$-509@2JV$Y!VpB{;j{goj zRET5UCk3`43Yeq$w#$Hy!29a`rXbd(_QIt)W28cUt5c23)V=`xHN-{k_>d;ar%?_H zZ7WGk;D7F`Njo}mO5LVDa&c=(P5T}XX16=9s1I@5;YNr_i@{32r+Ib!;Ae*!puXtk z-p50lY~}P_MV%wdd_?)IR@&x`woy{uR=b2{Vk*LxPSaSWd-G(hk!P|+av0YwjW6}e z&Oei^Qfa(Ti{!|wX5On*?;kUjd4G=F@^O5k-;L=g#ko2%1Or9+?F^&y=+Zo;YIY{O z1Mz)5_KW%l(p;bGC9{#77DWAgV7TBua7Ga(O{VZqPiy;EpOaKqtBYfA;LT_Tk)`C# zrSM{S={rOE4bH2C3J6we0@4pB7|EFZ_Q^-SG8bZ*LdSpC*6m5Q>&4}0C^#vBVM50z zrTp>auto-`tIGW)vC*8{U+T%B^#Ugc#=Q^h7yJffdnO|uM|B}yf~}7q<=d4~+ug?> zDoL!?zS?gMhOFKn8THEqi;nRQnZ}F>u6A$ses7Rd4U#JVKW;UaiXSviiEe!Y1?sJ#+=~*`zjRx09{jl;F272=gIg z5TsX6pPzC&+)#MzTsNr6>nD*mkFwcO3tike_x0^faC6v5b`lW78scqh^l67N%5mAh zmfA%w`pp_6Uc+kM=-K;h6ge%_VV?9U{3VK%9Qt@vu}k~8bThq~0MtCs3 z*@Lrbgs|fCn=fp>TGZ?kN5i#m_CDoP9%5AuTkp&9M!%yh6VCB}q@SL+wtZV{XZ?SI zR?{&YG`yswm^2A(>`WZL!N{bS{d=rMm;__HUC8=v(o|p@$u1YJk1Y2cn?Gs1q#reFnVnz1{D^z5 zqIq?*E%M$wzS4^tTi@n&S3R!QF0>|+d;(LsKrg=S*_$K1)FH(cx^twG7eY!;O`fITYT%a?vg>;H{cFFoA;ZYs_fLprGC|6ix_J znpu(M?%5B5W2rJ+yxdB)%j zmu4NBo_dkL9!d{=1=F^!{x%ubTqMZCgYQD zlNvy#Z(sDoG3hGvf%mZ1o!#lzjpE5qzx2}CV2LB|BT5kNVhSAf>u0eN@~94PftS(e zMLq`?&aJnFV_#Htr5A^VMHZ|Mr?#?r9&U zj?H_vZ}uVEBn-;&Q#tds1Wxul5n!S!9NVU@Wb%qr>ka?ty`V-&qt`LWWHLF_yDU~P z9QwEFrnm}vt9BWc*wZ3Q29|vfp6lmx*)LQsG;FB$GOX-%G${r(TNoH89eKaG5gI7! z)p2#J;i%|BM*8U8i69pD$%3fdKADBTk66RX@r0JkBt}c9io*bm)QEOfJ zukbfrPtr^lZVo7Ps3H%BW)2x`W?I&zdkh+e##Y^bRm)DAK=TzcSEh;!eBebiROB|? 
z{5P`L=*xY2u)S=;^3wm@*#*VpSxRZ!>!&HqBVol6a>3u8(9LQ!vL;SrZao1WS@qyn zReD9yhK?$?7ilFCCRRG8)UB)`6+Y+ahOomwvAGmh`_n6d3Gn{*N2jn4MX3kL$d){3 zIFt9%v-{YiV=EIS%DFM=qQ1Jr#!nD%dwF8q5xfv>m4D&b!&+9Y@bmN+g*WYJ0W0;W zyE|`08_IZ)3|K+Uu3RiDDvWzdBLSBY|ATMK zIUc4OQz86GWrvubMrB>=6qq6YBFpzEInk?a&FL5L!5NK;lo@;ASw0o&*b^b<#Z{$p zV)$b@k=5)DqnCAQwMHtlKqI8QMw89?4taz={IvIkjnN-8u-|&vCKD0_Ae7LX(Drc2 zKeKM+3!r1$3=mqX+m%y6>7QxyU??TMkIaUeCDI4>B+le+E_L)v29zejKtcB_*+Fog zs39G9Xvo7OjOj6^v{J~&Y;K^W?l&t?HBE)EX2&{>lh?CgjO)!T;yYjA7eTJJDo77T z5aqwRoo|8PX0YHR94WfhPF-tJm30gBwI`969(`V3;k{c$DU$N7Q}vDeKZ)%1!ATXO z3e5qNVa43^Z_U5fKV!?M3J;d8?oHfNuf#W6l)Hxb9D0@$)mXuy75sdVS2o-cM4H#j zF6Jwl)&fOK9`>@0-q#Mg?$qL`gC9E{b@|#;v*(7G-e+BNR}?7XlV}n-E9f}Pd?|`@52F*KE=Wo;0=>#2 z{mRKs#eFaMhKpJ&$;_!nQ?D!wV(}wCmEQ&j>Wt43Me)z4oJR?!J)RXQDqiEuONF>8 z)MN|dRqA@M^Je3N`S=IJ&A3^gu+N*0SdiHSvAW}!`tK_v@g{I}Uuay)^jocc>1j}S zvv`l7??6)jhE)*nFZZRf7qn@Ncp3q6zT{dDlC1P1?S`D|rtm@z*(N4zZ+a=H3jj9_FDUPL1I#3l|rB>(Oxs+?rp3aG$M2;PMivm8Z_dR}nTjB~a`{ic*X zO_%=j;7Qvf%F)|9+9zG>6IXwsTjG^NNP{v7)d|7)VAL6O;eE&Qjgr~7P-veNx41D( zOT9e2ID$RoF8ye@rU~=J?hhi9{% zdx2LF$rutoX)N;Q%eVd7Ub#{&)TQo6Oa0x)VwRce1~}&xLYMb;*F5){(UqR8WGlvQ zOOI<-Lzx~PEElxUZw3OB&t1FYe8)q_g&@qB!!+$T8DX16-{Uj?tIhtu(ypBmED`?% z$voxbjb{Er3S_(hYrW33VZ{)tv9mm65!pb!1P`-8{ozx3$rmv3QUa{$yZp$fFWk>q zRoa7^XAdo>EV*3QT4b$RE(=qU08xJ8aVH7&TZ>dt(s+Z{(ma;4D(Qytk&opm*}l4NGfvjOXZFg4m zocngg^|+BNn!^rn%JuK71L@(HV*$@*b+qb!2CvI{{Drw~b(pnv!~DgOvEYNhi~~U+TS;+|<^KcOtGx*@tu!=t zxAPk|Qss2U>)6yBwK@GBMCVrAd0K6djw zSDE11!|9O(jOzK;dydDjGgDm+#C=GRK|BtR@H6fg)v03}>Xzeg zR$Xn`E@XPH{pH+a>P5u~Ry=P8wE{dptX$qyqq?lg+IhRHon-0d4I?jEsvMatN$K|Tey5Xj3lyy<{P95OIp71 zu_{77W;m+aZ@2}ZZ+UMg8fw+#r43mm-0nzAds!~CB0hT_?j)Hut*A+HTUOJPuUo-W zVkYX(M^7<3eo8I+?4Fx}25{DMoASF8C<~+NWeQh|gKyW%iR9wOC~sj_M8RCH0i-@v zCS3U@;jmq$gGaq#l0(weKe66V7-yGq`7DcDSdw|H>Jkgc#>VO_f42*)=p3M`R>XUI zaLLVao#c~wR2YTx_IRYZV|vm+Lv+aE*o>RFXZQgw`E)6xI={c!6{UCFXH;e7b!F`E zd~a8odA6LXzH|-htjGge6k34;NsU8|%Tv`&g6M>=l){|O1RfT{M0)RJxsrLpxQbes 
zzBOi%ha6(1&^L?omVl0aCGMwawfRIN{_Xk;C%2^v@iG-!IKlzk`J?6>j~ zR`rEH8ATO?eLhh5d%^}}GI>m54niM~m$}YjW@}sio5`%$&L~gzF)|-mU~}K(s_6%D zTd{bQiar6>mTHY;x+Bd)jHZ@TGn#qn^m!?__S>QFwSDvh{!w*$T^4t~JOWwXb8`iO{E0m3i)E9Va?nfs)a;w~ z+t|b}vTIfp>ifL{U>kR^q_0&CrTEy8^^tkh6*qOiE($qzS%wzP5XS1(NLP#$|1gIi zkyF?Q2`dbZum8(=MeZ|>9VJ$_v@U)f9uaF|*N0@kSfNQs5hl7WBC zYSzZ=+XfWcz7yA`Uat7qR-FDBoyfLNIomEfBIOi$F*c@QpZbDkah&Z#7uwxD$!5`f zlH(vt1AFqT;$XN6Z;B`#+I47(#6;AzqIggHZQfp>+;2M+j*r*5KRb1~2L?L#4-W9RqFP_k8P@RiNaq{b>nAU9L$4)feN5IrI|5X+OA% z=g2{S6;4Yv8a%I@b#7QCpm3weIFeJ!_Q^CAXBy{G?7;WEbk{ zN~>SB<|~*-AWz-Bq81-Htn_w4_8z*u$gAXSAMZ9@RB=5wo6!01@Y@HLdLQ%%@KlkK zAj)AhJ@T5gw48D6Mr;L7Bf{;nv1V(lepR$fZ`t;FwS&?{Nj2_ht&sdhj!a3#B z$Z~~AnlZuP7?C`df6w4iW2Y}|D}FtB^9DzqDrO)X_gOGCrSStCEQ);WeC+mI#k&7HWCf?)iK^~z3T!oEP8i7^&;apk+LBCQiTqk_WgeG2nYcJ>RUbvpHcQhN9b_W~p z8G8RjMIVBlWa3e+j&WohlQz=lq=X`bqrY~ZSd+`cVt$Ihv+E`lQ$?t0oXYMtZ)`}C z*5!HjQj3&BjLbd=KZKPv_s))5<%gs$<&D1LxayPbEg}BA zMEURBiScc?wJz@Z6ybTAsG{@I-gGDnb1wb3_iogV>ASPzy@>!zs?k0_w?AJKj}ToD z(_is+N!NGjWf{l3Fmqo7z(PZ7yMLr~Tm^f7&!P*0`C>^fH9E-m(DrOwp8m_D`vos% z;|lD}{RKZTMP_0SX%4a|+%)Ec#@1HCQ8j*aHc6S1xAvMPldrxJ_Rm2EK4&?++ik3f z5MBOG^W7VJ&B44ADTta&0;~DZn}VxkIw-Tb1bw(;6TMTxj@bql zw_=|DaFz0z=WU+&02TAx6fMR$@^vyQt>_PNYbxmo?giE8xft)6G6%%ailJ&Wao626 zH}~yzhq?rr{(lZocLqL}Jpb@w#%2#(vhq63A6Ew5D7LwHOOo+8AdP|V*&iyvJr_7t z?_Y8Y*Ir#O#4zAshTI>(3doF_IWb3CA3}x=JiCIoZ2U*FX+C9 zCLJDmzfC{1HWshfN_i??#BMx~5-53uc**b1@Y3W$>VRi#*EC8+1GM@DTc7{BeL#G= z^7lK_c4yjbyj$!cTG8SRg%@;O!;yA*K0l@R616Hm5g>TS=u9oFfcVO`w;VEwn`JBu znHd-vR`$B!Z}CePpmcud{xeUf2tuOa5M$8OR%ESm%KjMyZ*8p_!@qv0CgzNmP=zVT zsBiqG87SQ8f;%%b!DiO2SvHTC+TLG{xjnFl{F!1cOdvJn*q$qE7eouF>uYW`t0 zCIq&DB8Z!x*T>#v;qdjW0lP}~C`Z9ADdDGf zP4Pb6zrE=P9W3=5f|J&txzK2a4W8hbr($+Hn_kSRNKq!SSKeV4W$D}}q4Nuqfc0+? 
zBIAh^o1BK5DetexmezAGgZFA}`UEZ7Ub`8>NY$`Gl(%`NDNPAQQJ6+u1(Wvqq-Qq* zUe+5{*aNM4raI^ogMk*)dQ5uJ6b+8N4l9wVOE1cl&>Tg0j9@}5TElupovD=BH4S0pR+|5yt5l&om}n$X7k)T@hI=^u10ubM`?JqH2VM#;GcTE1 zU36hq8m8#}MT<5V*GgSu**QX~W2!eIK2dZCx1=;88kR~js;meYEtSrF}Hx&J?$n@cCJ$M!8W~O|=ow(yk7Iw>JbXO-( zRJ@qP=B5Vq63_J@>#=Wr-<<&+Ip9lN)mASs5K$Ay)G!CVp&D(y5~58r%6a1fUn*=N zjp>KDMe_`Lm3AXvgO6ld!t!z-@-ae+h-!4)2IwR+H0M7uETrYfZWp!XWHPHK`8dYr z#NRVlcrN{5W4w&Q>pmyRLZYcAB_4R~UR8b5@B$?m4U*QbQ1RUx=hQ2veP<6uM>z5Y z!7!P@s%ssepae5F8URhI!jGO6_x8;#vTM3=KE2uJ$sjLj{rlr})3!Cy3B_Zb_yS|b z_(E8e*94b$T|MTMs*=_lKvy5WP_nJLfaK{f!6V7V;wyDh#tcAY_Whh__xrw|vo9)0 znJ$e>=UxF93uCl`6FWuhX169cK7W}QN zlrO%;C4LcFdsT?eZoPJ4njyGnXZO)yX0Q6)n%mNgNF?U?ckh<5sn;&@_wC9)AUw)j z&ib9MBMSBAsdAY3Au4Wp*kvp)ZJnXB%w?7lB}?sPqACkcGWJ`QWA& zA1O?3!$wNsGLU_3Sc*?p)8o{}-@=ouf2mC^4|#YybvHTC?inEU5&= z#a3gKni#+IF-UOjxSNl#7 zw}idc>xJP0vax*m7NQE#Ea|O>;#UVMFwEBmT2-5BfOtzU#4Cz3$Y(zLFfX6&acPWR zz&)BYQA=y1=}~DRI4Vpf>tN)x) z{D|F&-z7z+b`CE?K+UB(af7LNGOn<*80~e9f_-K2+%rlkoC8fDVs5wBphj6_?$RyP%~JQ7tzhdD)vKP1BZe<158YIzJHEbO23u*6GqG1_6*WTzJ!M z&@vn{jc%(3oo*bc;tFg=-A!m=kBGJ2PACn`>ve11j4NK{w9(`zd@uRBp;6ke6kiK= zycQ~T@=zp>n|M=B4AP6Xr@v}qBR;R||vOTS+n{;`s_qL{!Jt|-U>N$)%}#W zZS=yE*T?mhBP=7Rq$mQLUTUeVWhsJ&K%$N)FSb9ipxX^_a3+*aD39?_eT8duv_~t|f>yf5 z+B?unfgv|j{{n!-33nUupKHaroYT09(ywkEFG=<}c|?T%34{W6gnh zEs#i&?Zn{DU zl0e((O~6eTG)H1}fNsrcOCSc~0^Gm`nC}Cr?w@6Wdux?-k{7ga*2KtCg1*&o9iSfQ zG)%n*ShAu0z$}yw(s&+kYC%BgQ__0Av!6wwS|912lACsDuZz=GuOevAr&y$4ql1Sx z)&)as(R!fwMr|v0=Lw$Bc$Zt2ym3Mwd7~OQOtYIH1pxw25&#PFs*r{=P}G5AoWXWe zml#i7@v$YyINVZEoNrCMdb3;prPjQv?~TuzlfXh_SB~ZQ0J=P{F#CF3b~gjmMrj=6 zM$4Jgy=V_I0O3`*VFv${LkkZ}qPI5gxB)LwT*ia{1qn7W(BiL)BjzbR#mPBT*Ew%OZk`@pg=B}N%NE4{QapQ5}JCmNW%3w zfX4AJ4m>U}U>LTKjLYch}nl7e_oo-QB=N?oe# ze=+baTY=iQ$OSFIQmAU)cGLE&$51k0Ew}snYahM=!FoaLS!^ewBCCyVN z4WeyqhaLJvCXE*=qw^4O~Bt&u{s4JBFgnD&pw!QQ}oz$`23 zC@=fK`uZ0ZNk*%m(fFwfLE7a_pM|1tKdK;n$b=4DuqFUWmAyBrzWY@2;673n2swKL zFx8kRq&9`3Eo#Z1Yv5c^M3n$s{S=c?2hxrqQCB`fWWVij5aVJklmvf$<2OLQP>a!E 
z11Q8=4SGQ`6bwLIo8^m%94HJ8f6=8*@U**G4X9zufi)iS;XOz?#w-(}mCo-ZL-wA( z1=5z`7b7WGbofV=+|d3Q?1L{dD*-TA)l|yUQDYuJ(QY@#`E98Qt`AXPZ~Pc4SP7KN zlHq`;UejO+dX#gk$l&xiMddo`JEPHxo)Gk@D725Bsbu8tpu6EEgx!wcUA0aRQ#u|I zXt2g;Q?mI8&RY&H=?EU^eo4LK=+CuxB6fBL@V%T*2DX=oecTl!Oa?$FU3mXLmm?@N zx_~P7k~WFE$RdC|N#?jQ+5lT24fm48QJWR0(Ngix2Lt;A!YUFFMqCuxuN7BNA6QRP&rUpp{MyapJU$TayYD z7%KQ=eHMt0fE+)T{PN@;e+R>y#zNbnvqvJAD_PO^x7qCg;Myg^bDw&2K^P<)z>pS- zT?=^NYm+-&^LgDGh+x@G_8)vn!9~k*69cqXUi+)7bPC6ULFoNE;{F)?TOR~N6x7VF z;aAcJefAGb5bM;3a|#0Aq=p;S2Cw5SBcx(0s=T$eF3^DQ4$j`Zd&wER$2DQ-uFYCW zsLXfA96p?m-xOFt=t+i?L7V}hTaSh?XRX{h7M}0Wgu z$7B>>GB(2-MXA#QAx+nz(f)CeeSq!&4MBDHr|N;+q9aIj0|yoWauxeRw(-IPM9+Kc1Z!gUsgM@C zpE|)|4w(&hlYf0{C6J%IWYH%~52rEJ1PBB`I=~H2esbpED?IRjzwKc8W%0v*R#kB0 zPRAY4slxnkOzXd66}S#xDj$<)MDz#!KaV^N{r`V1Zk+pf&ao7E8XE%elqRzKJHPgC z5h~eE^iUbuU8wHgZH>>Ec$+F&GO{llhLb5M=DCLeYm%XDjgZ$IYBD_m=-@0$T;L`X zn)>k1I2&9Us(Zs9N0PF1%L)JiN_wM0HP&Zc{DEDa{O}UMtg@YoGYI?cJ~D+%_`d=P zf2Nlk|Gx0hrF@e%C}a5 zElnY5=l+Aqun&hYtvmaOC0c#fZTuLM$N)UHt*CKA86vST0M?fxa)Y1)8(?n2-{|4S z3%CRY2{U38#{`&*tgC!2a}WrTZI=P~G~3SfC)Y3>#Kl>&I(M3p{^Z~1h*-&?SDQvcmPJxt?U2%XYrq{ zEf&V9tmMc5hw1t%Zj|zYN-we`yl4b+`qXW33sm>u)*pmZ%9!8W?JoP`8189qvvle7 z;yK-MS`Gq_0Cs_UQ1BK}htzOj+;MvWs3y!bm;8X6JE80~x*%{P4E5IfGKf>bGb~8l zPy3i9uUM>f`c+=$en3Wu-s2F(H>nC8W;Tqka0TNDocLjpBJe}`OcPJb~hqGSI_T%Y)Z{?muId)W1afv5YB|~q`)8eWxMKrv*Q6B)*#$2K5?{h8R*LM|cfU=i z3)7ow@LUWYYweWyTk&-x71c%rz*=_BiVanhJTf#v6WHx zGeEN$F@bt_KO~0?UOv!G9#o9Y0Jb+VBlp)dyeP*)0BgfDj)gPL0M$2 zSi6g_4+pTA=Wb|#A0UMSj;xdb3R3_Kqp?Ttt>^Ej;EA3~aY;Ffu7Q$;a?fCZ`9o9x{Ujq%-H$fwjS`L(TnI%6eEv>-? 
zw}7`Ңn6a1HL;#5clLUx405)-mRyOzCs{}KOF!7>IDbLcFGW6V*M_<80U(!n-4Cb% zZ1zOiq44aLK;6lS(ewQqe*-sbJxdQ2f~mDy>2^jQ8;kq>hfV-0ip+b9A#e0OepK2r zRyvl`SPDtU^5?H0qTqBX>Z>yMDBm-W|HckEJ8F$rTR78e^@ZB}ZzaR$uZ}IrS)QE^ zx{=MQ=v;o|!f|$0Z=F>(_T0Pgm0~T4O?N%PndwP?Y9`S6tdHUr-9c@`M?izh*B3e3 z-B?_6Sr00boTT&dICxpvu+y@g-H<58IH=j|F}L~>R_3h;Ts%PHV~1PU!Y$*2u^~=g zzdv!+P-8(&8IkL>=K+@M`<0|f*UwZaUa>w4!Eprnu@b@m_X)s+WQ|}-*AuNzZFuwh z$G^HCAk!I{biq;gS`DN0)-S1?R)<|ki~s(WU{~izTvUpoI6Mv)iov;lxQ5Kq0Y?3 z-TOi$d1%29oEnbXTMMFhG^5ToeJwt>$oL@e4sQHm(#&h3()nDH65B$m`0gDUr=w<1 zNy>&I)6y%}f|z7<+ee_J1WP9=z~`$wIDY4MH}Y3FgOR2HAZGUK1VB>elxuat7}7Tn zWSpb;BwJwU@rP7`ataSj>)g6?H3%)q`mjKu3n~&GKp5SL!S5nC2NSmtOvbhjTh&i} z?>E2$M}q{#R3CQ!fVR_MSQYRfUw0UhSMU8zZJClb4>tN=WWz|Ko}5cp6o^*wgBEsdozdp>%4tm zy!$PHM*wJQ9EHhfi>_2oE5Q!RUw;*};iJ=_{BosFrAbsTp)=J${P$4kMuB@uADXXQ zQwEwYaC6tA2MS)a0sR)AbvTG#D*B81g0U*zLv%-F-V$@?Id!&B3R%>*>&kjSLl_9z z%%sRi6}P|_l$f}R1wfE|d66a#=+Y<0$)J;;_k0fpz*mk0iX6b92Ua-&S3n=Ggfm;x zr>Ewia`Nr~mcYFoOpGTc=HzZyA*{^u^2-H4xx}sS!Sd0z_CFjCzR@fP>?~i>b-~Su z>!sPNW+zjbfITL$fUQXntM%b@iyD5T=&s1 zPr4Av?CGf7$D;IoSJQnP=d!VOzxRlCNv?cd+yeO}biW0iLx}V7OvSU~R57&YEpfE~ zzjd{_@e4K(PPh8*fAT^xA)}?*+z0jb=Z?#UD)e)NY^3I7`r8}F5dagsCr`hDtMSb3 zu%DnVlhDOW!SOb#hj<>FEq9?Lab_h)VRCZ(8tr=(^bwvOpKctSu%<-sNEt7DMNGZg+<{(&*6cY1!<#n{kE#!*@|N-_4^Q5To^Q9hhVvI-1;@mtlY z2GJH8buxd-gYA<~J&S)gKjL_G`B-iaN$? 
zT}`T2hkBUa4%m=;siqHAp(5X7(yE0D`%(|uAHU$&FYMjOxyphOP(+E}&bP`>ymkI| z&9$YWcA?0bd~9Y4b@!kycfR?@*}rCB=D5$-O#94Pq_?7YzB0w7%S=KlH`IJ~a!4p& zk-To*Fmh;!38^bEstSb0Ln*9I=Qq?l>-LXPj=*>Mfz#`R<=%v*{y1wBUE>g@Da{%7^Vr?W4A}biYNE zr@!B+COY#k{^o@e+RHbt@OboD_v&`tS$}Oy$tM%cj1a>@To?J>2||SSjmArLqB)7707P14$cnlUYZ|priCdS4Gqkj zv`M=uN9QWt{m}zHRs05zR_G$Eg^Xax&tZYw_}9O*ZKG)jHle0+VAOPI%5#wU*XyOvL$Vr zdQk6Cf66k}K_(_YzN;?cG9)BHP?F8!JFPpzAFSk%St$T7y0{em)HNq zKWy?hI$YWi3u7IA9L=BfR;8(4n4B+oT?q0AuUFLl{#RwO35Z0vd0-BqIo+uAIlGDa zB99-iyr`kIqYkwxXH1CAd>P0!2yZ?NpG6kJdP1R7p(Pd!MWWQ+SR!eCSFO=wT6Sa( z;qdO}wwIjr(ciEQhk#AdvDw5uaej2U6#ZzhJ`1mD#VvN9W37Qs_<$!X!u^`RRqSh{ zX^LFB_1I$KolsG;q*TfADpGgAu~4D<;Y3i@(tA$qu*lAY59og_tn9 zGhV6SyUOuUaeAszoNvUK{T75!6hCF;&AF`SzTSV1DG1CcS>TdC+O<% z)u%E!L7WfkB;?5?uAY1@4G&x1;N$^S?vy8((OB|BXJYVg7>hz@67aodpZtdaN(+QB ziVJ~7WJS+(&^NniL8IH`=N&U?bYLBjiT*?YN3op#fX39o=oLbUs6DXUemsCnu3fz4d zgNZmA0JyM6WK>$bw`_8JH|!n zy(lA{9_DCiQzJ4U$wPAz#<-ci8Adc<=4|q4PwoO^QaZLf)8V#z4K&Ce>Fs;3i;R9M zu9P-^F-wp5Gdi`^36ELn{bRIsa>kmPR$!|pHUHxdxW}J=wXdJ<93V>fq8GaOy)D9C z)Wnw$*)( z{IT4a_di*2sHJN|%zoa~U>&!O9oT*5H+bh>>0vk=!Qko(;bG5mb1Jw4CG(EQ_vS-A zzQ*ea`q7A_#KxQXxvX&;L1fdvx{(3bp_)mhm}YtaWuo#2Grnq!neC;I0IlSw4z^sH z0OON#448T!f=rtNvOuu7%(e}vu;&{_j zgvoHE5c1oC)F$ZmM?F{%+5II4n+!%OK_6Z0qYBz}2&1TJ!GHOly)HoQJBKR)mjgiZ z1CNAIQt38{@Xd_t#CtzY??1A+e?2RP7c?IT5r@CGCb$aS0?T`13=$=nAz4CP>BmeD zSGNX}z?p936?0p39QEgBa-g^`mq)t4=D_V#+T8;dgipAKQ^q>tHY(e}(SyAk;cw+} z@|rvNILaz8uCrGcXAa!Q6@EjqBuy%AS+aMpxKSbL;pH*=Zys?F4*@0OJAFu zaP|7r*n>Fs{H$Q>@7t(<;JoQ}_%*;l;!IomWXezrreC_kks@Jt*2L zD1di=Gge9z1jZR0;O9$9xHYk7_}om~*?geoe(X~?m~GHm?=#v^jTx(fxmnrz`kQA! 
z$^E}KPB0?SQ0aqd_ADKtaQK`(=Xc(E{>N50n-Ok$1zMdt9zCKmy(M?Uj5lBZHfk66 z!~X9`($1If?k%ZxS686<_=|ONz5a)l`{^JAu*&+FJR!ldfvVDeO8C@3wsLzlccM=u zB#RbA3U~wzE6omziX0CyQMjbT&$0A#>H;Frr1q)FXxob+UeHAJbZorXx(n-wrW>;g zoHcLU?OwlN=&!+a=P&N5-UcnXwCi0++Vuqbtm?O{y%GH(U$!2;ZFH(GVy*0NxIT)DkL zdZ3lca)bHw4MYf@zIU3};zOsXZK{z}Vz%EI!RD@1|1N8Ug+#h<2Hvd zeO`VuPD+IK>nXbKFmPf0=x;u(P7HN&-Z8Nq-dIQ=lJ3K0bfg?G>UWW1eJv!3`&TEX-Tro*#LvNhsB_5wRSInFGt1pB2t z*(8;In%M;d(_1zsT9pIW$<48vp;!24MM^6L(B?ctu+($v2aTe z-M#KW-cY}|W_wzlPS;0bLuLYdwRhYBHI9|>)By6np0X-h$msum5iQ7U@FHE=KCcIg z$}NFU&==zcTIXAKXn0D~gXP?>d#4K+Gt(pLU%d)^#^z@~wV#SU5Be4aOhzJjCym2A z${V4w#D3MWmA$}v6g8Xfy_}`~b@%4xmlwvFso0y5-_$+L^z~IvZ${>Ug_pOXhl};) zabiW!%VebwhHOk{rgqCto*v%PByrse`n)-V63sIqyeH>-npzsU32L}~tmI~PA1+yQ zVpLqce*!82$#Uu!WlR2=Phh_iIONfLxL||9yZik4d3SJE$iJJMv3*TMU>JYvc!eMt zY^dN#?7a!f#B{}kS4|KyTdDf~82_34!`+1Jax1t5PhwjYAhtFJNuWLLYky^uYkyhV znW#K?M+DEETLAY(u`r%Qw8aE0bsbX_3SnwDaszS)B`HA^gBDr|s|C1y$N(5Ukksn! zm?4muc^2NrtF!aj^vGeWF5$mi02dtWXW1f+( zD<_D+fT;;FiM3&pC0FfvV`6EObHjh!NDK|c*w$oJWhwWQ^zWo5Auj@${YsNM(E{KjJb8MqNZ;8l78rzMnHzY#w(2#91uv^@)1u0eWc9Q zf&a#*@tw4a_1b3gqWZKEf}m8H+Lpbl>6jNQGM|byCpz8_pzcZ;=@BC^z)Qwo^BzCo ze%|QC#r{l-LFvMi-BY=n<4sp;8wD(VMHzbg@kM@3{VMoqX(Vqm`Q2!y{KrJ}C!H7$ zE^SVwCmE|Uyq_{?!dL4K(rD}W+ORVvof+D;=k_D+s5lhk%Abekz#l>NtO_|&I= z=td=)oTC2uE9rlmtz=2&!BYwqd7a0L5AHHB+jaeU@$T&nPA6N{#hDC6l{tQb`4CB| zd%qUi8k=7KEcxOx$NE0XTyY&G{pHc!mrQ6SuXcV2N?mh7>GP6&qq4h`7Xuvr`S#N= z1ABCkb1a0k)lHb?0VUYPFTuEGacC$tRDSYMFK7DFh1cHZ3Tfnj;6&a{Oy| zu(HN;Je5@Nab>rrHaBxmjFb(<^1Yyu7@D@7q8%-} zz8xWo|Nh*sjUaIEv()JHp3?o5#%kxQ^i($}eYx+f&kuQ5Nev#x)tZM_`z-BM@kz04 zj{nBoA(1s}sDePY_u46#WC@L3ehdC7&2z{(928=fT@Bw!J&^qiZq2C{lf_c~{u>}A z6ZKuga~8aVanTfGr&rbN*-Odod4B$=aEHG~Sh_j{%3JZrD%Qr+W}9b6BXv|O#=B!u z$~`KjiP@n$)y4FM2@R@3)24Y&l$}3+Ge+@mA^EL&m%Dr8JVAw9Kio2+`!mi~UqIa4 zZy7g}U(7wa>xN>;*MT2y5WPx;&^S<5D^Ioh@_blFp@=@Mj}Kb0K+V1fKIkeo!& z_NIk4x0}#;@gT4hz*ny<+m?W79;kF56mp5Ck>1hk=8raC?a_gUIV7V5T(M;Z)2~q+ 
z;Hqlp|0Jd6rZh%K&pe}8cZjC~Prq*(&bGj4nMPhHXWmykqh3`|Dra}BVMcEdw|UztfRoyV5~%hja%dCXj|4I__jABTW@SYVtc6DVQody4kPdvt&;wa z_uJ;&k#&vzburs~xaV9HPqzD9l}t;&$i%S49)8)vmehsJmA@wr-kv@0gmMoa)cW0p zuF@Qv8x%Fzsavfxt_wX`jBv5nNc?llrhqiiHJ?3D5sptZ{&L-gFnIE9UF#@9uw;)7 zK2OvhXb@81)&{s>RM6QmY9m{iZSl$qC%MhmD96TJ>+ik<6R+LIICRqsv-*Elf5Ybb z1?P1cSk)&*Mo(q$IQ*3}Pc19ROe23{D+4>>s)V1g;XLQOE!d$12$2G1faXi&@K z?o-%7zy~&UzIH7bbokb~uwOt08;Rdf(&IqUlX{LTGi#*2of*(DVy5@rDG{~i2FJ~9 zQ$qvWGjZ?Vy?b{0)hs1oum8X`y9y#$xJm+>nSvY8gDBWV1c*V<(vLB^hpe?P@!^X}5MAIvI(be|2jnhCo4m>TiN;;0rF3`HDsa)2b&zkh^NSxnaW5&6ItSN0D?H5kh6|iCj5!Qqve>yZe-l+~ar>Bg z+-lSf?Ro$)$HZVwneSB~|ql@S~!DC{+kbQ&dn90wTRxKtu$@1_%TL0wN&N6M7Xe6csUa z6jV@3=yhnJcPWNmL$68b{oPTYbIynN-rxN~%P>hMv)5khU)GXgX-0|hmHHkk9{=@d zF_`pYN;6DnCc2^lvvSzvhR5mFk)j_ub&TfR{_P{XOr8mQbKmtl__&LZ#XIXc%gOY7 zFE|B}r51IB+-#|YNS$W*4Gl z)dl`67gTxyzEzJx7v(r;T+1YAQ!`d!xnPm5KG&?ZGJ1r&jy;l8U~X*7F~xK`MQt+rgZVb z-OhQbwlH{>9uA))Exm!61TXnMg)&!_S;I@z5zRbsZ&BK_&^3Da&gD@RC6t(Y$$98< z9STapfe{`kXyiU<3?E#V8ZUPis*mG^H^q@ClZoU+W&67q;Wl8>#Fm!+)tkLC%Vl#HqXL!# z23_lB;_+m$A)OvHN}^so5<#|4*pQ9iazHUGGU-2I6;h1KSW;c5>5Rl13Xkb9zGky_0{ld$Dx=j! 
zo(k87fitpAsvABO|8vjk?Y_*TrJq|n^Cx;7DY38hX&b!C;mPZJ#kQr~fQ4=+=A+Nk zdneuXAiD=!^B09WemJ&tKQb%t;YOB;+T8v3L{Tk8bm!9g1O7ymlUDBvN(cfOMOC%s zIJD#k^Ajxe^0XF*M?YOD&VJcRSn*$eEQxp!HF!S(23`-t^6O0#=N}pmvGPX_{`V{w zj|5r9)2nd4syqJf+9+de<8;$nTuCQPM7JNMVS2S@l5W`_U9}SxlyYRb+v9C$MCMEO zKi6bx8rCh4xjCsYR^gZ5i25+NREFJKOx$%nks7R(RfLDct}CRa94R&#RQwYg^}L05 z1Twc(@*6ik#4RnV2N;&_-xG--Uvd;Pgmrzm4^(quX_;mt^TWr6ypX) zq&Gc`V>R2eHM8SJ{Sd9^5q7JE);RMf$C9eL?P#mCisAT2t5z*KbNvQU&k#$VB<-!S z^ZH7rsxzR=w>qMTHIIiD(eq?m5_{wTS85KQBT|E>%W}tFl+h9vy!UD4491_xBTvcv*qlwR3T%2+6`rU!BrJIb9#7f ziXTjjdGr^UehUHdpl}_j?~^f61t}ugtD8m6=I(rEau`v4MvTsAnm9OaS<>#e6r(Vi@+jl9= zG&>33d6}Jcf2u}W3Yj<7CZTHK-e9>yeXnb`>mEXf4LBs!!UGKh1q_IZX;i&IzC^7O8?+Qu8xf$-;l7=K>AFyY( z12WMJN>QRkfrG{_)9P~vaxL9tqw2O?eQHtfBB66{a6K;2b2j}qb(osZ42a3pO^{B~#<00wP%Z1I+OxPSQRZYz@f-zif%mt*K2C zqrPDt!)%0vHP;S;p9jMC=W4VUUy=da5%LcPkWVe7(d!{wp4HL&$iv=xg??Rz4hOir z`sh+he?9q1?*&A5lE^{t;?m0L_b{G?!;E`kKsxIm2KJ~g0msXnDj7VJpv*9e@#RDa z@^{b$tt#!}Y!}{RVL71N^GU#>o`Tz0Sx7Z9C~AX?odl82GzwjTS#(bt!yu?gnhSy= zB@l+=tw5`Vp(KJiLAk^1aJZRpU2aR0-7)ZJ9=8c}gx-?K-XK={_sQOjN4hLFBmeiw zWDp>H%LstY;x6c-d<(4?8br!2nhwE|@c|cb^#)%Iw|*pG%b*J&-J-)^=RB9@Uv$_V zo#p-~SMzzrd@haYI0zU_zq7gP%#8SwZAHy}RG#0eLEWL{VXb{T!#(ti&&PhpDxcHb zG=o`+S3wD}a1;N88Kif9zNeAy&#&Q&|IerN@mu|T0L1Ng?1cIDx*^Z`^yvk``gq^@ z?WqO9+Y0mLw@ZgSeC9*y_qq4-mh-Z3ZQ~D3EEWhp`G%tbo|)VaMHWCZJU& z#|)+>Lej~2^Dt~p8u`?@GcZ|zh$I()M5$=@z5!4y*;ssw zJ2E#I3Qot} zQ@t>^#bEvotbFPz#Gf3Xx9F#%Ula}sP3aG(X=~{aJ+2Vw_Oz|VWO<|8(x=Wc14qw- z$jKAP6sp&ja(9up(g-AVM2jLUZ?NHtGbIrlsS?NyYJsQ5EnQWCR{OwMbu#Sqs;F&> z)O+PK0iC?pePU$kCE-D1N#r-@V(8!pPd9>M&)@xAkTHgj7cOvof$c3Xe5(O9@m_-- zeo&|pGY2Z*L`MW7yYdL4Is>a5f-R7)V9+<|}E&a)Xz>df?G7gQT4K-{Xo+{w&*Ppq|>uQj>DsN4qj*V^4Fr zF2{qS=y-9jU%bL``=?>oBsr2%+b8bBV*DNy6lft~P>v)#|JM#}%tVwN%PD~LSV&ys;(0?oL6m0K{BOy{H*hi1J+K7bu z=y8Zpj!+eesIKsrA*t;%cs zTC%pF;3O)}08Ew!vuqy{RaB?EJZmur8a}r+sgKieLegUO40M`jd|-k{CVa47)mpRG zz!Z`MMLx4$2CF)8k2_Kx40@#kNF-y$x14sLltQrB>dZ|#Btwu9~SIRFld2gy_N#-pM+%?!(YC)J+^Jm9naq3!_(44)3u%6URrw=a@ 
zsKCfyxvgudG5t~_WjQ)s;j692IoY2m#dI2tI$*>y8O8#i?|w4OZ z99{YS{VX9!g$XSDjCM%?bOj4T5Y3fi@Y+OSLRnMy>p%>eVHwS70`5X@k0gv{!n!g;~?NWh-#|t;ORENeAuYbR4WoM*P!J zA+#W+Ad&*5TM*r%EAXa4+f;*%y2-z`?>c%}*LToCu{c)duM-o}9BC@we6? zPV}~(2fL}l<-h;^miS1u8_G;y{sV&izb(&$+q!&?IW*k{0e+6#fA9~Q(bLnRe>TAX z|3&K;L^E8iGxKd8WOebGLj;t@*SIT;Z!`baO!(?gMzNXaKi=7%*kw|@V+1|3BuYd? zVfUnY*(Y(kr+I{sgO{@LeUd_9R$VRMhPaQ6Sf^0r;V*L! z7UQBSdAdNObaZ0DGf?AM6^f|hiROAs1Rw*~2a|4gc&)V5_oBym%uNP3`c5wOJa~|x z*#*`JWvQP@(68i>t34jWO?u~IUM$eHq)UYF)@;1c;Pjwo!b25*fM&JVzY8hpjQ1nw zqb9vNd+tkBh9B|{6!u@=9G2Xv8Fjon2p+gr#2TAfI73=pCt9gGjhGbbq;EU90#vEu3!N`B9c#% z-Cq<2r%q`@jk<;(|N1);#@W}^S|CxO^^<=?#c=DpU6h1TU$h&U^u> z*jrbK^YT1!FSVo42S8SWEvpB1C=kGsCjHxZ;e}WV#ZLhv6?DK;Vf28_eC9fMjbi-* z@OzC6e0NZ2`tsH$HlsjAw~ztHEN&tgAd&=)+-Uwtx^f+RGY%wN_MC_lVc3aQI-GU} zeKoCZVBG2geYcIq{$}0WqbHa5M^)#9t`8dRnJldexJHaN?r(Rl&g4%yJ+coiYLU;} zl!&psu}{T$#f?C&dFuYLd{xG$0n6I`w@V2g-_vPxR%dB?a$6Xbt&3jf^EOL%C`#H# z=yVbKY&r>j@VwQh`G~lGN{XamS!}qQ;V|3-T}XFcms_jIv_JRuW^0Xpr&h*LM#qiK zff@Z%{;Ni@d0*QRU!k2}7h}GetoI(?=G#SPbYW;GZ*tjZ?KfeD46f>Ke+AD(NM(B} z6{w?>w##ayZZ-Gw4oB8H?rX ziCbE&^`1;B7wVQVLpbA&96_rTaH4YUwjd5s9gbDP2du~tqJNei+tj)6mMYlfEp^USmOXv3R5!XvFylIpq) zY{%IMo^JD7m%arpXv;I&fsF5K(#wyTFWG)qP52(XpTxevl+mP>xKmxuw(}#fGYGH) zB;+nz|JwOmgIV!bh4|uUuS&8*jJe&|afJ&86ORs(S>dM_v!g3?m-bM z;{vTua7($&uUffSzngT@j$y@{UD!j49RO|?oI^p8bh#RyXov)62R3j_)FdNNqvDu* zjIX2+MNfTUT%WKc0H){yXkkk97+lh-?Ki4DG5kvA?Q~D3A|6A2W{^c)G0AdS4P|wh z<9i`6llM12z^umv6+e*1^eVV#A?T*vh>8_i9He*kAUes#(uk1wUn(G~l^n+c>M*u% zD2WLG>cqMH&JBV|i6Wr(jcb=c0w6=?bZDDs%YuTphT#s=Lg&otxhi$y9b z+nIA&8XD^MKS`EAn*C0B=rlXM3P_zS9L#_jbbMmh>g`>_QCAvzwh$OcP{!I$V8{kW%@g{l34!z8Av`ri6=q{v_>wOjlodiU zzGAGN>B^wAie$hId`Vjd`5p7co=S{#lQ)fz^|OQh0g@;*KQ|ZHYJ$%@oftb4fJCGD zGVh>RQhV@CY%hAW4C#`HChW67#4Pya~io_orA(N~@9$M4vV>{>W}1fvRY2%HsLUnH@Iq%GNyyww~v9z&TJl6g8Ig53_iC?q7M)!T514W zXK@j}Lz?7wyMt1(zYFT;(37uWM1lakNz9tUzo(9hm~3T5XxP8otE#0iF!VYyX{dok z6C!|Y(NYY&!Uq-g5TftF2*gZ*42x1AWQ-{nvmU^fcS>@Bzn6Ej*(mMJ*v`CD<&z^dZXmbUcxL(o+@)GPT`RqH 
zCV9`LO^N)$U9)QL`me@^DP~lYvBSCvq59ch3%Rpi*{^z3Z>ET)Wg?2L^tQ%y$`HHU zu;3)E91yCC7=Gg_NGq`iNCnUJvOAn-L8q?L^5e+T^r`JO&&|}80MDJ|nV4b4%fd!t z5%F%bGY(y0>zIiG*yY>93OlqG3w^6IUE-~GJy)XIzO}inwRJ?47kwME!Mh9=ug<6& z?YgW=NfXoTJAj1vaqrV6R}|b4%5kzb-8R~r*7}iP>ALilaj0o019f2&|GYnEYyXKk=5b={^nJMTyNkom5>meTV4NhMzK{RTFJ9c@`r)#1Ba+^7i`bmgrop#Tn;_e&xW1EX_iJ_~#Ag4&lk8_R2ebs3 zSPAy9zi~a$63J7l_yBo4Ac5P}%@IpDYW&_XOa$6^@30#p^)<|Lp|g&2Li%y5XywbeWyR{tBMc$KN@D@P<_L`W5bZn6y(QIB zk?ld6vdvt(i61LUd$TslyPrv0rxX6MZlByK7|uX=Zbg}=b|0^R$e2K;|>9HRM=Likg!)+6KybmoiW(Yo&7`8sow@v6tG zf=%L(?n~qTf_N|PaW+8hnnpYaEG4;v5Tg`fE1{r0fl69jn^_l}FnkJ=9Jufoc72ZQ|i z5|4hiN<3lfPWhgkyuX&5kRGqG7gN4}|4H@=8?wc4v407&+|rxGc@Xt~xy~dBWo&6Y zQgZg)-+}K7D6H=+T^eo7`FXzW_gg+9Z_+|MCokAyNS3hdW?Z2MKvSnU0jkaE+0fwY z5uXnC0}*Yg_g zH>yJwg+^}bsgrDSXT5Q49G0@kaZ(y$ZCYYQvLKP zz&Xr8b_YLH)auNxDjJ;L0RSVZssu0yo_?_I;>ij3=ZS^@ThX=xG=ef1qzus1Q}Csm z;qJ^}3i6(XQnvJ4{mhAE!7fs`A_)bG*E?iXFYOt-;PubnTbs7u${oYKAD9t7cOFU+ zYP39I2?p|kS){Pbx^FXzorC=HMBBkT9M|!aP%8cJURD7ZT*0(DM`{sCb6O3XAoK1Z zv+C6`up^ILUUlIQI!yoBCzAYL+7YQbICSH6SZ-LGeFPTv7ToFr%2+7!7%1irBIn47 z8lZzh9cGS%A0n!hbOrJ8;@%hO;xT)5v~V+~!-%0C%~Nn`je}s?D0gQyW^Us!)6TQ@ zlD&72;^^nD$FCiC|L33E0cy(EABMdFxW9Fxnr=9TIq02yKfyp**Jx|p1KBK}w9ql| zU1IbpVvCXpm|xNS6b4HURgR7*H?(a}IY>Dq#YBt{2BoV>4)Y6}CI1m>WPu1H3Ioos ztwOmI{nj^rvjUv@Kozn(8o>9}#M9PgtCgir6Sg;0W+K1DG1fhyGaNbO3g6cq1ihO( zW~L0^m${DH&iIA5sii)~(U&_jZ@{)Cqy|1Z%@0I8PBVxLl_BS@`0^!0PQne`DiV*` zv>HY7dd$xp^4vr%gls;FiIMcA%rAERaPH9Z^l(|tIUXF5NpNya|M$h@3a6o+qKDz0 z=H;JO_g<>&axupQw%W3Hy$+N%v&tx+$XMCvY|}Bxeb)QNp@F^ok&lA_zBn|-rd&!` zH}|UhM!V{xW6IwDlr4UfW88Uk#BWu*F8)KjC;8VQ+!u%L-LNkSv9D%h*$AmF_$}06 z)?a29-;-#KG<+ihk_PAXZ%r9xpNqvYoTc+pQeR*Q4~00FHI*4slGbg(INR%epgD72 z$$F_nLqxg8@b)NqrejEJws6W@HB~O_$gJE#EJi5iLF<-d@p@(5x8b_MXBoE6QG0GH z4d$tG6@hh9Lr?CxZ_X3smT*I^V;85sFu+S?t$PZKDE3Q3>&XjYN;A#5E-l4}gumrl zuE#srYTaIDyC)S3RKtq9Lv=1o3S*H*53O5_G7DMjLHlOcUmwXiB^|FkNbQ0`<>-C! 
zw(^IAy-JiwqsPYG)v}5Y_*Tm(efp2$3I=SARz2L@N(8GH$`3&S>x{EZTMGl2jFcf- zRjZ3S2v_^_%`;rmhVHxRkA}~tp%Ydmb|>A@IQswDM~%PXqvd|m5zoS2W+>ZP6q$F% zYcA>93!d=ad7;%jxc`$Fky1j=FA-{8U)`I`t4nYZ8FU=3itvJV7iqcpf>QcPY-cQR z+AwFM@6HRw$7i~iGiFvNN)*V@ch+|Gj~W`~wT4s0nYUK@Bo?{PK0WTc_@aNV#MQ@j zbjRlTC zmYZr`=UT94Y~0>4!S1XWSy2jfa(<2vJnt>(!3-Y%QthXEK7SIn`@L#L4yy&s0A0kg32ORnpENZ%DXC-L(!>mt_rSG|KE2j93 z$JoGsES@;TmMM^CyBg-R+M-JEyJB_v+`>#625ovg~<bLKUclz^cN||_9yjN-6t9VxK2LrEN-;TEIXvPlu)QyvXHM?#GFR%CT>B(GAC`dTH zn%OP7>atBwSz=$KJvTy^&z-^`Hyb+ty@&lbFpHsII}gpOCDYep#9BXt-AVH(9Kg7w zD&k>2;shY>D8@iWbu$KdXlzH>qcFq@OzAxZMAw5xA=aF)Kxq{ZHU*K^53p`B1MdnO zt<(T%dX^tqYCEj@OT1H_@%|G{y|Q!(?c+d27A8o#fcIYp1M|+z&WB1k+p_p+$?uk| zkce73I9n{oncgswC0!$rmJkaY8~LK;;AshFbGp&%ZMqNYaPX3EtgWfTdiwPJj&o9v z%bxBJFyJ0?CwMH@%xYd;I18o*3m^^4O+pl`;fP|W_*U8@qlfdW>`RqWV|9L1l_yl7 zrb}E$!;GRP4ISNoO-(ZJ`61A|=|6a_rkC$urMH}9C|9Lu187M_OE&@4y(dQ?Wfi4J zAW;ZCq_wL0Elnok3Qs6*RkPwVubgmzMuq)pF?-H`)|gwoTPJ9^cShW0M$Rs%g2R+~ zQ0U}<<5d{^r}2CErckcQaekmz0?55xrFte{;so3n@A_SUXK=xNt^~-!ZSE_hZ9@*r zf@;tb;tb;se1Os}sKYoVL4l9lLSnN`A&Bj~MR-BbZ^0-I;HrI8=W|(X3sn)InlR6Jg z3^wDoj)DDyCuPQuiYa|DE@E=1Q0iwi@`hdLM-g$NHJ{*T8l!lpywG*uMI+{;NwMyn zfQ81VMPzqzT4t?Y?3pOE+6h}P@%Y=Hd3}x~mwMm4ZHxhAs^2L`R;!`0ds7qi_hx;u zQm<(K@wktZdbL3d2Qr9}5K7`w07%%l{8$k*AITIBd!ot!ijR{H(Py5BhBAXlfMj~{ zqUp3(LxGBYg438rb7PL|!{F4# zGwKi6o0cq&_~2{}?+${?F$#Sc8Ej4-XVL82dXa74~oT$1J7MaEY?$ z9$geTyD=@am{PVg8$+A1kwUozb`^wct29>0KPpL0SKORAVrOO2Sl<%=K7{XU_+oE@ zV4lDS9DQx7BDje)XhnOV%9v+?idLnXNxbhBeaC{%$+8=tfCoK*Io>aj;@k+dh5-&I z2IgeFjQ_?!6;<1of;MvP9Pp>ViNvvq<;21JhSe4;fQSEe?xb1 z7bH!=T(uKT7#rNKXq=tVo=C<$Am8{6VSr%YtO6pVhkZ-H%nB_aXZ`+GWuQ!+A>ADIo8s*$`OC;bfxjrkpM*;Uzg1+q-ElZfO}Bvl z?-RAd(|WJm{TC@zxP;v8dhb3nCP>k@JNiH8O{$D&gJL{f`ZF9_mN!e?{6|H>+1mZL zJD_Cn;*=lj<|y&SIEarNb6}h=uNUZ@-^Bv;mDbr+R%IFHr!?IA0!XbY?i5TFLrTp0 zgcmh&)kdtSFy1}bXJ&7|rrr!9cX!E{Kj7u?!*uYCf)OZXQw-+S#ZeVNS!|pf1@!LB8yN8_BXj|zx&wMs>@9KP6Ds>ilTH8d z)(lUS>G?Z*TMy~yL`VfNYjw|80d-m>_KisWS%i zQ4SD01N#KmnIGosb6z6dejRxkEkHnSW=YM?HOI0actO`ES5Ah0^m_*MPFe5Ism8W> 
zx8g&?9dCEY{G{)(GuC`S>H_T`ffzUp2KqC$f3L-awjS6Ih8=o;h#{0r@d?u*EHF?5eN-Te^eB+P*Xw8_RtHW_ z{4!fSx0Wdcdg4*+rY;a@jQji~O!n9MZIWpd5ch&u8^swYCTBj3e#5Tup8JtLu>WzZ zNF&y6F7%SeJ{hm^i~CvY6KWXp!|pfEgJqo6xAGH79vp;b5FwEo`=Jh`+&uQG+UG-3@*kDh-uI? zE!>&0>9;39VnTB)PVt$61#Gl3YjWRZ#sjG#v#=I7(-Gj5ntQLT$GM=dQEJNCGFRs! zT-6}UTGq)1{lY$vQ9QZfaG;8Zi@jK}K+hAY=Ng4Y>(^(&6`Z2`D;(3n-&`Se>NneR z3qsETAAPM9ksKvllQSU;!0f41@*AKhJOJou)BG`ZqCW^?{2&-z=?8)yrwj_k3apKV zFQhB_cR<_7C|EZ-&PDfxUbXH92TnJk%yHk1Z=goo)jeB?xoyjj+wn%iz_gmczTU28z zkMkVA{_lwi07oh}EXt|N2>}A(q%(o+@hx9YBOaO|LW`U%WamP6jlMlNS*B~inzXi) z5%j`1Dl)&0&usang!F>9!Y>5jhU?!Dddi8mEuMWs<{?}6ju6i~JS^%$AV-7~cx~&| ztBwysx*s;asS(V&(;S_WZ~i05+AptW%WAZz-}#&W9L4CuiN-%Rgwd30Cw3=|HK#gb@Vlh8LVuXIR~2qJ zieR-~e&Is(-;`g0nTsy0xNx%QVKu$;F7u@miwA$?>Vy_q+;l;b*J>>(HTED%FH8EonZ~`a z`wy1@ktX{n=MWI~>}LnQ>}n~Om;V0R(-z=N^5)8=ZZ1vU|BUyZlUx(HU_bKUw9(N( zZoA(xoW_x{wyEEX8{u@?r;aM=DtEJMsRU|vYl`vR5S^D-?9&+#22xamgW>Yn%Nva6 zDyV@RdX9zXkeYd{$Nz!5O&A3sDE>O*T^IVk?329#EwWiB52Aq^KweBz;CKw@Nuc&J z2?V`Afk>p94`_PW5bxMnLtys6|HD}+^j;3wgSa2yQ!x?;M4I78-78#YU`I{v!#D11 zoArB^A~+8@SywRiQThF}SAEen>!0wPKP5|UFb`8@0!(`ZBk(;KTjLD$t{`A=Q2#s& z1s-yBf6=);BeX23wcnB_g0dY%vCx8t?ts>78qk({9E?2p%*0>bsIh6Re=MTs>)W>U z=C$Zd<7L&EX?;bRolwbuhPZ&O=$~yuLTU6@LNVt~L#?Ol|G3YtYm%K|ld z#))fjJ*-nQ4mB96B7jMkBnA@h>NYy~WG)$xcv77*`Lg%$_(0zq*wmdtMkG{nrF zVCVl1B(+M7xW(S6c)jd>-(y5mD6qHK&=Uy-!#sc*+1vR8Ow1owEhW|uF7Z~T>ChOJ z@^8g#QB}yVRnb{yPu2IYz!n^h3lP`bffu1zlHiw&omZcd-)WFFwd5)LIId}vZUQu7 zH<8pG`_lP}%ba$n0nhax+W(xu!H}`TC%@30l5J7n*tos@PD`0&?A1T1@~N_k;R;b@ zFkUxfnR1X7TtETS53pTE_|7UwC;eb-$$qdb%Z4BH24^2<^031>c!2yHNYtMCQPnN9 z93YF1|KtNBzX5_Zbu?%^xSY68t|T7DUVu|KXPIy3obi}}-orT1a}Nl<8{!Oc^t`w| z3Gc|052F+-nUQ(qM&!N4odpKh&K9_5_TH~< zSsM{Tx1XBcoE2o3S@fJ3QT2(6vz{^ap8jm9&tzBt#>50<^z%hn!yJ9I`<=7}IrFGL z+GfTzfy}G;Mfv|8^~|+LLaw9uP>E`;9_}+nwlc-KB!w*(>Dr-C(y9J{7Blv|R9jBe zB40fHZ+1)r=E)-lVnjS;JV?Jkia<-Cye+>}}IZ%!a^WMwK4Zz$^uBER=gL6yb#}XZVQqp@ven0GF-848Gp&z{-9!j83%r}!_%xC#P=(UMqCs>uW zs7Y9T0EV-c*JH6>RkwNp1kP2`KXXSW^MFiDp8b#GT((mcXJ}$YR0fB>4S~Dc>fj<7 
z|2o`J9x-ov8#vQIz)pmEHU?1Wfbdt+Z^#rz7g0p8t%%nw)KjQHOY|=vdc9DKX&LMv zr)F2SAt8fVtEj9^1&>=Ov3MX_XMj(J)`7l~q=+s4_IUVyL9-NB2^T+ZfJzy>i z=%bpL?>Zfc)@(Y?U2(vt9W;!sqDf3DWfT}?&2{j@o=9PMPm?9Z3D^Ku@k$0bRabi8 z0DN|hA|ebAVCt}eF$eEC31Owi>l9#J7vh3`nJYXxq0@@(x3R+QKxC}AYg zVr!s@Dcr0C!S)h`be3J+s~q$72!QS7H>Ad!+(AuD8pmO!5RQl;gK?Lubss4B?)L*| zGEpu^f4&m47e>nFe$WVtJA2R^_pi*E^C$pGoQtl2- zpAz6s^S!&j?QJP-)>o9P?0uNyEvq_!+7MeABGcOPwjWb zcj1vesMv5j>2%se;F&`E)>R&7K^*fU8=s|cliE;O4X%vmp`I?t4+x+ug<#X?Av$Hv z({D&)vQSD4pdv&1fV7@|gnNV$F6je=L(y?0bhKAGRxSjl*j!6pk~Ci&U&A2j>z_JL%DlCD6@7qwXKCy3di17;uQU?yAK8d4!xgAnDkUsCyJj8JCpC!@KO~TngyQ!S zqYDz-9a0%R@h{ycI?199mOg@MA>D=PKuaBRMpG1Re}JBjQ?_mH>MZhFN_PzNIRT&d zW1velf5i-)%^9Hyl_@j>Sd@0+5RomsChpW2)eS?Ud1y27mAq$ zP;|{1E>n#2g%H(N;P^nwn`V_+%s3|z0Q9vO)=?FsycP_3={;5|UlMi}*j#fPFfNG@ zXA+WXgJ=Yv8^*4i{f1=lZ&bSzfbl~}#XAiPGlEjqYOX^~{Z2K%eL2$x)kZc@A@77r z$yey5C)#;y%_cv2>f`anVc-?OQBP0)Yth=RS-M1gw8a7%mk`?}>#~;^+Sj1e&&07}w6Qm%UMv3ys5BC*0EdwzOq;ws@$p z(YJM_u7C*JKjkm5xf5HvhqQ)EUaCIHlG%qlAw_v(_dT{6P5V2N7LRp4{Bwa0rlHbXzE0Nl(pB7UY<4E@Q6!HzYR8t}Sl6 z%PvJ;ZT-TlCScogI#JMmdt3YWYyiRyA_4(ORPmUSwA*eWxJW>nd7Y zAP6ww-b3~x%WoFh^}yY1gR5snhWq%E8M*G4>gD>-*=mdvz)8SHEg1qMwga!^P!Ccmu)B`d z;}k9#ws^;=#LcppT;r@{YWcO5VHlh|_zgA!S&K=~fMzRRyVP0##`Zpr71mE%Jmg+J zr4aHaJaro4rsI?orhPB-6B+B}%Ll<}HD`h-W6E(EC~eLU+;`CJ9xGQb5r6TeIWk`+ zz2W9+90w1}Ledt&pPIcp6BD7#%I@il`YedMPe8|t@oTZS(Cryy3hSY)7tkzQA)h0n z{Xssd0BGng(E@8rUJc+x#JChp89ddBWhxHKZ8Z~XAW1~E3nR%Jwo`vxMj8Dsl_ZYG z^dvE5{t0Ejx@p3&>7a-J;3q!yoU>CYK};|*=5gSv%bB&zBvIbo?m2%&ybc)I0ZN0w zn{GAcze|a+SD<$51ZBj{&z{rWKydMbdd)=<+6bo7ED85fpS+nLbpQd{akAVhQ0jZT zhfT1$TQ{&Zv);a5i;1La@Ows50T!o?s164$?7)Y@91sk%Rc|KX(g!HlwwfEjE(5+LArwpFzsH}KU1b$XJ z2{#Uw1g9KF{hDH~u}dx>w5`3gVn@|n6YSf7ySDKT$v(-#C zwiT)g4UJvHgLce!y%>GSi3s>0w-_1COTb}=A+CBj@fJo4}-)qDS z+doBc!Iuu*j00E&6}bqen({f%qL9M^IBn`K6P_(_4{q;M)L{8xuv9+Vy{c{ms<4GKtLmT@MH%;Hvq7Ov# zVDvPUHj>2ZmO zpJ~{Hf!xj0{f|k7UiK*=8{dt# zxedxb6vdVDuM&21u?Y@8u#ZnYrq#FODc$N%9Zmc}9|Y>&IbT+UK|&Km 
z)zQBMUQGqWQxH>HYcfzssz(r`=iveXQGEUfta(3Re!q89i(q!(ymV@4%2|kpQKiek z-!SA`uJ6Kkm$2eT06n@pLx&Qf2NMq_Jom_zH^M2N@tIlePVbwVnw;F;2u@(IRxfk; zOnb8|r*1c`A$oHi)t?M*Ggr|bo_s%X(9iB%-BUSzOP{}HSYh9qSiFyZ&}@2thse~+ zsJ9*E+cULVK`r|mm=@ZMR-R0MgpyXH6s~z+@Y#tC5|OF1SRtl1(Q2s1a9*oAuO-L! zPe+o|xq%+qEph1)OKCE%9@USUJ1=g1=#k&`F52`SUxN4UyE&E)?7Hk}muNqY^^-82 zh?e*jdrHSv_;(irI$=DA3)?SwzGAc8twTD8nFI%ON{ZEY6Eyo`TT!nThbD%L*4JmY zyO-|nZ?{Q)RZ=ujFJ5%sG8NF+$4v+)xhEtpjij|;th3C@WQW_jnrd5Nm>uZN>Ww1Z){Hx z+;mM}iq*}A`2`>P%{kG0J`eQACEl5q6}1{MRPl9<^wGJN(#`6DY#!%7haefLBKJhI z_9?xe$GN2LfKy?>sdixX`Soc^j8_;z5*%w*<+es&J;U8|Atq$`x3k=AsH@3(hkN?6 zIs2DCPhIJ!k|E(Iq_f|%J%_5RXYK3kFeI+KmA-88hUng z;#xAY>7q2ku2vN#cHuWq<^X`Ydv+B{M3rmf6+pIUGir(y1>DiS1ac$&DIa!3m-;^w zpj$}Y68A_5fc}Ib@WPB6gVbt;3a;U08Zkkj1r>GM5m_Ko8_)os75K$Q=BGgZ2_WWk zLssNNP@Z?h2BMjhuf!6*B+m1w~ zpM+Sp+a@LVd3z#=TE?J5kFi(=-o@4iA*`4ECAvTcsqp)KpXEm#vS}5b{O8@ZgUf#W zm}6?ep`=)|Q9-meR9%1LMs^*O)Xtrjm`%0e5h0BDn9JrH&};)=ZEWB# zaWImp=wsNC^pKoV0gwpMzaAesf69y4_AAQRrx*wi+wxnRawLrU=Y3`2 zZ6LRG)Mo-a%Ysk0sQU=A3%av?4eFcn7j`;9J|&DxvF?WC^_Ly*@sT|GYD4|8Op%tZ zH+Jr^|3|TYZ0B@tUbj*JP#=SxAo~z5_fTJwVP670r>bJJtL><_?f@~c*tVwdsO3uo z(t&Qfx=&(bj7{?Iu$)Sr4wfc`1)>eb@aIeivFn{dAl588dwcT;Yg=%)rF zz{@EwwiZYCXZMnus}`IGB0M%zA9T=`7DiqTPRg8QsB9LVDQ$PP?EoLSF zyUauS*{$~O*gL3w{Y}qDVTS~?brWm}GLNDj{=^#XG9IF{cqUI=w;i{+|di%6vTrAAJ^ABnP^X>`vf3O4C_~tcjEQHtJMe^Gw@5#iqkt zX?<$uM=DOkbNwsskDPtMdFT<@4Oh1`zuc0kL$}udzW@c`SvYVw46NAoQu^P-le{Dv)YeZPp+U zAcIe{A}wTOksM+$QfnP6=09nCl14+Mx!& z)_o>d>S@2asn2)2|KeHGZNL3CdK9tpqR5V&vz}y_$TF_ptt=mzp?_(^8Mzu56_St1 zkS_H7yI9_1Y2&1@4D|b+$TK*`aVS~M?d&|ehq=7SjpBsdbT;&7tmoS9r@KS&8XX$73G0YAw3L+N>J0$j~c0s~6}364nTgq!JUAGV5I zWRVcKq*g5L1gCq9qa{R~2D{hp*usIvh*<47mdS~5ThKVcQEGSMarfCt({=%@hl`a8 zGm~|()X}ENI#vX&My^feb&0-2O^UZ4~H9b2onOuCd>2wP_$#i5NrNey)cYnU1esw)?!*%w( z?)-j^i6ovr39__PN!ZJ^TYX>Tj*8Wrx|uAd3Qgt=@4q&oIICVS<+d2B6_gOBG=F>R z8f1>vnZ7E{?~UsD$1Y4+TR7f(A+A#KT!|tN660rZnWLq6s+@3%FdL~9k}#a$t-t?O z)t}r7x525;vjU*0`pVe0mlC)!NVu$hDk<~xqR8a%{&8b1+gi(!0Ip6fXEQaPW%#pu 
zpjK#=E}Mh4@fWUufrpF>w8$3+PisM^GsjpZ`6jD`^1Q@(S1ZK~V-=YCs?6NKPmO(B z0@1s-Tb?u$&A-G12)Kp3YZOuJJtuJX&^QuU*Tf=w%R6`3)b#76_c;s%@y57@AsS41TzxWvGYgNdCwR0}oq z8a=;=p*ktTyOVq3rGJDnJu|Oi2nl^N zcK;wPm>2gtNA*$9ZGMJ6)t478kixef@NZUvwj377ZvkODI9DwT8 zl&GW^l)-c%E*Cld019R0OFnL!B`Ytt%sOa+bKP;84Y|llIE`1pm?3r!8}`MiPjM8$ zf&OT-xu2bAx-`oT-)BDSZ5(Npdhx$uM^1$N+4b(5m?JKUh~_63tfOA)$s`Qz&%O&i zY^Z9g+LwQmCvx4fcJQ!l)&Cj(snz`&-*F`IddpKMCQE9(rODPi`>p=Je|4hHsN*!* zWO1rO`e%|eZ*kY`Mbv7Vciu@9wU*6};?|c(2=iy9Vf5EGrq@X4Cx&2Xnh75*!^#@nAO$!djbdJ&=$kG^RdnvGZ*LC4jcYIo<%=W+yS zySGeejjotaocpt&6o?hOQ28nZosjeHFriRQYiZ;nGiI5IKw7w>PAA@B?HAsOlrf91 zE1H_gcAZ~ulwO?hM_IZKIyWu`c=@Ss#H4*J7L(dX*}DJ((to1MlM3-XTgNR`U3rLp zakjj8Lq)~Ak>j0~sNz?-B=9Vtj3oMQOEOS@seDv;0p<55(x?^|dku{$#rBT|L zGkQ^bi5hIO*k8EVWfW`SVrf=z>!eu$*jI0NHQU!0B>P#f&FApEfy{tNqHn{##&J@E z@A$j+NwiSFoN@jj{?9K5=NbZL5mw4~bCWE;uv9$Z(xr?4*@*Fx+e+kwS*|1ML&KCR zlU1B5RS@ZzEIEO|U(ZLGb+i@M`m$#aaS`3y@pszPkS3m*gu>~CY~T8V(HB^35?*`4 zq>;>_;I-VWa)rzx;@OlbdD&faWgx4OJdF1hJTo~*6JkoU{Z>)?RQ{;WRIqv~uGm{Z zXDmZ`T<&w{+qH>Dm6waY1+R?Fi&}WsN8@$WKAGm_-yCkrLY=uMfsW?V$m7v;wxP>}M9=A`qjkPQhTUp?)H?$T#p$}J7W=U{ zgi%^s;Plg$MlmBzw;ZKFuae89UY4Z{3uxt`@?p)d7Ky8{+Oo4dK#52s7}*-xk5 zGeif@BoGYQZUzj0jj^p6`_FzWTRNQ~q&4hLamLUj5idVk8I{wSNRM{htj??^9xxOnv|ez z@AlAA3^6CA4y3lXb$oU3%wJycZ9EyR^>iYCvbW04?;hiPeF1IdBYv5U$3$q)(%M&* z?$4k3OuF2#(UKN<-@33I9&@|>b8LkDQk^R#&+%-vvTP7$-V~jf9Q!5E!ugZ*@RmKlAq$tEZ6ER zXU3fK(mrkw3j|X!Sw3UME#c{fluJgPbt@wu$8IGpM9q|l85v?3o36RfAi#Z}k{=oX z(_7wy|2E6#6W&K-=h$nwTQP?eOa{K8CBIMdSI}CnVQU@}thj8N)hXyE^QBX;MT^gB zHm8Y*hdZjRzsBxrjgesV7q#HKQC~p&rCd8acSv2FS^NI&+uIX@8okw~euN{4)Pe5y zCO%N@;en8yS2rEj3miBvs$iO{eC z>}hCnO0)uU;UJ!B0fS8?&VUcj_Z~lR=!rj)E;>{uAqem>V8A~p&W))#xl-D0mAwAO z(5pdz&s1h(Mq8gAR^Pi+yb9-yPF&&|EYH>R`kJT_WpdCH4&Zri{1jmOD+DnON(`7s z@KV+*j3H8gNxa~ZHETjVuZEGIc6=V52L#cc>C5-6#*lX;mUeb#4k<*e2uw(r3ZTjH zi+%N7z5>-L0d( ztL%xOt*bLz!&!YfybzsGELYtMoZR2`6AJ;=E!B|N!3tNbXlF1v^7#C@R{{_#`X zpJji-*5q^>eoAydA8Mx>GZ0rvry+izn<(zLo!lS31aE`kOh#ReGYoV_@7E$sKh45N 
zfUB(ck8lYTR?7J`Y&sCEWndnZ$neh{4M7sO>cyPr)E*v7A^kqN4Qdrgs#Uec91|Kz z>Iso0X&Rp@O7X8C10nFgB;WQ>Z2}9LR22@sY6EDElv*fE2uj)dmuuCBtSw#NEawId zKL`w<<~C}$~NnXQY~T6X-}&K%OT5?9=sXtUh(@lLaQ z&Lz(*XOesIl)%CCQLJx8e7^l_{Cc7ytzI)Zpi8{5-qe@8A}RTzZ7I^dFe$%BX}V)h z#sehGc!m0R1k+_hca`p>Onr-C5L#>Z+ilDa3}_3we9-a4ASp{BlkVN3ye8JQ9Vbk) z@B?AFO!VJG}0`Y|p1t7~Npp z8KPHjgBN9V4tBagENDNQIzjaYKAKE{OMzvG8-PK!xeSsklzUO1W2bCv6g|OwMK`_q zi^E1a0SBE?$oPPVD;nwl&!z0bM7+xcXA2UrXiFV~a_JP?3Xq}x32;rh+ALWImQYmBP7xL0B2knlw zTEU@?moKj7xe+N${@%uPq{u9U)q&?44~A!ro_oAx`qc-m-aFv-Q6t+=AsW>$a0qm3 zyel&{Tb635i*DMlce0NCWtvW$-Q2JoOt0QzM_FW#1CHi<04es(CEULEEMk^hHR#k$a^KR}>e7;t@OW$BCJ{}o&6hqg!MmkBO&OAW| zKsvaQaX!jbZclY}mVgq9_5ff5cuWIS)5zDGNEr!E7*g^=M%G%M^x-SDenK~n^X{lX zo1(CV8v_>?KD?LreC`%HbSqZvuKI_9E)OV$va1H)MNeqiQBxE8#am@G;8^j5pLXrr z1pn>0+G1(`(+?T>?qqT4NZK`ssP(*_o{0%~f@I_z3d;@!f@N}kw@OY#~<1V zb{@hP<$1wGnm}w}T8$itGcZNOg_bEgT^Q~w$ED?EJ#cXwdM+F(R$*Z3+|`_rR=BJS zj)M4)hAP*(6o+?yS|(C}F@W8+%r26&Xp?CToeh1c>k_Zq4|gqN^BS#JJC8dWq$Ir@ zjgQSaMnP&B{p4ga#>n_oR3nbk9Z}n#SV54513Vjm*H#2cC6CGW z`jWn1-aJP^g>*}@q6flMGP5%f68$6zdFK1qY?3*ms$)@vB?Z)#g|LN>Nk$1oN3@2I z+pYHxs+gV#6nNmX9Vefn&2#34>e^_Ghum0qXWEoMIAd|dS<70WHqnXhR0X`?;S*QU z9S^5&l$oCfC)0-(gu%gEF_Ho!z!$(1vT%pb1;_#H*Vgy8W<)uQj6XoOfTd}bmu=JR z3Dy`gn-FwaofD8n2x&PGrw-X?sD5|8?rFY^-@gylWmIZHyV_8h z)V-WheNPszilrbraIrEUszSoq8XXA5Nf`g}uX76_JM0_!${DNa-h|(n!2YPsUR;;o z*r&OwG{O}fq?}pP4ci?Fu~_f7E3Lr1&i_=N?>qZA>M3-xqGx;@)>o?*qEx!sfxej{ z7dgRo@>YAr>X+B_V$9pKX?b60C4o=kPS%JIDQ!>S$W?Y`uC*KrL^^c7R64#D77F7j z-Clp9njE>!uGj9>XPm=?tBXk_S-@-GzNevVZlN2h3MRCa_`I?gHqd>Ea|qR)m-_cE zLY_#<9C2f2XDG3hMYMF-pFDUwXLsq#l$4oK^6E=@lei-iAD^GCk%m)O%}{&pU-ZB% zOm@zrrL9cKvvmonym4~~`(0Di2E^7WDPM}Iq+{ofzz5a{6kK#lX`8sR)f6K0R|vb* z&UQpFPUMU1&1HQ5zrIFIq6@P5@8?!ORtzM1ZEVuHt7)mJ>~b}TU2YXj^QyR;f4^se zQl3BrbxqVbM^GJ6yA#l)%tWPy>LJG=$_BC{DS2cSRU2wVK|MRNtTL(-eUy(}d z`AB-u4)+ZPbkbi=(#t!NWZW+$Kx58&D-c7ynoFtrLn%?8Y6<)@*PTeNFQZctIyb(- zeZtX={KL4RF;#WwDR5WsT~~IWD*Pn^)t47HL$wlP7(?|FmrCCNhncAY+F+I*{wWU3 
z-km7solzIzM)^AznysbZx*k@f>6nfQoA!XqkHj(oG0tze!bAIq`KdQ0tJwJbuB9dM zqdxCP(dvVL*lxx-8)1sZ4ky@;UQRAql}p>%lsh}|zKAHF+aaZ(o;+nB2u*%g4DG5sq!59uO6356 zis3vAD`P>0#A?5Ui~zgh_+#F}a=LFbtNiLAp1rxXT7J zJSgjv4R*?{2Sa7;cnLlqJg)qaQ`_OLhy|(U49_g&TN}jIdjWyxxkqpRiF$(j8~h3i z4`x0N1JL;zgJ>I2+T##14e)AEEQRS4s~O_@Ca0<%^kDfp+v#I(nI0ksn!dexg+T>3 zd|>A-AG3n<9&N!9FtX_QSg@nhRdbn6m>{AUap~T;6g&}@XlTrNkU$grOVV(0iMugQ zg2mf+x5{~%J)At<4RFzr-ZW&;V^c%-En4)US5@ zB`%>G%=VW>0;e+%{mIE$SJC2_IzB#Jso(0DTALICOmc$vMbsarp%U%8j@*XU$tC>M z*AJu0KeV?ua?%yLVk9cHv%g&Mt}$kbS#lM270!qyb<~1G4a0`2#kRm{4FHh~b2uc& zzhr7^`}&^ePJ@;v^E>+PP%pt46KN-nQm%K#Azv18W0p93^=d#el4Pr23V*>a(Sh!p zis>-$7l&E)Z8eD%FcJoqJ}k!|P{qjgQr198fB>f+00>7njf!5M&imBy&lia&AUHl> zB+)aLN@3~;D>JA^qgO0yb~*45-7qjrpO%o9bacr+NQVwo?_5y}Hzn@?+6%!eS{2{^|!j6HlLcN;}+w)9v z<8B9Haz?CdNv>?HFTe*IVT?Hl`cd#zho1*@xBG9(51!ZNUiUYfwKn_A!8C}}8D;F# zWVR8)LS`h_s?uT(+~{B%L=4hJnmFwetly7d>Jdz&B|zFQdRX7Om_a3U9!?m|gYoC9 zwk|$yj5x53s86il=oFDPC5(k|oZhWQ$dedOxiT`dut>ZE!uX!y%UP1J?ZfBndr=A( zkc`1XnC+#f+TLFz1&axv0ov1;j4K0J=DxBF!Ge460K{mbk!yyc_hMtGqCf^qiYqb3 zPvtu$JJxdY;PSrGfdfCM{s(9p{s&08C9MT!WGQG#l7{#S|Y9Em1x^so4B z;C5z;HsRa+i86S7gV{1fZjkWX9=^fUKMN{(7Q@DW%(Cep8@|DP)B7TUlO>)7a?wwA z9HyNcqq=PPCkTKn43yWPEI^zPFy=j{Y*(5;J4O;aWXx)2@UH z@QGMB&Wg&~M=<)24)(uaJbxEJAS`76O)C8Ns}kI{J3C)!f72i6>I!sxSSzji2tC{C zrQy1Z%lGA%Wo16xsXdHV7hROu?Xaa_7x(h^6@NCBGXC9OYs!Xkh6Sa3q!O>)PtW=% zyxpV$Ud-(4w1D4kfC{Ed;y1v#tA&R#5FDk}137DQwb9SZT4=ERI0x886NdT^>$s7E zrXM)Q>%ao}ZW%i2uI%#)NID7S63&6a2eYZ&a#{m(}fA9u2` z_(4{d{MN~F6vrReQGpYNBy9?bMUIa4GvH-Q`_We0TQ&uJ&{EJnf>}yrB-!i4l*MrZ zK=F9G3SFhXxt@g!C>XvNEIGk5s26wu<-8rTw$Ca+Fi4l=b06*v-%YI1a}F*u0RRbZ z{x-d}407;PJnXoi&1kix62`_rq$=;R#lAtk-^P+|4Tzv7a2tRJ$J!MLycSjZ#RIq( z+-EfIS&n5(oSUF;@0l3(GzRAdGcs{7{zkLVtV>c`%wL>lQZ<~G;Be=9(ljo<>e8#1 z0gLeo`ZAyZi!cPY5)BO6Wi%lMCU#@GB|#I=3o=CM!c`i2!Ui@=ftj%c-_nr2>CYFCLQ+{CFZewD@SDxyWc{nr5rin;>O4V1m`JWGmHbO(V(+4Ok q{{1uoBY$Dr|F*!8BOm?R;P|SMaIP;)MsOSaG0-#A{d~eIt From ee3683a76c51e508cb66a2b1e3741d865476a3c5 Mon Sep 17 00:00:00 2001 From: krishr2d2 Date: 
Thu, 8 Sep 2022 18:39:43 +0000 Subject: [PATCH 34/34] ran linter test --- ...ular_classification_model_evaluation.ipynb | 2876 ++++++++--------- 1 file changed, 1429 insertions(+), 1447 deletions(-) diff --git a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb index 6fd218428..fb35c558c 100644 --- a/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb +++ b/notebooks/community/model_evaluation/automl_tabular_classification_model_evaluation.ipynb @@ -1,1449 +1,1431 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ur8xi4C7S06n" - }, - "outputs": [], - "source": [ - "# Copyright 2022 Google LLC\n", - "#\n", - "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JAPoU8Sm5E6e" - }, - "source": [ - "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", - "\n", - "\n", - "\n", - " \n", - " \n", - " \n", - "
\n", - " \n", - " \"Colab Run in Colab\n", - " \n", - " \n", - " \n", - " \"GitHub\n", - " View on GitHub\n", - " \n", - " \n", - " \n", - " \"Vertex\n", - " Open in Vertex AI Workbench\n", - " \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "tvgnzT1CKxrO" - }, - "source": [ - "## Overview\n", - "\n", - "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d975e698c9a4" - }, - "source": [ - "### Objective\n", - "\n", - "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`:\n", - "\n", - "This tutorial uses the following Google Cloud ML services and resources:\n", - "\n", - "- Vertex AI `Datasets`\n", - "- Vertex AI `Training`(AutoML Tabular Classification) \n", - "- Vertex AI `Model Registry`\n", - "- Vertex AI `Pipelines`\n", - "- Vertex AI `Batch Predictions`\n", - "\n", - "\n", - "\n", - "The steps performed include:\n", - "\n", - "- Create a Vertex AI `Dataset`.\n", - "- Train a Automl Tabular Classification model on the `Dataset` resource.\n", - "- Import the trained `AutoML model resource` into the pipeline.\n", - "- Run a `Batch Prediction` job.\n", - "- Evaulate the AutoML model using the `Classification Evaluation Component`.\n", - "- Import the Classification Metrics to the AutoML model resource." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "08d289fa873f" - }, - "source": [ - "### Dataset\n", - "\n", - "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", - "\n", - "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", - "- `Age`: Age of pet when listed, in months\n", - "- `Breed1`: Primary breed of pet\n", - "- `Gender`: Gender of pet\n", - "- `Color1`: Color 1 of pet \n", - "- `Color2`: Color 2 of pet\n", - "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", - "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", - "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", - "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", - "- `Fee`: Adoption fee (0 = Free)\n", - "- `PhotoAmt`: Total uploaded photos for this pet\n", - "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", - "\n", - "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aed92deeb4a0" - }, - "source": [ - "### Costs \n", - "This tutorial uses billable components of Google Cloud:\n", - "\n", - "* Vertex AI\n", - "* Cloud Storage\n", - "\n", - "Learn about [Vertex AI\n", - "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", - "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", - "Calculator](https://cloud.google.com/products/calculator/)\n", - "to generate a cost estimate based on your projected usage." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ze4-nDLfK4pw" - }, - "source": [ - "### Set up your local development environment\n", - "\n", - "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", - "all the requirements to run this notebook." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "gCuSR8GkAgzl" - }, - "source": [ - "**Otherwise**, make sure your environment meets this notebook's requirements.\n", - "You need the following:\n", - "\n", - "* The Google Cloud SDK\n", - "* Git\n", - "* Python 3\n", - "* virtualenv\n", - "* Jupyter notebook running in a virtual environment with Python 3\n", - "\n", - "The Google Cloud guide to [Setting up a Python development\n", - "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", - "installation guide](https://jupyter.org/install) provide detailed instructions\n", - "for meeting these requirements. The following steps provide a condensed set of\n", - "instructions:\n", - "\n", - "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", - "\n", - "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", - "\n", - "1. [Install\n", - " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", - " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", - "\n", - "1. To install Jupyter, run `pip3 install jupyter` on the\n", - "command-line in a terminal shell.\n", - "\n", - "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", - "\n", - "1. Open this notebook in the Jupyter Notebook Dashboard." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "i7EUnXsZhAGF" - }, - "source": [ - "## Installation\n", - "\n", - "Install the following packages required to execute this notebook. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2b4ef9b72d43" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "# The Vertex AI Workbench Notebook product has specific requirements\n", - "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", - "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", - " \"/opt/deeplearning/metadata/env_version\"\n", - ")\n", - "\n", - "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", - "USER_FLAG = \"\"\n", - "if IS_WORKBENCH_NOTEBOOK:\n", - " USER_FLAG = \"--user\"\n", - "\n", - "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", - "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", - "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", - "! pip3 install --upgrade matplotlib {USER_FLAG} -q" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hhq5zEbGg0XX" - }, - "source": [ - "### Restart the kernel\n", - "\n", - "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EzrelQZ22IZj" - }, - "outputs": [], - "source": [ - "# Automatically restart kernel after installs\n", - "import os\n", - "\n", - "if not os.getenv(\"IS_TESTING\"):\n", - " # Automatically restart kernel after installs\n", - " import IPython\n", - "\n", - " app = IPython.Application.instance()\n", - " app.kernel.do_shutdown(True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lWEdiXsJg0XY" - }, - "source": [ - "## Before you begin" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BF1j6f9HApxa" - }, - "source": [ - "### Set up your Google Cloud project\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", - "\n", - "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", - "\n", - "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", - "\n", - "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", - "\n", - "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", - "Cloud SDK uses the right project for all the commands in this notebook.\n", - "\n", - "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WReHDGG5g0XY" - }, - "source": [ - "#### Set your project ID\n", - "\n", - "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "oM1iC_MfAts1" - }, - "outputs": [], - "source": [ - "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "riG_qUokg0XZ" - }, - "outputs": [], - "source": [ - "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", - " # Get your GCP project id from gcloud\n", - " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", - " PROJECT_ID = shell_output[0]\n", - " print(\"Project ID:\", PROJECT_ID)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "set_gcloud_project_id" - }, - "outputs": [], - "source": [ - "! gcloud config set project $PROJECT_ID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "region" - }, - "source": [ - "#### Region\n", - "\n", - "You can also change the `REGION` variable, which is used for operations\n", - "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", - "\n", - "- Americas: `us-central1`\n", - "- Europe: `europe-west4`\n", - "- Asia Pacific: `asia-east1`\n", - "\n", - "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", - "\n", - "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sduDOFQVF6kv" - }, - "outputs": [], - "source": [ - "REGION = \"[your-region]\" # @param {type: \"string\"}\n", - "\n", - "if REGION == \"[your-region]\":\n", - " REGION = \"us-central1\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "06571eb4063b" - }, - "source": [ - "#### UUID\n", - "\n", - "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "697568e92bd6" - }, - "outputs": [], - "source": [ - "import random\n", - "import string\n", - "\n", - "\n", - "# Generate a uuid of a specifed length(default=8)\n", - "def generate_uuid(length: int = 8) -> str:\n", - " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", - "\n", - "\n", - "UUID = generate_uuid()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dr--iN2kAylZ" - }, - "source": [ - "### Authenticate your Google Cloud account\n", - "\n", - "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", - "authenticated. Skip this step." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sBCra4QMA2wR" - }, - "source": [ - "**If you are using Colab**, run the cell below and follow the instructions\n", - "when prompted to authenticate your account via oAuth.\n", - "\n", - "**Otherwise**, follow these steps:\n", - "\n", - "1. In the Cloud Console, go to the [**Create service account key**\n", - " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", - "\n", - "2. Click **Create service account**.\n", - "\n", - "3. In the **Service account name** field, enter a name, and\n", - " click **Create**.\n", - "\n", - "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", - "into the filter box, and select\n", - " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", - "\n", - "5. Click **Create**. A JSON file that contains your key downloads to your\n", - "local environment.\n", - "\n", - "6. Enter the path to your service account key as the\n", - "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PyQmSRbKA8r-" - }, - "outputs": [], - "source": [ - "# If you are running this notebook in Colab, run this cell and follow the\n", - "# instructions to authenticate your GCP account. This provides access to your\n", - "# Cloud Storage bucket and lets you submit training jobs and prediction\n", - "# requests.\n", - "\n", - "import os\n", - "import sys\n", - "\n", - "# If on Vertex AI Workbench, then don't execute this code\n", - "IS_COLAB = \"google.colab\" in sys.modules\n", - "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", - " \"DL_ANACONDA_HOME\"\n", - "):\n", - " if \"google.colab\" in sys.modules:\n", - " from google.colab import auth as google_auth\n", - "\n", - " google_auth.authenticate_user()\n", - "\n", - " # If you are running this notebook locally, replace the string below with the\n", - " # path to your service account key and run this cell to authenticate your GCP\n", - " # account.\n", - " elif not os.getenv(\"IS_TESTING\"):\n", - " %env GOOGLE_APPLICATION_CREDENTIALS ''" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "zgPO1eR3CYjk" - }, - "source": [ - "### Create a Cloud Storage bucket\n", - "\n", - "**The following steps are required, regardless of your notebook environment.**\n", - "\n", - "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", - "\n", - "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "MzGDU7TWdts_" - }, - "outputs": [], - "source": [ - "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", - "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cf221059d072" - }, - "outputs": [], - "source": [ - "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", - " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", - " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-EcIXiGsCePi" - }, - "source": [ - "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NIq7R4HZCfIc" - }, - "outputs": [], - "source": [ - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucvCsknMCims" - }, - "source": [ - "Finally, validate access to your Cloud Storage bucket by examining its contents:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "vhOb7YnwClBb" - }, - "outputs": [], - "source": [ - "! gsutil ls -al $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account" - }, - "source": [ - "#### Service Account\n", - "\n", - "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "UwC1AdGeF6kx" - }, - "outputs": [], - "source": [ - "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "autoset_service_account" - }, - "outputs": [], - "source": [ - "if (\n", - " SERVICE_ACCOUNT == \"\"\n", - " or SERVICE_ACCOUNT is None\n", - " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", - "):\n", - " # Get your service account from gcloud\n", - " if not IS_COLAB:\n", - " shell_output = !gcloud auth list 2>/dev/null\n", - " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", - "\n", - " else: # IS_COLAB:\n", - " shell_output = ! gcloud projects describe $PROJECT_ID\n", - " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", - " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", - "\n", - " print(\"Service Account:\", SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "set_service_account:pipelines" - }, - "source": [ - "#### Set service account access for Vertex AI Pipelines\n", - "\n", - "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6OqzKqhMF6kx" - }, - "outputs": [], - "source": [ - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", - "\n", - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XoEqT2Y4DJmf" - }, - "source": [ - "### Import libraries\n", - "\n", - "Import the Vertex AI Python SDK and other required Python libraries." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pRUOFELefqf1" - }, - "outputs": [], - "source": [ - "import google.cloud.aiplatform as aiplatform\n", - "import json\n", - "import kfp\n", - "import matplotlib.pyplot as plt\n", - "from google.cloud import aiplatform_v1\n", - "from kfp.v2 import compiler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "init_aip:mbsdk,all" - }, - "source": [ - "### Initialize Vertex AI SDK for Python\n", - "\n", - "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ksAefQcCF6ky" - }, - "outputs": [], - "source": [ - "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8d97acf78771" - }, - "source": [ - "## Create Vertex AI Dataset\n", - "\n", - "Create a managed tabular dataset resource in Vertex AI using the dataset source." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "3390c9e9426c" - }, - "outputs": [], - "source": [ - "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2011a473ce65" - }, - "outputs": [], - "source": [ - "# Create the Vertex AI Dataset resource\n", - "dataset = aiplatform.TabularDataset.create(\n", - " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", - " gcs_source=DATA_SOURCE,\n", - ")\n", - "\n", - "print(\"Resource name:\", dataset.resource_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6da01c2f1d4f" - }, - "source": [ - "## Train AutoML model\n", - "\n", - "Train a simple classification model the created dataset using `Adopted` as the target column. 
\n", - "\n", - "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "5dd3db2d1225" - }, - "outputs": [], - "source": [ - "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0614e3fb19da" - }, - "outputs": [], - "source": [ - "# If no display name is specified, use the default one\n", - "if (\n", - " TRAINING_JOB_DISPLAY_NAME == \"\"\n", - " or TRAINING_JOB_DISPLAY_NAME is None\n", - " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", - "):\n", - " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce9c9f279674" - }, - "source": [ - "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", - "\n", - "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", - "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", - "- `column_specs`(Optional): Transformations to apply to the input columns (including data-type corrections).\n", - "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", - "\n", - "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d33629c2aae6" - }, - "outputs": [], - "source": [ - "# Define the AutoML training job\n", - "train_job = aiplatform.AutoMLTabularTrainingJob(\n", - " display_name=TRAINING_JOB_DISPLAY_NAME,\n", - " optimization_prediction_type=\"classification\",\n", - " column_specs={\n", - " \"Type\": \"categorical\",\n", - " \"Age\": \"numeric\",\n", - " \"Breed1\": \"categorical\",\n", - " \"Color1\": \"categorical\",\n", - " \"Color2\": \"categorical\",\n", - " \"MaturitySize\": \"categorical\",\n", - " \"FurLength\": \"categorical\",\n", - " \"Vaccinated\": \"categorical\",\n", - " \"Sterilized\": \"categorical\",\n", - " \"Health\": \"categorical\",\n", - " \"Fee\": \"numeric\",\n", - " \"PhotoAmt\": \"numeric\",\n", - " },\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "391c51c98647" - }, - "source": [ - "Set the display name for the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "454f077b984e" - }, - "outputs": [], - "source": [ - "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "21b5a27e8171" - }, - "outputs": [], - "source": [ - "# If no name is specified, use the default name\n", - "if (\n", - " MODEL_DISPLAY_NAME == \"\"\n", - " or MODEL_DISPLAY_NAME is None\n", - " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", - "):\n", - " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "93ebafd3f347" - }, - "source": [ - "Run training job on the created TabularDataset by passing the following arguments for training:\n", - "\n", - "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", - "- `target_column`: The name of the column values of which the Model is 
to predict.\n", - "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", - "- `budget_milli_node_hours`(Optional): The training budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", - "\n", - "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", - "\n", - "The training job takes roughly 1.5-2 hours to finish." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9ce44a2ab942" - }, - "outputs": [], - "source": [ - "# Specify the target column\n", - "target_column = \"Adopted\"\n", - "\n", - "# Run the training job\n", - "model = train_job.run(\n", - " dataset=dataset,\n", - " target_column=target_column,\n", - " model_display_name=MODEL_DISPLAY_NAME,\n", - " budget_milli_node_hours=1000,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bfa52eb3f22f" - }, - "source": [ - "## List model evaluations from training\n", - "\n", - "After the training job is finished, get the model evaluations and print them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d56e2b3cf57d" - }, - "outputs": [], - "source": [ - "# Get evaluations\n", - "model_evaluations = model.list_model_evaluations()\n", - "\n", - "model_evaluation = list(model_evaluations)[0]\n", - "print(model_evaluation)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bd2e1da7a64e" - }, - "outputs": [], - "source": [ - "# Print the evaluation metrics\n", - "for evaluation in model_evaluations:\n", - " evaluation = evaluation.to_dict()\n", - " print(\"Model's evaluation metrics from Training:\\n\")\n", - " metrics = evaluation[\"metrics\"]\n", - " for metric in metrics.keys():\n", - " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "19c434d8b035" - }, - "source": [ - "## Create Pipeline for evaluations\n", - "\n", - "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature attributions on its results. \n", - "\n", - "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) Python package.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ab9f273691cc" - }, - "source": [ - "### Define the Pipeline\n", - "\n", - "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", - "\n", - "The pipeline uses the following components:\n", - "\n", - "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", - "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", - "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", - "- `ModelEvaluationClassificationOp`: Compute evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports mutliclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", - "- `ModelEvaluationFeatureAttributionOp`: Compute feature attribution on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", - "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "327d8d4e11b2" - }, - "outputs": [], - "source": [ - "@kfp.dsl.pipeline(\n", - " name=\"vertex-evaluation-automl-tabular-classification-feature-attribution\"\n", - ")\n", - "def evaluation_automl_tabular_feature_attribution_pipeline(\n", - " project: str,\n", - " location: str,\n", - " root_dir: str,\n", - " model_name: str,\n", - " target_column_name: str,\n", - " batch_predict_gcs_source_uris: list,\n", - " batch_predict_instances_format: str,\n", - " batch_predict_predictions_format: str = \"jsonl\",\n", - " batch_predict_machine_type: str = \"n1-standard-4\",\n", - " batch_predict_explanation_metadata: dict = {},\n", - " batch_predict_explanation_parameters: dict = {},\n", - " batch_predict_explanation_data_sample_size: int = 10000,\n", - "):\n", - "\n", - " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", - " from google_cloud_pipeline_components.experimental.evaluation import (\n", - " EvaluationDataSamplerOp, GetVertexModelOp,\n", - " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", - " ModelImportEvaluationOp)\n", - "\n", - " # Get the Vertex AI model resource\n", - " get_model_task = 
GetVertexModelOp(model_resource_name=model_name)\n", - "\n", - " # Run Data-sampling task\n", - " data_sampler_task = EvaluationDataSamplerOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " gcs_source_uris=batch_predict_gcs_source_uris,\n", - " instances_format=batch_predict_instances_format,\n", - " sample_size=batch_predict_explanation_data_sample_size,\n", - " )\n", - "\n", - " # Run Batch Explanations\n", - " batch_explain_task = ModelBatchPredictOp(\n", - " project=project,\n", - " location=location,\n", - " model=get_model_task.outputs[\"model\"],\n", - " job_display_name=\"model-registry-batch-predict-evaluation\",\n", - " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", - " instances_format=batch_predict_instances_format,\n", - " predictions_format=batch_predict_predictions_format,\n", - " gcs_destination_output_uri_prefix=root_dir,\n", - " machine_type=batch_predict_machine_type,\n", - " # Set the explanation parameters\n", - " generate_explanation=True,\n", - " explanation_parameters=batch_predict_explanation_parameters,\n", - " explanation_metadata=batch_predict_explanation_metadata,\n", - " )\n", - "\n", - " # Run evaluation based on prediction type and feature attribution component.\n", - " # After, import the model evaluations to the Vertex model.\n", - " eval_task = ModelEvaluationClassificationOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " problem_type=\"classification\",\n", - " ground_truth_column=target_column_name,\n", - " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " predictions_format=batch_predict_predictions_format,\n", - " )\n", - "\n", - " # Get Feature Attributions\n", - " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", - " project=project,\n", - " location=location,\n", - " root_dir=root_dir,\n", - " predictions_format=batch_predict_predictions_format,\n", - " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", - " )\n", - "\n", - " ModelImportEvaluationOp(\n", - " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", - " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", - " model=get_model_task.outputs[\"model\"],\n", - " dataset_type=batch_predict_instances_format,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "1abb012ce04b" - }, - "source": [ - "### Compile the pipeline\n", - "\n", - "Next, compile the pipline to the `tabular_classification_pipline.json` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e526b588cae9" - }, - "outputs": [], - "source": [ - "compiler.Compiler().compile(\n", - " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", - " package_path=\"tabular_classification_pipeline.json\",\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "26eef4b83c88" - }, - "source": [ - "### Define the parameters to run the pipeline\n", - "\n", - "Specify the required parameters to run the pipeline.\n", - "\n", - "Set a display name for your pipeline." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "63b84f5490d2" - }, - "outputs": [], - "source": [ - "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "e0a18b803bb7" - }, - "outputs": [], - "source": [ - "# If no display name is set, use the default one\n", - "if (\n", - " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", - " or PIPELINE_DISPLAY_NAME == \"\"\n", - " or PIPELINE_DISPLAY_NAME is None\n", - "):\n", - " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a9571ef567de" - }, - "source": [ - "To pass the required arguments to the pipeline, you define the following paramters below:\n", - "\n", - "- `project`: Project ID.\n", - "- `location`: Region where the pipeline is run.\n", - "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", - "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", - "- `target_column_name`: Name of the column to be used as the target for classification.\n", - "- `batch_predict_gcs_source_uris`: List of the Cloud Storage bucket uris of input instances for batch prediction.\n", - "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**' or '**bigquery**' or '**csv**'.\n", - "- `batch_predict_explanation_data_sample_size`: Size of the samples to be considered for batch prediction and evaluation." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "52d622c274d2" - }, - "outputs": [], - "source": [ - "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", - "parameters = {\n", - " \"project\": PROJECT_ID,\n", - " \"location\": REGION,\n", - " \"root_dir\": PIPELINE_ROOT,\n", - " \"model_name\": model.resource_name,\n", - " \"target_column_name\": \"Adopted\",\n", - " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", - " \"batch_predict_instances_format\": \"csv\",\n", - " \"batch_predict_explanation_data_sample_size\": 3000,\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0409b0f330c2" - }, - "source": [ - "Create a Vertex AI pipeline job using the following parameters:\n", - "\n", - "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", - "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", - "- `parameter_values`: The mapping from runtime parameter names to its values that\n", - " control the pipeline run.\n", - "- `enable_caching`: Whether to turn on caching for the run.\n", - "\n", - "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", - "\n", - "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "894afe1ba396" - }, - "outputs": [], - "source": [ - "job = aiplatform.PipelineJob(\n", - " display_name=PIPELINE_DISPLAY_NAME,\n", - " template_path=\"tabular_classification_pipeline.json\",\n", - " parameter_values=parameters,\n", - " enable_caching=True,\n", - ")\n", - "\n", - "job.run(service_account=SERVICE_ACCOUNT)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ce6beLsXASnK" - }, - "source": [ - "## Model Evaluation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mKRTDi8ioXBY" - }, - "source": [ - "In the results from last step, click on the generated link to see your run in the Cloud Console.\n", - "\n", - "In the UI, many of the pipeline directed acyclic graph (DAG) nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XcKaONSsGNC4" - }, - "source": [ - "### Get the Model Evaluation Results\n", - "\n", - "After the evalution pipeline is finished, run the below cell to print the evaluation metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "ec4ec00ab350" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the evaluation task\n", - " if (\n", - " (\"model-evaluation\" in task.task_name)\n", - " and (\"model-evaluation-import\" not in task.task_name)\n", - " and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " )\n", - " ):\n", - " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", - " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", - "\n", - "print(evaluation_metrics)\n", - "print(evaluation_metrics_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ca00512eb89f" - }, - "source": [ - "### Visualize the metrics\n", - "\n", - "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "f9e38f73f838" - }, - "outputs": [], - "source": [ - "metrics = []\n", - "values = []\n", - "for i in evaluation_metrics.metadata.items():\n", - " metrics.append(i[0])\n", - " values.append(i[1])\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=metrics, height=values)\n", - "plt.title(\"Evaluation Metrics\")\n", - "plt.ylabel(\"Value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "049c9bbae2cb" - }, - "source": [ - "### Get the Feature Attributions\n", - "\n", - "Run the below cell to get the feature attributions. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "03ca8c149bc6" - }, - "outputs": [], - "source": [ - "# Iterate over the pipeline tasks\n", - "for task in job._gca_resource.job_detail.task_details:\n", - " # Obtain the artifacts from the feature attribution task\n", - " if (task.task_name == \"feature-attribution\") and (\n", - " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", - " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", - " ):\n", - " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", - " feat_attrs_gcs_uri = feat_attrs.uri\n", - "\n", - "print(feat_attrs)\n", - "print(feat_attrs_gcs_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "719d2cd57d10" - }, - "source": [ - "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "82e308dd8aca" - }, - "outputs": [], - "source": [ - "# Load the results\n", - "attributions = !gsutil cat $feat_attrs_gcs_uri\n", - "\n", - "# Print the results obtained\n", - "attributions = json.loads(attributions[0])\n", - "print(attributions)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5bfe517357f8" - }, - "source": [ - "### Visualize the Feature Attributions\n", - "\n", - "Visualize the obtained attributions for each feature using a bar-chart." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "d7a7dca9e3cc" - }, - "outputs": [], - "source": [ - "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", - "features = []\n", - "attr_values = []\n", - "for key, value in data.items():\n", - " features.append(key)\n", - " attr_values.append(value)\n", - "\n", - "plt.figure(figsize=(5, 3))\n", - "plt.bar(x=features, height=attr_values)\n", - "plt.title(\"Feature Attributions\")\n", - "plt.xticks(rotation=90)\n", - "plt.ylabel(\"Attribution value\")\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TpV-iwP9qw9c" - }, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can delete the individual resources you created in this tutorial.\n", - "\n", - "Set `delete_bucket` to **True** to create the Cloud Storage bucket created in this notebook." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "sx_vKniMq9ZX" - }, - "outputs": [], - "source": [ - "# Delete model resource\n", - "model.delete()\n", - "\n", - "# Delete the dataset resource\n", - "dataset.delete()\n", - "\n", - "# Delete the training job\n", - "train_job.delete()\n", - "\n", - "# Delete the evaluation pipeline\n", - "job.delete()\n", - "\n", - "# Delete Cloud Storage objects\n", - "delete_bucket = False\n", - "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", - " ! 
gsutil -m rm -r $BUCKET_URI" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "automl_tabular_classification_model_evaluation.ipynb", - "toc_visible": true - }, - "environment": { - "kernel": "python3", - "name": "common-cpu.m90", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m90" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ur8xi4C7S06n" + }, + "outputs": [], + "source": [ + "# Copyright 2022 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JAPoU8Sm5E6e" + }, + "source": [ + "# Vertex AI Pipelines: Evaluating BatchPrediction results from AutoML Tabular Classification model\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + " 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tvgnzT1CKxrO" + }, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML classification model. Model evaluation helps you determine your model performance based on the evaluation metrics and improve the model if necessary. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d975e698c9a4" + }, + "source": [ + "### Objective\n", + "\n", + "In this tutorial, you train a Vertex AI AutoML Tabular Classification model and learn how to evaluate it through a Vertex AI pipeline job using `google_cloud_pipeline_components`.\n", + "\n", + "This tutorial uses the following Google Cloud ML services and resources:\n", + "\n", + "- Vertex AI `Datasets`\n", + "- Vertex AI `Training` (AutoML Tabular Classification) \n", + "- Vertex AI `Model Registry`\n", + "- Vertex AI `Pipelines`\n", + "- Vertex AI `Batch Predictions`\n", + "\n", + "\n", + "\n", + "The steps performed include:\n", + "\n", + "- Create a Vertex AI `Dataset`.\n", + "- Train an AutoML Tabular Classification model on the `Dataset` resource.\n", + "- Import the trained `AutoML model resource` into the pipeline.\n", + "- Run a `Batch Prediction` job.\n", + "- Evaluate the AutoML model using the `Classification Evaluation Component`.\n", + "- Import the Classification Metrics to the AutoML model resource." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "08d289fa873f" + }, + "source": [ + "### Dataset\n", + "\n", + "The dataset being used in this notebook is a part of the PetFinder Dataset, available [here](https://www.kaggle.com/c/petfinder-adoption-prediction) on Kaggle. The current dataset is only a part of the original dataset considered for the problem of predicting whether the pet is adopted or not. 
It consists of the following fields:\n", + "\n", + "- `Type`: Type of animal (1 = Dog, 2 = Cat)\n", + "- `Age`: Age of pet when listed, in months\n", + "- `Breed1`: Primary breed of pet\n", + "- `Gender`: Gender of pet\n", + "- `Color1`: Color 1 of pet \n", + "- `Color2`: Color 2 of pet\n", + "- `MaturitySize`: Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)\n", + "- `FurLength`: Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)\n", + "- `Vaccinated`: Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Sterilized`: Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)\n", + "- `Health`: Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)\n", + "- `Fee`: Adoption fee (0 = Free)\n", + "- `PhotoAmt`: Total uploaded photos for this pet\n", + "- `Adopted`: Whether or not the pet was adopted (Yes/No).\n", + "\n", + "**Note**: This dataset is moved to a public Cloud Storage bucket from where it is accessed in this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aed92deeb4a0" + }, + "source": [ + "### Costs \n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI\n", + "* Cloud Storage\n", + "\n", + "Learn about [Vertex AI\n", + "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", + "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", + "Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ze4-nDLfK4pw" + }, + "source": [ + "### Set up your local development environment\n", + "\n", + "**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets\n", + "all the requirements to run this notebook." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gCuSR8GkAgzl" + }, + "source": [ + "**Otherwise**, make sure your environment meets this notebook's requirements.\n", + "You need the following:\n", + "\n", + "* The Google Cloud SDK\n", + "* Git\n", + "* Python 3\n", + "* virtualenv\n", + "* Jupyter notebook running in a virtual environment with Python 3\n", + "\n", + "The Google Cloud guide to [Setting up a Python development\n", + "environment](https://cloud.google.com/python/setup) and the [Jupyter\n", + "installation guide](https://jupyter.org/install) provide detailed instructions\n", + "for meeting these requirements. The following steps provide a condensed set of\n", + "instructions:\n", + "\n", + "1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)\n", + "\n", + "1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)\n", + "\n", + "1. [Install\n", + " virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)\n", + " and create a virtual environment that uses Python 3. Activate the virtual environment.\n", + "\n", + "1. To install Jupyter, run `pip3 install jupyter` on the\n", + "command-line in a terminal shell.\n", + "\n", + "1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.\n", + "\n", + "1. Open this notebook in the Jupyter Notebook Dashboard." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i7EUnXsZhAGF" + }, + "source": [ + "## Installation\n", + "\n", + "Install the following packages required to execute this notebook. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2b4ef9b72d43" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# The Vertex AI Workbench Notebook product has specific requirements\n", + "IS_WORKBENCH_NOTEBOOK = os.getenv(\"DL_ANACONDA_HOME\")\n", + "IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(\n", + " \"/opt/deeplearning/metadata/env_version\"\n", + ")\n", + "\n", + "# Vertex AI Notebook requires dependencies to be installed with '--user'\n", + "USER_FLAG = \"\"\n", + "if IS_WORKBENCH_NOTEBOOK:\n", + " USER_FLAG = \"--user\"\n", + "\n", + "! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q\n", + "! pip3 install google-cloud-pipeline-components==1.0.17 {USER_FLAG} -q\n", + "! pip3 install --upgrade kfp google-cloud-pipeline-components {USER_FLAG} -q\n", + "! pip3 install --upgrade matplotlib {USER_FLAG} -q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hhq5zEbGg0XX" + }, + "source": [ + "### Restart the kernel\n", + "\n", + "After you install the additional packages, you need to restart the notebook kernel so it can find the packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EzrelQZ22IZj" + }, + "outputs": [], + "source": [ + "# Automatically restart kernel after installs\n", + "import os\n", + "\n", + "if not os.getenv(\"IS_TESTING\"):\n", + " # Automatically restart kernel after installs\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lWEdiXsJg0XY" + }, + "source": [ + "## Before you begin" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BF1j6f9HApxa" + }, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. 
[Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", + "\n", + "1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "1. [Enable the Vertex AI, Compute Engine, and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,dataflow.googleapis.com).\n", + "\n", + "1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).\n", + "\n", + "1. Enter your project ID in the cell below. Then run the cell to make sure the\n", + "Cloud SDK uses the right project for all the commands in this notebook.\n", + "\n", + "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WReHDGG5g0XY" + }, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oM1iC_MfAts1" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "riG_qUokg0XZ" + }, + "outputs": [], + "source": [ + "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", + " # Get your GCP project id from gcloud\n", + " shell_output = ! 
gcloud config list --format 'value(core.project)' 2>/dev/null\n", + " PROJECT_ID = shell_output[0]\n", + " print(\"Project ID:\", PROJECT_ID)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "set_gcloud_project_id" + }, + "outputs": [], + "source": [ + "! gcloud config set project $PROJECT_ID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "region" + }, + "source": [ + "#### Region\n", + "\n", + "You can also change the `REGION` variable, which is used for operations\n", + "throughout the rest of this notebook. Below are regions supported for Vertex AI. It is recommended that you choose the region closest to you.\n", + "\n", + "- Americas: `us-central1`\n", + "- Europe: `europe-west4`\n", + "- Asia Pacific: `asia-east1`\n", + "\n", + "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n", + "\n", + "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sduDOFQVF6kv" + }, + "outputs": [], + "source": [ + "REGION = \"[your-region]\" # @param {type: \"string\"}\n", + "\n", + "if REGION == \"[your-region]\":\n", + " REGION = \"us-central1\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "06571eb4063b" + }, + "source": [ + "#### UUID\n", + "\n", + "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a uuid for each instance session, and append it onto the name of resources you create in this tutorial." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "697568e92bd6" + }, + "outputs": [], + "source": [ + "import random\n", + "import string\n", + "\n", + "\n", + "# Generate a uuid of a specified length (default=8)\n", + "def generate_uuid(length: int = 8) -> str:\n", + " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", + "\n", + "\n", + "UUID = generate_uuid()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dr--iN2kAylZ" + }, + "source": [ + "### Authenticate your Google Cloud account\n", + "\n", + "**If you are using Vertex AI Workbench Notebooks**, your environment is already\n", + "authenticated. Skip this step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBCra4QMA2wR" + }, + "source": [ + "**If you are using Colab**, run the cell below and follow the instructions\n", + "when prompted to authenticate your account via OAuth.\n", + "\n", + "**Otherwise**, follow these steps:\n", + "\n", + "1. In the Cloud Console, go to the [**Create service account key**\n", + " page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).\n", + "\n", + "2. Click **Create service account**.\n", + "\n", + "3. In the **Service account name** field, enter a name, and\n", + " click **Create**.\n", + "\n", + "4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type \"Vertex AI\"\n", + "into the filter box, and select\n", + " **Vertex AI Administrator**. Type \"Storage Object Admin\" into the filter box, and select **Storage Object Admin**.\n", + "\n", + "5. Click **Create**. A JSON file that contains your key downloads to your\n", + "local environment.\n", + "\n", + "6. Enter the path to your service account key as the\n", + "`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PyQmSRbKA8r-" + }, + "outputs": [], + "source": [ + "# If you are running this notebook in Colab, run this cell and follow the\n", + "# instructions to authenticate your GCP account. This provides access to your\n", + "# Cloud Storage bucket and lets you submit training jobs and prediction\n", + "# requests.\n", + "\n", + "import os\n", + "import sys\n", + "\n", + "# If on Vertex AI Workbench, then don't execute this code\n", + "IS_COLAB = \"google.colab\" in sys.modules\n", + "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\") and not os.getenv(\n", + " \"DL_ANACONDA_HOME\"\n", + "):\n", + " if \"google.colab\" in sys.modules:\n", + " from google.colab import auth as google_auth\n", + "\n", + " google_auth.authenticate_user()\n", + "\n", + " # If you are running this notebook locally, replace the string below with the\n", + " # path to your service account key and run this cell to authenticate your GCP\n", + " # account.\n", + " elif not os.getenv(\"IS_TESTING\"):\n", + " %env GOOGLE_APPLICATION_CREDENTIALS ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zgPO1eR3CYjk" + }, + "source": [ + "### Create a Cloud Storage bucket\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "When you run a Vertex AI pipeline job using the Cloud SDK, your job stores the pipeline artifacts to a Cloud Storage bucket. In this tutorial, you create a Vertex AI Pipeline job that saves the artifacts like evaluation metrics and feature attributes to a Cloud Storage bucket.\n", + "\n", + "Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MzGDU7TWdts_" + }, + "outputs": [], + "source": [ + "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}\n", + "BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cf221059d072" + }, + "outputs": [], + "source": [ + "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", + " BUCKET_NAME = PROJECT_ID + \"aip-\" + UUID\n", + " BUCKET_URI = f\"gs://{BUCKET_NAME}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-EcIXiGsCePi" + }, + "source": [ + "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NIq7R4HZCfIc" + }, + "outputs": [], + "source": [ + "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucvCsknMCims" + }, + "source": [ + "Finally, validate access to your Cloud Storage bucket by examining its contents:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vhOb7YnwClBb" + }, + "outputs": [], + "source": [ + "! gsutil ls -al $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account" + }, + "source": [ + "#### Service Account\n", + "\n", + "You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UwC1AdGeF6kx" + }, + "outputs": [], + "source": [ + "SERVICE_ACCOUNT = \"[your-service-account]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "autoset_service_account" + }, + "outputs": [], + "source": [ + "if (\n", + " SERVICE_ACCOUNT == \"\"\n", + " or SERVICE_ACCOUNT is None\n", + " or SERVICE_ACCOUNT == \"[your-service-account]\"\n", + "):\n", + " # Get your service account from gcloud\n", + " if not IS_COLAB:\n", + " shell_output = !gcloud auth list 2>/dev/null\n", + " SERVICE_ACCOUNT = shell_output[2].replace(\"*\", \"\").strip()\n", + "\n", + " else: # IS_COLAB:\n", + " shell_output = ! gcloud projects describe $PROJECT_ID\n", + " project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n", + " SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n", + "\n", + " print(\"Service Account:\", SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "set_service_account:pipelines" + }, + "source": [ + "#### Set service account access for Vertex AI Pipelines\n", + "\n", + "Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6OqzKqhMF6kx" + }, + "outputs": [], + "source": [ + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI\n", + "\n", + "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XoEqT2Y4DJmf" + }, + "source": [ + "### Import libraries\n", + "\n", + "Import the Vertex AI Python SDK and other required Python libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pRUOFELefqf1" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import google.cloud.aiplatform as aiplatform\n", + "import kfp\n", + "import matplotlib.pyplot as plt\n", + "from google.cloud import aiplatform_v1\n", + "from kfp.v2 import compiler" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "init_aip:mbsdk,all" + }, + "source": [ + "### Initialize Vertex AI SDK for Python\n", + "\n", + "Initialize the Vertex AI SDK for Python for your project and corresponding bucket." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ksAefQcCF6ky" + }, + "outputs": [], + "source": [ + "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8d97acf78771" + }, + "source": [ + "## Create Vertex AI Dataset\n", + "\n", + "Create a managed tabular dataset resource in Vertex AI using the dataset source." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3390c9e9426c" + }, + "outputs": [], + "source": [ + "DATA_SOURCE = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2011a473ce65" + }, + "outputs": [], + "source": [ + "# Create the Vertex AI Dataset resource\n", + "dataset = aiplatform.TabularDataset.create(\n", + " display_name=\"petfinder-tabular-dataset_\" + UUID,\n", + " gcs_source=DATA_SOURCE,\n", + ")\n", + "\n", + "print(\"Resource name:\", dataset.resource_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6da01c2f1d4f" + }, + "source": [ + "## Train AutoML model\n", + "\n", + "Train a simple classification model on the created dataset using `Adopted` as the target column. 
\n", + "\n", + "Set a display name and create the training job using `AutoMLTabularTrainingJob` with appropriate data types specified for column transformations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5dd3db2d1225" + }, + "outputs": [], + "source": [ + "TRAINING_JOB_DISPLAY_NAME = \"[your-train-job-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0614e3fb19da" + }, + "outputs": [], + "source": [ + "# If no display name is specified, use the default one\n", + "if (\n", + " TRAINING_JOB_DISPLAY_NAME == \"\"\n", + " or TRAINING_JOB_DISPLAY_NAME is None\n", + " or TRAINING_JOB_DISPLAY_NAME == \"[your-train-job-display-name]\"\n", + "):\n", + " TRAINING_JOB_DISPLAY_NAME = \"train-petfinder-automl_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce9c9f279674" + }, + "source": [ + "`AutoMLTabularTrainingJob` class creates an AutoML training job using the following parameters: \n", + "\n", + "- `display_name`: The human readable name for the Vertex AI TrainingJob resource.\n", + "- `optimization_prediction_type`: The type of prediction the Model is to produce. Ex: regression, classification.\n", + "- `column_specs`(Optional): Transformations to apply to the input columns (including data-type corrections).\n", + "- `optimization_objective`: The optimization objective to minimize or maximize. Depending on the type of prediction, this parameter is chosen. If the field is not set, the default objective function is used. \n", + "\n", + "For more details, please go through this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d33629c2aae6" + }, + "outputs": [], + "source": [ + "# Define the AutoML training job\n", + "train_job = aiplatform.AutoMLTabularTrainingJob(\n", + " display_name=TRAINING_JOB_DISPLAY_NAME,\n", + " optimization_prediction_type=\"classification\",\n", + " column_specs={\n", + " \"Type\": \"categorical\",\n", + " \"Age\": \"numeric\",\n", + " \"Breed1\": \"categorical\",\n", + " \"Color1\": \"categorical\",\n", + " \"Color2\": \"categorical\",\n", + " \"MaturitySize\": \"categorical\",\n", + " \"FurLength\": \"categorical\",\n", + " \"Vaccinated\": \"categorical\",\n", + " \"Sterilized\": \"categorical\",\n", + " \"Health\": \"categorical\",\n", + " \"Fee\": \"numeric\",\n", + " \"PhotoAmt\": \"numeric\",\n", + " },\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "391c51c98647" + }, + "source": [ + "Set the display name for the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "454f077b984e" + }, + "outputs": [], + "source": [ + "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "21b5a27e8171" + }, + "outputs": [], + "source": [ + "# If no name is specified, use the default name\n", + "if (\n", + " MODEL_DISPLAY_NAME == \"\"\n", + " or MODEL_DISPLAY_NAME is None\n", + " or MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", + "):\n", + " MODEL_DISPLAY_NAME = \"pet-adoption-prediction-model_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "93ebafd3f347" + }, + "source": [ + "Run training job on the created TabularDataset by passing the following arguments for training:\n", + "\n", + "- `dataset`: The TabularDataset within the same Project from which data needs to be used to train the Model.\n", + "- `target_column`: The name of the column values of which the Model is 
to predict.\n", + "- `model_display_name`: The display name of the Vertex AI Model that is produced as an output. \n", + "- `budget_milli_node_hours`(Optional): The training budget of creating this Model, expressed in milli node hours i.e. 1,000 value in this field means 1 node hour. The training cost of the model does not exceed this budget.\n", + "\n", + "For more details on the other parameters used in the `run`() method, please visit this [documentation for **AutoMLTabularTrainingJob** Class](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.AutoMLTabularTrainingJob#google_cloud_aiplatform_AutoMLTabularTrainingJob_run).\n", + "\n", + "The training job takes roughly 1.5-2 hours to finish." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9ce44a2ab942" + }, + "outputs": [], + "source": [ + "# Specify the target column\n", + "target_column = \"Adopted\"\n", + "\n", + "# Run the training job\n", + "model = train_job.run(\n", + " dataset=dataset,\n", + " target_column=target_column,\n", + " model_display_name=MODEL_DISPLAY_NAME,\n", + " budget_milli_node_hours=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bfa52eb3f22f" + }, + "source": [ + "## List model evaluations from training\n", + "\n", + "After the training job is finished, get the model evaluations and print them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d56e2b3cf57d" + }, + "outputs": [], + "source": [ + "# Get evaluations\n", + "model_evaluations = model.list_model_evaluations()\n", + "\n", + "model_evaluation = list(model_evaluations)[0]\n", + "print(model_evaluation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bd2e1da7a64e" + }, + "outputs": [], + "source": [ + "# Print the evaluation metrics\n", + "for evaluation in model_evaluations:\n", + " evaluation = evaluation.to_dict()\n", + " print(\"Model's evaluation metrics from Training:\\n\")\n", + " metrics = evaluation[\"metrics\"]\n", + " for metric in metrics.keys():\n", + " print(f\"metric: {metric}, value: {metrics[metric]}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "19c434d8b035" + }, + "source": [ + "## Create Pipeline for evaluations\n", + "\n", + "Now, you run a Vertex AI BatchPrediction job and generate evaluations and feature attributions on its results. \n", + "\n", + "To do so, you create a Vertex AI pipeline using the components available from the [`google-cloud-pipeline-components`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/index.html) Python package.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ab9f273691cc" + }, + "source": [ + "### Define the Pipeline\n", + "\n", + "While defining the flow of the pipeline, you get the model resource first. Then, you sample the provided source dataset for batch predictions and create a batch prediction. The explanations are enabled while creating the batch prediction job to generate feature attributions. Once the batch prediction job is completed, you get the classification evaluation metrics and the feature attributions from the results.\n", + "\n", + "The pipeline uses the following components:\n", + "\n", + "- `GetVertexModelOp`: Gets a Vertex Model Artifact. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.GetVertexModelOp).\n", + "- `EvaluationDataSamplerOp`: Randomly downsamples an input dataset to a specified size for computing Vertex XAI feature attributions for AutoML Tables and custom models. Creates a Dataflow job with Apache Beam to downsample the dataset. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.EvaluationDataSamplerOp).\n", + "- `ModelBatchPredictOp`: Creates a Google Cloud Vertex BatchPredictionJob and waits for it to complete. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.aiplatform.html#google_cloud_pipeline_components.aiplatform.ModelBatchPredictOp).\n", + "- `ModelEvaluationClassificationOp`: Computes evaluation metrics on a trained model’s batch prediction results. Creates a Dataflow job with Apache Beam and TFMA to compute evaluation metrics. Supports multiclass classification evaluation for tabular, image, video, and text data. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationClassificationOp).\n", + "- `ModelEvaluationFeatureAttributionOp`: Computes feature attributions on a trained model’s batch explanation results. Creates a Dataflow job with Apache Beam and TFMA to compute feature attributions. 
For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelEvaluationFeatureAttributionOp).\n", + "- `ModelImportEvaluationOp`: Imports a model evaluation artifact to an existing Vertex model with ModelService.ImportModelEvaluation. For more details, please check [here](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-1.0.17/google_cloud_pipeline_components.experimental.evaluation.html#google_cloud_pipeline_components.experimental.evaluation.ModelImportEvaluationOp)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "327d8d4e11b2" + }, + "outputs": [], + "source": [ + "@kfp.dsl.pipeline(\n", + " name=\"vertex-evaluation-automl-tabular-classification-feature-attribution\"\n", + ")\n", + "def evaluation_automl_tabular_feature_attribution_pipeline(\n", + " project: str,\n", + " location: str,\n", + " root_dir: str,\n", + " model_name: str,\n", + " target_column_name: str,\n", + " batch_predict_gcs_source_uris: list,\n", + " batch_predict_instances_format: str,\n", + " batch_predict_predictions_format: str = \"jsonl\",\n", + " batch_predict_machine_type: str = \"n1-standard-4\",\n", + " batch_predict_explanation_metadata: dict = {},\n", + " batch_predict_explanation_parameters: dict = {},\n", + " batch_predict_explanation_data_sample_size: int = 10000,\n", + "):\n", + "\n", + " from google_cloud_pipeline_components.aiplatform import ModelBatchPredictOp\n", + " from google_cloud_pipeline_components.experimental.evaluation import (\n", + " EvaluationDataSamplerOp, GetVertexModelOp,\n", + " ModelEvaluationClassificationOp, ModelEvaluationFeatureAttributionOp,\n", + " ModelImportEvaluationOp)\n", + "\n", + " # Get the Vertex AI model resource\n", + " get_model_task = 
GetVertexModelOp(model_resource_name=model_name)\n", + "\n", + " # Run Data-sampling task\n", + " data_sampler_task = EvaluationDataSamplerOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " gcs_source_uris=batch_predict_gcs_source_uris,\n", + " instances_format=batch_predict_instances_format,\n", + " sample_size=batch_predict_explanation_data_sample_size,\n", + " )\n", + "\n", + " # Run Batch Explanations\n", + " batch_explain_task = ModelBatchPredictOp(\n", + " project=project,\n", + " location=location,\n", + " model=get_model_task.outputs[\"model\"],\n", + " job_display_name=\"model-registry-batch-predict-evaluation\",\n", + " gcs_source_uris=data_sampler_task.outputs[\"gcs_output_directory\"],\n", + " instances_format=batch_predict_instances_format,\n", + " predictions_format=batch_predict_predictions_format,\n", + " gcs_destination_output_uri_prefix=root_dir,\n", + " machine_type=batch_predict_machine_type,\n", + " # Set the explanation parameters\n", + " generate_explanation=True,\n", + " explanation_parameters=batch_predict_explanation_parameters,\n", + " explanation_metadata=batch_predict_explanation_metadata,\n", + " )\n", + "\n", + " # Run evaluation based on prediction type and feature attribution component.\n", + " # After, import the model evaluations to the Vertex model.\n", + " eval_task = ModelEvaluationClassificationOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " problem_type=\"classification\",\n", + " ground_truth_column=target_column_name,\n", + " predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " predictions_format=batch_predict_predictions_format,\n", + " )\n", + "\n", + " # Get Feature Attributions\n", + " feature_attribution_task = ModelEvaluationFeatureAttributionOp(\n", + " project=project,\n", + " location=location,\n", + " root_dir=root_dir,\n", + " predictions_format=batch_predict_predictions_format,\n", + " 
predictions_gcs_source=batch_explain_task.outputs[\"gcs_output_directory\"],\n", + " )\n", + "\n", + " ModelImportEvaluationOp(\n", + " classification_metrics=eval_task.outputs[\"evaluation_metrics\"],\n", + " feature_attributions=feature_attribution_task.outputs[\"feature_attributions\"],\n", + " model=get_model_task.outputs[\"model\"],\n", + " dataset_type=batch_predict_instances_format,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1abb012ce04b" + }, + "source": [ + "### Compile the pipeline\n", + "\n", + "Next, compile the pipeline to the `tabular_classification_pipeline.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e526b588cae9" + }, + "outputs": [], + "source": [ + "compiler.Compiler().compile(\n", + " pipeline_func=evaluation_automl_tabular_feature_attribution_pipeline,\n", + " package_path=\"tabular_classification_pipeline.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26eef4b83c88" + }, + "source": [ + "### Define the parameters to run the pipeline\n", + "\n", + "Specify the required parameters to run the pipeline.\n", + "\n", + "Set a display name for your pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "63b84f5490d2" + }, + "outputs": [], + "source": [ + "PIPELINE_DISPLAY_NAME = \"[your-pipeline-display-name]\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e0a18b803bb7" + }, + "outputs": [], + "source": [ + "# If no display name is set, use the default one\n", + "if (\n", + " PIPELINE_DISPLAY_NAME == \"[your-pipeline-display-name]\"\n", + " or PIPELINE_DISPLAY_NAME == \"\"\n", + " or PIPELINE_DISPLAY_NAME is None\n", + "):\n", + " PIPELINE_DISPLAY_NAME = \"pet_adoption_\" + UUID" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a9571ef567de" + }, + "source": [ + "To pass the required arguments to the pipeline, define the following parameters:\n", + "\n", + "- `project`: Project ID.\n", + "- `location`: Region where the pipeline is run.\n", + "- `root_dir`: The GCS directory for keeping staging files and artifacts. A random subdirectory is created under the directory to keep job info for resuming the job in case of failure.\n", + "- `model_name`: Resource name of the trained AutoML Tabular Classification model.\n", + "- `target_column_name`: Name of the column to be used as the target for classification.\n", + "- `batch_predict_gcs_source_uris`: List of the Cloud Storage URIs of input instances for batch prediction.\n", + "- `batch_predict_instances_format`: Format of the input instances for batch prediction. Can be '**jsonl**', '**bigquery**', or '**csv**'.\n", + "- `batch_predict_explanation_data_sample_size`: Number of samples to be considered for batch prediction and evaluation."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "52d622c274d2" + }, + "outputs": [], + "source": [ + "PIPELINE_ROOT = f\"{BUCKET_URI}/pipeline_root/pet_adoption_{UUID}\"\n", + "parameters = {\n", + " \"project\": PROJECT_ID,\n", + " \"location\": REGION,\n", + " \"root_dir\": PIPELINE_ROOT,\n", + " \"model_name\": model.resource_name,\n", + " \"target_column_name\": \"Adopted\",\n", + " \"batch_predict_gcs_source_uris\": [DATA_SOURCE],\n", + " \"batch_predict_instances_format\": \"csv\",\n", + " \"batch_predict_explanation_data_sample_size\": 3000,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0409b0f330c2" + }, + "source": [ + "Create a Vertex AI pipeline job using the following parameters:\n", + "\n", + "- `display_name`: The name of the pipeline, this will show up in the Google Cloud console.\n", + "- `template_path`: The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI or an Artifact Registry URI.\n", + "- `parameter_values`: The mapping from runtime parameter names to its values that\n", + " control the pipeline run.\n", + "- `enable_caching`: Whether to turn on caching for the run.\n", + "\n", + "Learn more about the `PipelineJob` class from [this documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).\n", + "\n", + "After creating, run the pipeline job using the configured `SERVICE_ACCOUNT`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "894afe1ba396" + }, + "outputs": [], + "source": [ + "job = aiplatform.PipelineJob(\n", + " display_name=PIPELINE_DISPLAY_NAME,\n", + " template_path=\"tabular_classification_pipeline.json\",\n", + " parameter_values=parameters,\n", + " enable_caching=True,\n", + ")\n", + "\n", + "job.run(service_account=SERVICE_ACCOUNT)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ce6beLsXASnK" + }, + "source": [ + "## Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKRTDi8ioXBY" + }, + "source": [ + "In the output of the previous step, click the generated link to see your run in the Cloud Console.\n", + "\n", + "In the UI, many of the pipeline directed acyclic graph (DAG) nodes expand or collapse when you click on them. Here is a partially-expanded view of the DAG (click image to see larger version).\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XcKaONSsGNC4" + }, + "source": [ + "### Get the Model Evaluation Results\n", + "\n", + "After the evaluation pipeline is finished, run the cell below to print the evaluation metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ec4ec00ab350" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the evaluation task\n", + " if (\n", + " (\"model-evaluation\" in task.task_name)\n", + " and (\"model-evaluation-import\" not in task.task_name)\n", + " and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " )\n", + " ):\n", + " evaluation_metrics = task.outputs.get(\"evaluation_metrics\").artifacts[0]\n", + " evaluation_metrics_gcs_uri = evaluation_metrics.uri\n", + "\n", + "print(evaluation_metrics)\n", + "print(evaluation_metrics_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ca00512eb89f" + }, + "source": [ + "### Visualize the metrics\n", + "\n", + "Visualize the available metrics like `auRoc` and `logLoss` using a bar-chart." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f9e38f73f838" + }, + "outputs": [], + "source": [ + "metrics = []\n", + "values = []\n", + "for i in evaluation_metrics.metadata.items():\n", + " metrics.append(i[0])\n", + " values.append(i[1])\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=metrics, height=values)\n", + "plt.title(\"Evaluation Metrics\")\n", + "plt.ylabel(\"Value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "049c9bbae2cb" + }, + "source": [ + "### Get the Feature Attributions\n", + "\n", + "Run the below cell to get the feature attributions. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "03ca8c149bc6" + }, + "outputs": [], + "source": [ + "# Iterate over the pipeline tasks\n", + "for task in job._gca_resource.job_detail.task_details:\n", + " # Obtain the artifacts from the feature attribution task\n", + " if (task.task_name == \"feature-attribution\") and (\n", + " task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED\n", + " or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED\n", + " ):\n", + " feat_attrs = task.outputs.get(\"feature_attributions\").artifacts[0]\n", + " feat_attrs_gcs_uri = feat_attrs.uri\n", + "\n", + "print(feat_attrs)\n", + "print(feat_attrs_gcs_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "719d2cd57d10" + }, + "source": [ + "From the obtained Cloud Storage uri for the feature attributions, get the attribution values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "82e308dd8aca" + }, + "outputs": [], + "source": [ + "# Load the results\n", + "attributions = !gsutil cat $feat_attrs_gcs_uri\n", + "\n", + "# Print the results obtained\n", + "attributions = json.loads(attributions[0])\n", + "print(attributions)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5bfe517357f8" + }, + "source": [ + "### Visualize the Feature Attributions\n", + "\n", + "Visualize the obtained attributions for each feature using a bar-chart." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d7a7dca9e3cc" + }, + "outputs": [], + "source": [ + "data = attributions[\"explanation\"][\"attributions\"][0][\"featureAttributions\"]\n", + "features = []\n", + "attr_values = []\n", + "for key, value in data.items():\n", + " features.append(key)\n", + " attr_values.append(value)\n", + "\n", + "plt.figure(figsize=(5, 3))\n", + "plt.bar(x=features, height=attr_values)\n", + "plt.title(\"Feature Attributions\")\n", + "plt.xticks(rotation=90)\n", + "plt.ylabel(\"Attribution value\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TpV-iwP9qw9c" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can delete the individual resources you created in this tutorial.\n", + "\n", + "Set `delete_bucket` to **True** to delete the Cloud Storage bucket created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sx_vKniMq9ZX" + }, + "outputs": [], + "source": [ + "# Delete model resource\n", + "model.delete()\n", + "\n", + "# Delete the dataset resource\n", + "dataset.delete()\n", + "\n", + "# Delete the training job\n", + "train_job.delete()\n", + "\n", + "# Delete the evaluation pipeline\n", + "job.delete()\n", + "\n", + "# Delete Cloud Storage objects\n", + "delete_bucket = False\n", + "if delete_bucket or os.getenv(\"IS_TESTING\"):\n", + " ! 
gsutil -m rm -r $BUCKET_URI" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "automl_tabular_classification_model_evaluation.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 }