Don’t just dream of becoming a certified Hadoop developer: achieve it by choosing a world-class trainer who can take you from beginner to advanced level.
♦ Start a terminal session in GCP (the steps below use Cloud Shell):
♦ In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your Cloud Dataproc clusters.
export REGION=us-central1
export ZONE=us-central1-a
gcloud config set compute/zone $ZONE
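The region is reused later for the bucket location and for the Cloud SQL connection name, so it has to match the zone. As a minimal sketch, assuming zone names always have the form region plus a one-letter suffix, the region can be derived from the zone with shell parameter expansion so the two values cannot drift apart:

```shell
# Assumption: a zone name is its region name plus a "-<letter>" suffix,
# e.g. zone us-central1-a belongs to region us-central1.
export ZONE=us-central1-a

# Strip the shortest trailing "-<suffix>" from the zone to get the region.
export REGION="${ZONE%-*}"

echo "zone=$ZONE region=$REGION"
# prints: zone=us-central1-a region=us-central1
```

Changing the zone then only requires editing one line.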
♦ Enable the Cloud Dataproc and Cloud SQL Admin APIs by running this command in Cloud Shell:
gcloud services enable dataproc.googleapis.com sqladmin.googleapis.com
♦ Create a warehouse bucket that will host the Hive data and be shared by all Hive servers:
export PROJECT=$(gcloud info --format='value(config.project)')
gsutil mb -l $REGION gs://$PROJECT-warehouse
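If no default project is configured, PROJECT ends up empty and the bucket name collapses to gs://-warehouse. The following sketch guards against that; the helper name warehouse_bucket is hypothetical and only illustrates the idea:

```shell
# Hypothetical helper: build the warehouse bucket URI for a project ID,
# failing early when the project ID is empty.
warehouse_bucket() {
  # $1 = project ID
  if [ -z "$1" ]; then
    echo "error: no project ID set; run 'gcloud config set project <ID>' first" >&2
    return 1
  fi
  echo "gs://$1-warehouse"
}

warehouse_bucket my-project
# prints: gs://my-project-warehouse
```

In the step above it could be used as gsutil mb -l $REGION "$(warehouse_bucket "$PROJECT")", so that an empty project ID produces a clear error message rather than a malformed bucket name.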
♦ Create a new Cloud SQL instance that will later be used to host the Hive metastore:
gcloud sql instances create hive-metastore \
    --database-version="MYSQL_5_7" \
    --activation-policy=ALWAYS \
    --gce-zone $ZONE
♦ Create the first Cloud Dataproc cluster:
gcloud dataproc clusters create hive-cluster \
    --scopes sql-admin \
    --image-version 1.3 \
    --initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://$PROJECT-warehouse/datasets \
    --metadata "hive-metastore-instance=$PROJECT:$REGION:hive-metastore"
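The value passed as hive-metastore-instance is the Cloud SQL instance connection name, which has the form project:region:instance. A small sketch of how that string is assembled (the helper name connection_name and the project ID my-project are placeholders, not part of the original steps):

```shell
# Hypothetical helper: compose a Cloud SQL instance connection name.
connection_name() {
  # $1 = project ID, $2 = region, $3 = Cloud SQL instance name
  echo "$1:$2:$3"
}

connection_name my-project us-central1 hive-metastore
# prints: my-project:us-central1:hive-metastore
```

The composed value should match what Cloud SQL itself reports for the instance, which can be checked with gcloud sql instances describe hive-metastore --format='value(connectionName)'.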
♦ Copy the sample dataset to your warehouse bucket:
gsutil cp gs://hive-solution/part-00000.parquet \
    gs://$PROJECT-warehouse/datasets/transactions/part-00000.parquet
♦ Create an external Hive table for the dataset:
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "
      CREATE EXTERNAL TABLE transactions
      (SubmissionDate DATE, TransactionAmount DOUBLE, TransactionType STRING)
      STORED AS PARQUET
      LOCATION 'gs://$PROJECT-warehouse/datasets/transactions';"
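Every HiveQL statement in this guide is submitted the same way, so the repeated flags can be wrapped in a small function. This is only a convenience sketch: the name run_hive is hypothetical, and it assumes the cluster is called hive-cluster as created earlier.

```shell
# Hypothetical wrapper around the job-submission command used in this guide.
# Assumes the Cloud Dataproc cluster is named hive-cluster.
run_hive() {
  # $1 = one or more HiveQL statements, passed as a single string
  gcloud dataproc jobs submit hive \
      --cluster hive-cluster \
      --execute "$1"
}
```

For example, run_hive "DESCRIBE FORMATTED transactions;" should show the table's columns and its storage location, which is a quick way to confirm that the table points at the warehouse bucket.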
♦ Run the following simple HiveQL query to verify that the Parquet file is correctly linked to the Hive table:
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "
      SELECT *
      FROM transactions
      LIMIT 10;"
♦ The query displays the following result: