Hadoop GCP with Hive

  • date 4th April, 2019 |
  • by Prwatech |


Welcome to the world of GCP Hadoop Hive tutorials. In this tutorial, you can explore advanced GCP Hadoop Hive topics designed by Big Data Training Institute experts. Learn the advanced GCP tutorials and course under the certified experts of the Big Data Training Institute and become a pro with an advanced Big Data certification. Don't just dream of becoming a certified pro Hadoop developer; achieve it by choosing a world-class trainer who can help you learn the course from beginner to advanced level like a pro.

GCP Hadoop Hive Tutorials

♦ Start the GCP Cloud Shell terminal.
♦ In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your Cloud Dataproc clusters:
export REGION=us-central1
export ZONE=us-central1-a
gcloud config set compute/zone $ZONE
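♦ (Optional) As a quick sanity check, not part of the original steps, you can confirm the zone setting before continuing:
# Print the currently configured default Compute Engine zone
gcloud config get-value compute/zone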
♦ Enable the Cloud Dataproc and Cloud SQL Admin APIs by running this command in Cloud Shell:
gcloud services enable dataproc.googleapis.com sqladmin.googleapis.com
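♦ (Optional) If you want to confirm that both APIs are now enabled, a simple check (not part of the original tutorial) is to list the enabled services and filter for them:
# List enabled services and keep only the Dataproc and Cloud SQL Admin entries
gcloud services list --enabled | grep -E 'dataproc|sqladmin'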
♦ Create a warehouse bucket that will host the Hive data and be shared by all Hive servers:
export PROJECT=$(gcloud info --format='value(config.project)')
gsutil mb -l $REGION gs://$PROJECT-warehouse
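♦ (Optional) You can verify that the warehouse bucket was created with a standard gsutil listing (an extra check, not in the original steps):
# Show the bucket itself rather than its contents
gsutil ls -b gs://$PROJECT-warehouse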
♦ Create a new Cloud SQL instance that will later be used to host the Hive metastore:
gcloud sql instances create hive-metastore \
    --database-version="MYSQL_5_7" \
    --activation-policy=ALWAYS \
    --gce-zone $ZONE
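♦ (Optional) Instance creation can take a few minutes. As an additional check, you can describe the instance; it should report RUNNABLE once it is ready:
# Print only the instance state field
gcloud sql instances describe hive-metastore \
    --format='value(state)'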
♦ Create the first Cloud Dataproc cluster:
gcloud dataproc clusters create hive-cluster \
    --scopes sql-admin \
    --image-version 1.3 \
    --initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://$PROJECT-warehouse/datasets \
    --metadata "hive-metastore-instance=$PROJECT:$REGION:hive-metastore"
♦ Copy the sample dataset to your warehouse bucket:
gsutil cp gs://hive-solution/part-00000.parquet \
gs://$PROJECT-warehouse/datasets/transactions/part-00000.parquet
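♦ (Optional) You can confirm that the Parquet file landed in the warehouse bucket:
# List the copied object with its size
gsutil ls -l gs://$PROJECT-warehouse/datasets/transactions/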
♦ Create an external Hive table for the dataset:
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "
      CREATE EXTERNAL TABLE transactions
      (SubmissionDate DATE, TransactionAmount DOUBLE, TransactionType STRING)
      STORED AS PARQUET
      LOCATION 'gs://$PROJECT-warehouse/datasets/transactions';"
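♦ (Optional) To double-check that the table was registered in the Hive metastore, you can submit a DESCRIBE statement as another Hive job, following the same pattern as above (an extra verification step, not in the original tutorial):
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "DESCRIBE transactions;"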
♦ Run the following simple HiveQL query to verify that the Parquet file is correctly linked to the Hive table:
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "
      SELECT *
      FROM transactions
      LIMIT 10;"
♦ It will display the first 10 rows of the transactions table as the query result.
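♦ Once the table is linked, you can also run richer HiveQL against it, for example a simple aggregation over the sample data (an illustrative query, not part of the original steps; the column names are taken from the table definition above):
gcloud dataproc jobs submit hive \
    --cluster hive-cluster \
    --execute "
      SELECT TransactionType,
             COUNT(*) AS NumTransactions,
             SUM(TransactionAmount) AS TotalAmount
      FROM transactions
      GROUP BY TransactionType;"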

Video Tutorial on Hadoop GCP
