Data Fusion with BigQuery

Date: 29th May, 2021 | By Prwatech

Cloud Data Fusion and BigQuery for data integration

Prerequisites

GCP account

Open the Google Cloud Console.

Click Activate Cloud Shell.

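$ echo $GOOGLE_CLOUD_PROJECT   # Display the project ID

$ gsutil cp gs://cloud-training/OCBL017/ny-taxi-2018-sample.csv gs://$GOOGLE_CLOUD_PROJECT

# Copies the sample file from the public cloud-training bucket into our main bucket. This assumes a bucket named after the project ID already exists; if it does not, create it first with: gsutil mb gs://$GOOGLE_CLOUD_PROJECT

$ gsutil mb gs://$GOOGLE_CLOUD_PROJECT-temp   # Create the temporary bucket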

Open Menu > Cloud Storage > Browser.

The temporary bucket has been created, and the file has been copied into the main bucket.
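The Cloud Shell commands above can also be collected into a small script. The sketch below is only an illustration: the function names build_setup_commands and run_setup and the placeholder project ID "my-project" are assumptions, not part of the original steps. It defaults to a dry run that prints the commands without executing them.

```python
import os
import subprocess

# Public sample file used in the steps above.
SAMPLE_FILE = "gs://cloud-training/OCBL017/ny-taxi-2018-sample.csv"


def build_setup_commands(project_id):
    """Return the gsutil commands from the steps above as argument lists."""
    main_bucket = f"gs://{project_id}"        # main bucket, named after the project ID
    temp_bucket = f"gs://{project_id}-temp"   # temporary bucket
    return [
        ["gsutil", "cp", SAMPLE_FILE, main_bucket],  # copy the sample into the main bucket
        ["gsutil", "mb", temp_bucket],               # create the temporary bucket
    ]


def run_setup(project_id, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_setup_commands(project_id)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Needs gsutil on PATH and valid credentials (true in Cloud Shell).
            subprocess.run(argv, check=True)
    return commands


# "my-project" is a placeholder used when GOOGLE_CLOUD_PROJECT is not set.
commands = run_setup(os.environ.get("GOOGLE_CLOUD_PROJECT", "my-project"))
```

Calling run_setup(project_id, dry_run=False) inside Cloud Shell would execute the same two commands; check=True makes the script stop at the first command that fails.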

Open Menu > Data Fusion.

Click View Instance next to your instance. This opens the Data Fusion UI.

If you are prompted to take a tour, click No thanks.

Click Wrangler.

Under Default Cloud Storage, open the bucket into which we copied the file.

Select the copied file.

Click the drop-down arrow on the body column.

Click Parse > CSV.

Select Comma as the delimiter and tick Set first row as header.

Click Apply

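In the drop-down of the body column, click Delete column.

In the drop-down of the Trip distance column, change the data type to Float.

In the same drop-down, open Filter > Custom condition.

Enter the condition >0.0 so that only trips with a distance greater than zero are kept.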

Click Apply

On the right side of the table, we can see the transformations we have applied so far.
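For readers who want to see what these Wrangler steps amount to, here is a minimal Python sketch of the same recipe: parse the CSV with the first row as header, convert the trip distance to a float, and keep only rows where it is greater than 0.0. The three-row sample and the exact column name trip_distance are assumptions made for illustration; the real file has many more columns and rows.

```python
import csv
import io

# Hypothetical sample data; only the column names matter for the sketch.
SAMPLE_CSV = """trip_distance,pickup_location_id,extra
1.5,48,0.5
0.0,79,0
3.2,142,1
"""


def wrangle(raw_text):
    """Mimic the recipe: parse CSV, cast trip_distance to float, keep distances > 0.0."""
    reader = csv.DictReader(io.StringIO(raw_text))  # first row is used as the header
    rows = []
    for record in reader:
        record["trip_distance"] = float(record["trip_distance"])  # change data type to Float
        if record["trip_distance"] > 0.0:                         # custom filter condition
            rows.append(record)
    return rows


cleaned = wrangle(SAMPLE_CSV)
print(cleaned)
```

The row with a distance of 0.0 is dropped, which is the effect the >0.0 custom condition has in Wrangler.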

Then click Create a pipeline.

Click Batch pipeline.

This creates a pipeline containing a Wrangler node.

Open Properties on the Wrangler node.

The transformations we applied to the table are already listed here.

The output schema is shown on the right. Delete the column named extra.

Click Validate.

If no errors are found, click Close.

Open Console > BigQuery.

Click Create dataset.

Enter a dataset name and click Create dataset.

Click More > Query settings.

Select Set a destination table for query results.

Enter the project name, dataset name, and table name.

Select Write if empty or Overwrite table, then click Save.

In the Query editor, paste the query below.

SELECT
  zone_id,
  zone_name,
  borough
FROM
  `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom`

Click Run.

The query results are saved to the destination table we configured.
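The effect of running a query with a destination table can be illustrated offline. BigQuery is not available outside the cloud, so the sketch below swaps in Python's built-in sqlite3 module as a stand-in; it is not BigQuery code. The destination table name zones_out, the function save_query_results, the sample row, and the extra zone_geom column are all assumptions made for illustration.

```python
import sqlite3


def save_query_results(conn, overwrite):
    """Write the three selected columns into zones_out, a hypothetical destination table."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'zones_out'"
    ).fetchone()
    if exists:
        has_rows = conn.execute("SELECT 1 FROM zones_out LIMIT 1").fetchone()
        if has_rows and not overwrite:
            # Behaves like 'Write if empty': refuse to touch a table that has data.
            raise RuntimeError("destination table is not empty")
        # Behaves like 'Overwrite table': discard the previous results.
        conn.execute("DROP TABLE zones_out")
    conn.execute(
        "CREATE TABLE zones_out AS SELECT zone_id, zone_name, borough FROM taxi_zone_geom"
    )


conn = sqlite3.connect(":memory:")
# Stand-in for the public table, with one extra column that the query leaves out.
conn.execute(
    "CREATE TABLE taxi_zone_geom (zone_id TEXT, zone_name TEXT, borough TEXT, zone_geom TEXT)"
)
conn.execute("INSERT INTO taxi_zone_geom VALUES ('1', 'Zone A', 'Borough X', 'POLYGON(...)')")

save_query_results(conn, overwrite=False)  # table does not exist yet, so this succeeds
save_query_results(conn, overwrite=True)   # replaces the existing results
rows = conn.execute("SELECT * FROM zones_out").fetchall()
print(rows)
```

Running the function a further time with overwrite=False raises an error because the destination already holds data, which is the reason to pick Overwrite table if you expect to rerun the query.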

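Go back to Data Fusion.

Click Source > BigQuery to add a BigQuery source node to the pipeline.

Open Properties on the BigQuery node.

Enter a Reference name.

Enter the names of the dataset and table we created.

Click Get Schema.

The output schema is displayed on the right.

Enter the name of the temporary bucket we created (the one ending in -temp).

Click Validate.

Click Close if no errors are found.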

Click Analytics > Joiner.

Drag an arrow into the Joiner node from both the BigQuery node and the Wrangler node.

Open Properties on the Joiner node.

Join type: Inner

Join condition:

Wrangler: pickup_location_id

BigQuery: zone_id

Click Get Schema.

In the output schema, remove pickup_location_id and zone_id.

Click Validate.

Click Close if no errors are found.
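The Joiner configuration above describes an inner join on pickup_location_id = zone_id, with both key columns dropped from the output. A minimal Python sketch of that logic follows; the sample rows and the function name inner_join are hypothetical, and the keys are compared as strings, which assumes both sides use the same type.

```python
def inner_join(trips, zones):
    """Inner-join trips to zones on pickup_location_id = zone_id, then drop both keys."""
    # Index zones by key; this assumes zone_id is unique in the zones table.
    zones_by_id = {zone["zone_id"]: zone for zone in zones}
    joined = []
    for trip in trips:
        zone = zones_by_id.get(trip["pickup_location_id"])
        if zone is None:
            continue  # inner join: trips without a matching zone are dropped
        merged = {**trip, **zone}
        merged.pop("pickup_location_id")  # removed from the output schema
        merged.pop("zone_id")             # removed from the output schema
        joined.append(merged)
    return joined


# Hypothetical sample rows standing in for the Wrangler and BigQuery outputs.
trips = [
    {"trip_distance": 1.5, "pickup_location_id": "1"},
    {"trip_distance": 3.2, "pickup_location_id": "99"},  # no matching zone
]
zones = [
    {"zone_id": "1", "zone_name": "Zone A", "borough": "Borough X"},
    {"zone_id": "2", "zone_name": "Zone B", "borough": "Borough Y"},
]

joined = inner_join(trips, zones)
print(joined)
```

Only the first trip survives, because an inner join keeps a row only when the key appears on both sides. If the two key columns had different types (for example an integer on one side and a string on the other), no rows would match, so it is worth checking both schemas before running the pipeline.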

Click Sink > BigQuery to add a BigQuery sink node.

Open Properties on this second BigQuery node.

Enter a Reference name, the dataset name, and a table name.

Enter the name of the temporary bucket we created.

Click Validate.

Click Close if no errors are found.

Now drag an arrow from the Joiner node to the BigQuery sink.

Rename the pipeline and click Save.

Click Deploy.

This deploys the pipeline.

Click Run.

Execution starts and takes around 10-15 minutes.

The status changes from Provisioning to Starting to Running, and the run ends once it has finished successfully.

Go to BigQuery. A new table has been created in the dataset.

In Data Fusion, click Logs to see the execution logs.


