Adding GCS file into BigQuery

  • date 29th May, 2021 |
  • by Prwatech |
  • 0 Comments

BigQuery data loading from Google Cloud Storage

 

Loading data from Google Cloud Storage (GCS) into BigQuery is a fundamental process that forms the backbone of many data ingestion pipelines within Google's comprehensive cloud ecosystem. This introduction will explore the intricacies of data loading from GCS to BigQuery, providing insights into the mechanisms, best practices, and benefits associated with this essential operation.

Google Cloud Storage serves as a scalable and durable object storage solution, ideal for storing a wide array of data types, including structured, semi-structured, and unstructured data. BigQuery, on the other hand, offers a powerful, fully managed data warehouse platform with lightning-fast SQL queries and built-in machine learning capabilities.

The process of loading data from GCS into BigQuery involves several steps, including defining the data schema, configuring the load job settings, and initiating the data transfer. Users can choose from a variety of file formats supported by both GCS and BigQuery, such as CSV, JSON, Avro, Parquet, and more, ensuring compatibility and flexibility in data loading operations.

Prerequisites

GCP account

Download file by pressing the below link.

https://storage.googleapis.com/cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv

This is a google cloud training data.

Open Console.

Add the downloaded file into Bucket.

Open Menu > BigQuery.

Open the Project.

Click on Create Dataset.

Give dataset ID. Then click Create Dataset.

Click on Create Table.

Select Source as Google Cloud Storage. Click Browse.

Choose the file from bucket. Click Select

The File will be selected with file format.

Choose the Destination table.

Give project name, dataset name and table name.

Click on auto detect. Click create table.

The table will be added.

Paste your query in query editor.

SELECT

*

FROM

     gcs_bq.nyc_trips

ORDER BY

     Fare_amount DESC

LIMIT

10;

Click Run.The Query will be executed.

Paste your query in query editor.

SELECT

*

FROM

     gcs_bq.nyc_trips

WHERE

     EXTRACT(Month

          FROM

          Pickup_datetime)=1

ORDER BY

     fare_amount DESC

LIMIT

10;

Click Run.The Query will be executed.

 

BigQuery data loading from Google Cloud Storage

Quick Support

image image