Working With Dataflow using BigQuery

  • Date: 30th May, 2021
  • By Prwatech

Prerequisites

A GCP account.

Open the Cloud Console.

Open Menu > BigQuery.

In the Query Editor, paste the query below.

SELECT
  content
FROM
  fh-bigquery.github_extracts.contents_java_2016
LIMIT
  10

Click Run.

The query results are displayed.

Next, paste the query below.

SELECT
  COUNT(*)
FROM
  fh-bigquery.github_extracts.contents_java_2016

Click Run.

The result is the total number of rows in the table.
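
The two console queries can also be run from Python. The sketch below is a minimal illustration, assuming the google-cloud-bigquery client library is installed and credentials are already configured; the helper names `build_sample_query`, `build_count_query`, and `run_query` are hypothetical and are not part of the lab.

```python
TABLE = "fh-bigquery.github_extracts.contents_java_2016"


def build_sample_query(limit=10):
    """Return the query that previews `limit` rows of Java source."""
    return f"SELECT content FROM `{TABLE}` LIMIT {int(limit)}"


def build_count_query():
    """Return the query that counts every row in the table."""
    return f"SELECT COUNT(*) AS n FROM `{TABLE}`"


def run_query(sql, client=None):
    """Run `sql` and return the result rows as a list.

    `client` is expected to behave like google.cloud.bigquery.Client;
    one is created on demand when none is passed in.
    """
    if client is None:
        # Imported lazily so the query builders work without the library.
        from google.cloud import bigquery
        client = bigquery.Client()
    return list(client.query(sql).result())
```

With working credentials, `run_query(build_count_query())[0]["n"]` would give the same row count as the second console query.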

Click Activate Cloud Shell, then clone the training repository and list the directory contents.

$ git clone https://github.com/GoogleCloudPlatform/training-data-analyst

$ ls

Create a bucket in the console. Give the bucket the same name as the project ID.

In Cloud Shell, run the commands below.

$ BUCKET="<bucket-name>"

$ echo $BUCKET

$ cd training-data-analyst/courses/data_analysis/lab2/python

$ ls

The files in the directory are listed.

$ nano JavaProjectsThatNeedHelp.py

This opens the file in the nano editor. Press Ctrl + X to exit.

$ python3 JavaProjectsThatNeedHelp.py --bucket $BUCKET --project $DEVSHELL_PROJECT_ID --DataFlowRunner
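
To make the flags in the command above easier to follow, here is a minimal sketch of how such a script might turn them into Apache Beam pipeline options. It assumes the script uses `argparse` with `--bucket`, `--project`, and a choice between `--DirectRunner` and `--DataFlowRunner`; the job name, the `staging` paths, and the function names are illustrative assumptions, not copied from the lab file.

```python
import argparse


def parse_args(argv):
    """Parse the command-line flags used in the tutorial command."""
    parser = argparse.ArgumentParser(description="Launch the pipeline")
    parser.add_argument("--bucket", required=True,
                        help="Cloud Storage bucket for output")
    parser.add_argument("--project", required=True,
                        help="Google Cloud project ID")
    # Exactly one of the two runner flags must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--DirectRunner", action="store_true")
    group.add_argument("--DataFlowRunner", action="store_true")
    return parser.parse_args(argv)


def beam_argv(opts):
    """Build the option list that would be handed to a Beam pipeline."""
    runner = "DataflowRunner" if opts.DataFlowRunner else "DirectRunner"
    return [
        f"--project={opts.project}",
        "--job_name=javahelpjob",                           # assumed job name
        f"--staging_location=gs://{opts.bucket}/staging/",  # assumed path
        f"--temp_location=gs://{opts.bucket}/staging/",     # assumed path
        f"--runner={runner}",
    ]
```

Calling `beam_argv(parse_args(["--bucket", "my-bucket", "--project", "my-project", "--DataFlowRunner"]))` yields a list ending in `--runner=DataflowRunner`, so the job is submitted to the Dataflow service rather than run locally.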

Go to Dataflow > Jobs. The job is listed as running.

Open the running job.

Watch the pipeline run.

When the run finishes, the job status is shown as Succeeded.

After execution, go to the bucket.

Open the javahelp/ folder.

The results are stored there.
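
The output files can also be listed from Python instead of the console. This is a minimal sketch, assuming the google-cloud-storage client library is available and the results sit under the `javahelp/` prefix as described above; `list_results` is a hypothetical helper name.

```python
def list_results(bucket_name, prefix="javahelp/", client=None):
    """Return the names of the objects stored under `prefix` in the bucket.

    `client` is expected to behave like google.cloud.storage.Client;
    one is created on demand when none is passed in.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        from google.cloud import storage
        client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

Passing the bucket name used earlier, for example `list_results("<bucket-name>")`, would return the object names written by the job.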
