Working with Dataflow

  • date 30th May, 2021
  • by Prwatech

Prerequisites

GCP account

Open the GCP Console.

Click Activate Cloud Shell.

$          git clone https://github.com/GoogleCloudPlatform/training-data-analyst

$          ls

Create a bucket in the Console, and give it the same name as the project ID.
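All Cloud Storage buckets share one global namespace. A project ID is already globally unique and uses only lowercase letters, digits and hyphens, so it usually makes a convenient bucket name. Below is a minimal Python sketch that checks a candidate name. It assumes only the basic rules for names without dots and ignores extra restrictions, such as the ban on names starting with "goog". The function name `is_valid_bucket_name` is hypothetical.

```python
import re

# Basic rules for bucket names without dots: 3-63 characters, only
# lowercase letters, digits, dashes and underscores, and the name must
# start and end with a letter or digit. Other restrictions are not checked.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic (dotless) bucket-name rules."""
    return _BUCKET_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # "my-project-123456" stands in for a real project ID.
    print(is_valid_bucket_name("my-project-123456"))  # True
    print(is_valid_bucket_name("My_Project"))         # False (uppercase)
```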

In Cloud Shell, execute the command below to store the bucket name in a variable:

$          BUCKET="<bucket-name>"

$          echo $BUCKET

Open Menu > APIs & Services > Library.

Search for Dataflow and click Dataflow API.

Click Enable.

$          cd training-data-analyst/courses/data_analysis/lab2/python

$          ls

The files in the directory are listed.

$          nano install_packages.sh                         #Open the file install_packages.sh

The file contents are displayed. This script installs the components the lab needs.

$          sudo ./install_packages.sh

To check the pip and Python versions:

$          pip -V

$          pip3 -V
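The lab's Python files import the Apache Beam SDK (`apache_beam`). After running the install script, you can confirm from Python that the package is importable. This is a minimal sketch and assumes you run it with `python3` in the same Cloud Shell session. The helper name `beam_installed` is hypothetical.

```python
import importlib.util
import sys


def beam_installed() -> bool:
    """Return True if the apache_beam package can be found by this interpreter."""
    return importlib.util.find_spec("apache_beam") is not None


if __name__ == "__main__":
    # Print this interpreter's version for comparison with `pip3 -V`.
    print("Python", sys.version.split()[0])
    print("apache_beam installed:", beam_installed())
```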

$          nano grep.py                                   #Open the file grep.py and check the content

$          python3 grep.py

$          ls /tmp                       #Check whether the output file was created.

$          cat /tmp/output-*             #Display the contents of the output file.
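The tutorial doesn't reproduce grep.py, but the output above is what a read, filter, write pipeline produces. Below is a plain-Python sketch of that idea. It assumes grep.py keeps lines that start with a search term such as `import`. The term and the helper names `grep_line` and `run_local_grep` are hypothetical, and no Apache Beam code is involved.

```python
import glob


def grep_line(line: str, term: str):
    """Yield the line if it starts with the search term (a filter step)."""
    if line.startswith(term):
        yield line


def run_local_grep(input_glob: str, output_prefix: str, term: str) -> str:
    """Filter lines from every file matching input_glob and write one output file.

    Mirrors the read -> filter -> write shape of the pipeline in plain Python.
    """
    matches = []
    for path in sorted(glob.glob(input_glob)):  # sorted for a stable order
        with open(path, encoding="utf-8") as f:
            for line in f:
                matches.extend(grep_line(line.rstrip("\n"), term))
    # One output file, named like the lab's single-shard output files.
    out_path = f"{output_prefix}-00000-of-00001"
    with open(out_path, "w", encoding="utf-8") as out:
        for line in matches:
            out.write(line + "\n")
    return out_path


if __name__ == "__main__":
    # Uses the Java sources copied elsewhere in this lab; adjust if needed.
    result = run_local_grep(
        "../javahelp/src/main/java/com/google/cloud/training/dataanalyst/javahelp/*.java",
        "/tmp/localgrep",
        "import",
    )
    print("wrote", result)
```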

$          gsutil cp ../javahelp/src/main/java/com/google/cloud/training/dataanalyst/javahelp/*.java gs://$BUCKET/javahelp

Check whether the files were copied:

Open Menu > Cloud Storage.

Open the bucket.

The copied .java files appear in the javahelp/ folder.

$          echo $DEVSHELL_PROJECT_ID

$          echo $BUCKET

$          nano grepc.py

Edit the file and set the project ID and bucket name:

PROJECT='<project_ID>'

BUCKET='<bucket_name>'

NB: If the bucket name is the same as the project ID, use the same value for both.

To save and exit, press Ctrl + X, then press Y and Enter.
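Editing PROJECT and BUCKET matters because a pipeline that runs on Dataflow needs a project to run in and a Cloud Storage location for staging and temporary files. This minimal sketch shows how those two values can become Beam pipeline options. The helper name, the `staging/` path, the job name and the region are assumptions, not necessarily what grepc.py uses.

```python
def dataflow_args(project: str, bucket: str, job_name: str, region: str) -> list:
    """Build command-line style options for running a Beam pipeline on Dataflow."""
    return [
        f"--project={project}",
        f"--job_name={job_name}",
        f"--region={region}",
        # Staged files and temporary files go under the bucket.
        f"--staging_location=gs://{bucket}/staging/",
        f"--temp_location=gs://{bucket}/staging/",
        "--runner=DataflowRunner",
    ]


if __name__ == "__main__":
    # Hypothetical values; replace them with your own project ID and bucket.
    for arg in dataflow_args("my-project-123456", "my-project-123456",
                             "examplejob", "us-central1"):
        print(arg)
```

A list like this can be handed to Beam as `beam.Pipeline(argv=args)`, so the same script can switch between local and Dataflow runs by changing only its options.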

$          python3 grepc.py                          #Execute file grepc.py

Open Console > Dataflow > Jobs.

Open the job that was just submitted.

Click Job Graph.

The pipeline graph is displayed.

On the right side of the job graph, you can see the job info and resource metrics.

Return to Cloud Shell.

$          ls

$          nano is_popular.py

This opens is_popular.py in the editor.

$          python3 ./is_popular.py                       #Execute is_popular.py

$          cat /tmp/output-*                                     #Display the output
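The contents of is_popular.py aren't shown here. Below is a plain-Python sketch that assumes the script ranks Java packages by how often the Java files import them. It simplifies by counting only the full package name and skipping static imports. The helpers `package_of` and `most_popular` are hypothetical.

```python
from collections import Counter


def package_of(import_line: str):
    """Return the package of a Java import line, e.g. 'java.util' for
    'import java.util.List;', or None if the line is not a plain import."""
    line = import_line.strip()
    if not line.startswith("import ") or not line.endswith(";"):
        return None
    name = line[len("import "):-1].strip()
    if name.startswith("static "):
        return None  # static imports are skipped in this sketch
    # Drop the last component (the class name or '*').
    return name.rsplit(".", 1)[0] if "." in name else None


def most_popular(lines, n=5):
    """Count how often each package is imported and return the top n."""
    counts = Counter(p for p in map(package_of, lines) if p)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["import java.util.List;", "import java.util.Map;",
              "import java.io.File;", "class A {}"]
    print(most_popular(sample, 1))  # [('java.util', 2)]
```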

$          python3 ./is_popular.py  --output_prefix=/tmp/myoutput

$          nano /tmp/myoutput-00000-of-00001

This opens the output file.
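The output file name follows from the flag. Without `--output_prefix`, the script wrote `/tmp/output-*`. With `--output_prefix=/tmp/myoutput`, it wrote `/tmp/myoutput-00000-of-00001`, which is the prefix plus a shard number and a shard count. This minimal sketch reproduces that naming and assumes the script reads the flag with `argparse`. The helper names are hypothetical. In a real Beam pipeline, the text-writing step adds the shard suffix, not your own code.

```python
import argparse


def parse_args(argv):
    """Parse an --output_prefix flag like the one passed to is_popular.py."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline arguments")
    parser.add_argument("--output_prefix", default="/tmp/output",
                        help="Prefix for output files")
    return parser.parse_args(argv)


def shard_names(prefix: str, num_shards: int = 1):
    """Return file names in the <prefix>-SSSSS-of-NNNNN pattern."""
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


if __name__ == "__main__":
    args = parse_args(["--output_prefix=/tmp/myoutput"])
    print(shard_names(args.output_prefix))  # ['/tmp/myoutput-00000-of-00001']
```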

Open Menu > Cloud Storage.

Open the bucket.

Open the javahelp/ folder.

The output files of the Dataflow job are stored in it.
