Cloud Composer and Airflow Integration
Prerequisites
A GCP account with a Cloud Composer environment already created
Open the GCP Console.
Open Menu > Cloud Storage > Browser.
Click Create Bucket.
Create a bucket with the same name as the project ID, then click Create.
The bucket is created.
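For reference, the same bucket can also be created from code. The sketch below is an illustration only, assuming the google-cloud-storage client library is installed and Application Default Credentials are configured; the project ID is a placeholder.

# Hedged sketch: create the bucket programmatically instead of through the Console.
from google.cloud import storage

project_id = "your-project-id"  # placeholder: replace with your own project ID
client = storage.Client(project=project_id)
bucket = client.create_bucket(project_id)  # bucket name matches the project ID, as in the step above
print("Created bucket:", bucket.name)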
In Composer, click on Airflow.
Choose the Google account to log in with.
The Airflow web UI opens, showing the list of DAGs.
Go to Menu > Kubernetes Engine > Clusters
The GKE cluster that backs the Composer environment has been created.
In Airflow, go to Admin > Variables.
Click on Create.
Key           Val
gcp_project   <project-ID>
gcs_bucket    gs://<bucket-name>
gce_zone      <zone of the cluster>
Enter each pair in the Key and Val fields, pressing Save and Add Another after each one.
For the last pair, press Save.
The keys and values are added.
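These variables are what the DAG reads at run time. As a rough sketch (not the lab's actual hadoop_tutorial.py), a DAG file can read them with Airflow's Variable model:

# Sketch: reading the Admin > Variables entries from inside a DAG file.
from airflow import models

project_id = models.Variable.get('gcp_project')    # <project-ID>
output_bucket = models.Variable.get('gcs_bucket')  # gs://<bucket-name>
zone = models.Variable.get('gce_zone')             # <zone of the cluster>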
Open Composer.
Click on DAGs Folder.
Copy the path of the DAGs folder.
Click on Activate Cloud Shell
Paste the command below into Cloud Shell, replace the DAG path placeholder with the copied DAGs folder path, and press Enter.
$ gsutil cp gs://cloud-training/datawarehousing/lab_assets/hadoop_tutorial.py gs://<paste the DAG path>
This copies hadoop_tutorial.py into the Composer environment's DAGs bucket.
In Airflow, click on the composer_hadoop_tutorial DAG.
Hover the cursor over each task to see its details.
Click on any one of the tasks.
Press View Log.
The log for that execution is shown.
Go to Cloud Storage and open the bucket we created earlier.
The job's output file is saved there.
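The output can also be checked from code rather than the Console. The sketch below assumes the google-cloud-storage library and that the job writes under a wordcount/ prefix, which is an assumption rather than something stated in the lab.

# Sketch: list objects in the lab bucket to confirm the job's output landed.
from google.cloud import storage

bucket_name = "your-project-id"  # placeholder: the bucket named after the project ID
client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix="wordcount/"):  # assumed output prefix
    print(blob.name)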
If the DAG has not executed, open Airflow > composer_hadoop_tutorial.
Click Trigger DAG.
Confirm by clicking Trigger.
Open the Graph View; here you can see the execution.
The task border colors indicate the execution state.
It is currently running create_dataproc_cluster.
Open Menu > Dataproc > Clusters
The Dataproc cluster has been created.
Now the green border is on run_dataproc_hadoop, which runs the Hadoop job on the cluster.
Then it moves to delete_dataproc_cluster, which deletes the cluster.
Check the cluster in Dataproc; it has been deleted.
Open Dataproc > Jobs and open the job to see its details.
In Airflow, click on Code to view the DAG's source.
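The Code view shows the Python source of composer_hadoop_tutorial. The sketch below is only a rough reconstruction of that three-task pattern, not the lab's actual file: the operator names come from the airflow.contrib Dataproc operators bundled with Composer's Airflow 1.x images, and the cluster name, input file, and output prefix are illustrative assumptions.

import datetime
from airflow import models
from airflow.contrib.operators import dataproc_operator
from airflow.utils import trigger_rule

default_dag_args = {
    'start_date': datetime.datetime(2024, 1, 1),
    # project_id is picked up by the Dataproc operators below via default_args
    'project_id': models.Variable.get('gcp_project'),
}

with models.DAG(
        'composer_hadoop_tutorial',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:

    # Spin up a short-lived Dataproc cluster.
    create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator(
        task_id='create_dataproc_cluster',
        cluster_name='composer-hadoop-tutorial-cluster',  # assumed name
        num_workers=2,
        zone=models.Variable.get('gce_zone'))

    # Run the Hadoop wordcount example shipped with the cluster image.
    run_dataproc_hadoop = dataproc_operator.DataProcHadoopOperator(
        task_id='run_dataproc_hadoop',
        cluster_name='composer-hadoop-tutorial-cluster',
        main_jar='file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar',
        arguments=['wordcount',
                   'gs://pub/shakespeare/rose.txt',                    # assumed input
                   models.Variable.get('gcs_bucket') + '/wordcount'])  # assumed output prefix

    # Tear the cluster down even if the job fails.
    delete_dataproc_cluster = dataproc_operator.DataprocClusterDeleteOperator(
        task_id='delete_dataproc_cluster',
        cluster_name='composer-hadoop-tutorial-cluster',
        trigger_rule=trigger_rule.TriggerRule.ALL_DONE)

    create_dataproc_cluster >> run_dataproc_hadoop >> delete_dataproc_cluster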
To delete the Composer environment, click on Delete.
Press Delete to confirm.