Blog

Loading Public Data into BigQuery

Accessing Public Datasets in BigQuery

 

Prerequisites

 

GCP account

Open the Cloud Console.

Open Menu > BigQuery > SQL Workspace.

Click on Add Data > Explore public datasets.

Type NYC bike.

It will show the New York Citi Bike Trips dataset. Click it.

Click on View dataset.

The dataset will be added.

Expand the added dataset.

Go to new_york_citibike > citibike_trips.

Click on Schema. It will show the schema of the table.

Click on Details. It will show the details of the table.

Click on Preview. It will show the table contents.

Click on Compose new Query.

Paste the code below into the query editor.

SELECT

   MIN(start_station_name) AS start_station_name,

   MIN(end_station_name) AS end_station_name,

   COUNT(tripduration) AS num_trips

FROM

   `bigquery-public-data.new_york_citibike.citibike_trips`

WHERE

   start_station_id != end_station_id

GROUP BY

   start_station_id,

   end_station_id

ORDER BY

   num_trips DESC

LIMIT

   10

Click Run.

It will show the results of the query.

Click on Query history. It will show the query history.

Click on Job Information.

It will show the job description.

Click on Execution Details. It will show the execution details.

To save the query:

Click on the Save Query button.

Enter a name for the query.

Choose the visibility mode. Click Save.

To see the query settings, click on More > Query Settings.

In Query settings, you can choose the engine for the query.

You can change the destination for your query result. Instead of a temporary table, you can choose a table in the project itself.

Select the destination table for the query results.

Choose the project and dataset.

Give the table name and write preference.

Click Save.
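As an alternative to the UI query settings, the destination table can also be set directly in SQL with a CREATE TABLE ... AS SELECT statement. This is a minimal sketch; the dataset name mydataset is a placeholder, so substitute one that exists in your project:

```sql
-- mydataset.table_query_result is a placeholder name; use your own
CREATE OR REPLACE TABLE mydataset.table_query_result AS
SELECT
   MIN(start_station_name) AS start_station_name,
   MIN(end_station_name) AS end_station_name,
   COUNT(tripduration) AS num_trips
FROM
   `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE
   start_station_id != end_station_id
GROUP BY
   start_station_id,
   end_station_id
```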

Write the query below.

WITH

   trip_distance AS (

SELECT

   bikeid,

   ST_Distance(ST_GeogPoint(s.longitude,s.latitude),

   ST_GeogPoint(e.longitude,e.latitude)) AS distance

FROM

   `bigquery-public-data.new_york_citibike.citibike_trips`,

   `bigquery-public-data.new_york_citibike.citibike_stations` AS s,

   `bigquery-public-data.new_york_citibike.citibike_stations` AS e

WHERE

   start_station_id = s.station_id

   AND end_station_id = e.station_id)

SELECT

   bikeid,

   SUM(distance)/1000 AS total_distance

FROM

   trip_distance

GROUP BY

   bikeid

ORDER BY

   total_distance DESC

LIMIT

   5

Click Run.

The result of the query will be displayed in the query results pane. It will also be saved into the new table named table_query_result.

SELECT

   wx.date,

   wx.value/10.0 AS prcp

FROM

   `bigquery-public-data.ghcn_d.ghcnd_2015` AS wx

WHERE

   id = 'USW00094728'

   AND qflag IS NULL

   AND element = 'PRCP'

ORDER BY

   wx.date

Click Run.

The query results will be displayed.


Creating a function in BigQuery

User-defined functions in BigQuery

 

User-defined functions (UDFs) in BigQuery represent a pivotal tool in the arsenal of data analysts and engineers, facilitating the customization and enhancement of data processing pipelines within Google’s powerful data warehouse solution. UDFs empower users to extend the functionality of BigQuery by encapsulating custom logic into reusable code snippets, thereby enabling complex transformations, calculations, and analyses that may not be achievable through standard SQL queries alone.

At its core, a UDF is a piece of code authored by the user, typically written in languages such as JavaScript or SQL, which can be invoked within BigQuery SQL queries to perform specific operations on data. These functions can range from simple mathematical computations to intricate data manipulations, providing a versatile mechanism for tailoring data processing workflows to meet the unique requirements of each use case.

UDFs in BigQuery offer several advantages, including increased flexibility, code reusability, and enhanced productivity. By encapsulating complex logic into modular functions, analysts can streamline query development, improve code maintainability, and foster collaboration across teams. Additionally, UDFs enable users to abstract away implementation details, promoting a higher level of abstraction and readability in SQL queries.

Furthermore, UDFs empower data practitioners to leverage their domain expertise and implement custom business rules directly within BigQuery, thereby democratizing data analysis and fostering innovation within organizations. Whether it’s aggregating proprietary metrics, parsing unstructured data formats, or implementing advanced statistical algorithms, UDFs provide a powerful mechanism for extending BigQuery’s capabilities and unlocking new possibilities in data analytics.

Prerequisites

GCP account

Paste the code below into the query editor.

CREATE TEMP FUNCTION multiplyput(x FLOAT64, y FLOAT64)

RETURNS FLOAT64

LANGUAGE js AS """

     return x*y;

""";

WITH numbers AS

     (SELECT 1 AS x,5 AS y

     UNION ALL

     SELECT 2 AS x, 10 AS y)

SELECT x, y, multiplyput(x,y) AS product

FROM numbers;

Click Run.

It will give the output.
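JavaScript is not the only option: BigQuery also supports SQL UDFs, where the function body is a SQL expression rather than JavaScript. A minimal sketch of the same multiplication written as a SQL UDF (the function name multiply_sql is illustrative):

```sql
CREATE TEMP FUNCTION multiply_sql(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
AS (x * y);

SELECT multiply_sql(2.0, 10.0) AS product;
```

SQL UDFs avoid the JavaScript execution overhead and are generally preferred when the logic can be expressed as a SQL expression.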

 


Executing queries over multiple gigabytes of data in seconds

Analyze massive datasets with BigQuery

 

In the era of burgeoning digital information, the ability to efficiently process and derive insights from vast datasets is paramount. Enter BigQuery, Google’s fully managed, serverless data warehouse solution designed to tackle the challenges of analyzing massive datasets with unprecedented speed and scalability.

 

This introduction will explore the foundational principles and capabilities of BigQuery, empowering organizations to unlock the full potential of their data assets. From its seamless integration with Google Cloud Platform to its SQL-like querying capabilities and real-time data analytics features, BigQuery offers a robust framework for businesses to glean actionable intelligence swiftly and cost-effectively. Moreover, its inherent scalability allows for seamless expansion as data volumes grow, ensuring sustained performance and reliability. Through this exploration, we will delve into the mechanisms that underpin BigQuery’s efficiency, its practical applications across various industries, and the transformative impact it can have on data-driven decision-making processes. Join us as we embark on a journey to harness the power of BigQuery and unlock new insights from vast datasets.

Prerequisites

 

GCP account

Open the Cloud Console.

Open Menu > BigQuery > SQL Workspace.

 

In the query editor, paste the code below.

SELECT

     *

FROM

     `bigquery-samples.wikipedia_benchmark.Wiki10B`

LIMIT

     5

Click Run.

Within seconds we will get the output. This query processed 692 GB in less than a second.

NB: We are accessing a public dataset provided by Google. Processing speed depends on the network.

Paste the code below into the query editor.

SELECT

     language,

     title,

     SUM(views) AS views

FROM

     bigquery-samples.wikipedia_benchmark.Wiki10B

WHERE

     title LIKE '%Google%'

GROUP BY

     language,

     title

ORDER BY

     views DESC;

Then press Run.

It will process 425 GB of data in 8.3 seconds.

Paste the code below into the query editor.

SELECT

     language,

     title,

     SUM(views) AS views

FROM

     `bigquery-samples.wikipedia_benchmark.Wiki100B`

WHERE

     title LIKE '%Google%'

GROUP BY

     language,

     title

ORDER BY

     views DESC;

Click Run.

It will process 4.1 TB of data in 47.5 seconds.

 

 

 


Working with Data Studio

Visualizing BigQuery data in Data Studio

 

Prerequisites

GCP account.

Data Studio is an online tool to transform our data into reports, dashboards, and so on that are easy to read and share.

Click on the below link to open data studio.

https://datastudio.google.com

Click on Blank report

Enter the country and company name.

Tick the "I agree" checkbox for the terms of service, then click Continue.

Click Yes to all, then Continue.

Select BigQuery.

Click Authorize.

Choose the Project > Dataset > Table.

Click Add

Sometimes the public dataset we used in the project may not be visible. In that case, copy the table into our project's BigQuery dataset and use it from there.

Open the table in BigQuery.

Click Copy table.

Give the project name & dataset name.

Give the table name and press Copy. The table will be copied into our project dataset.

When adding the table into Data Studio, click on Add to report.

The table will be added to Data Studio. On the right side, we can see the table data.

To make charts:

Click on Add a chart. Choose the type of chart you need.

Click the place where you want to add the chart; you can move it later.

To populate a chart, drag a field onto it. The chart will be built automatically.

Data Studio is similar to Google Sheets. All the functionality available in Google Sheets and Google Docs is also available here.

You can change the name of the report, add charts, controls, and other elements, and share the report with others.

 


Scala – VarArgs

Understanding the VarArgs in Scala

 

Var-args functions (methods) take a variable number of arguments. As Java developers, we have already used var-args methods in Java-based applications.

Like Java, Scala also supports passing a variable number of arguments to functions (methods). Scala follows the same rules as Java's var-args methods, with some changes in syntax.

In Scala, varargs (variable-length arguments) provide a convenient way to define functions that accept a variable number of arguments of the same type. Varargs allow functions to be called with any number of arguments, including none, making them flexible and versatile.

To declare a varargs parameter in Scala, use the syntax args: Type*, where Type is the type of the arguments accepted. Within the function body, args behaves like a sequence (Seq) of elements of type Type, allowing you to iterate over the arguments using standard collection operations.

When invoking a function with varargs, you can pass zero or more arguments of the specified type, which will be automatically wrapped into a sequence by the Scala compiler. For example, a function sum(numbers: Int*) can be called as sum(1, 2, 3) to calculate the sum of the provided integers.

Under the hood, varargs are implemented using arrays (Array) in Scala, providing efficient performance for functions with a varying number of arguments.
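The declaration and invocation rules above can be sketched as follows; passing an existing sequence to a var-args parameter requires the `: _*` type ascription (the object and method names here are illustrative):

```scala
object SpreadDemo {
  // var-args parameter: zero or more Ints, seen as a Seq[Int] inside the body
  def sum(numbers: Int*): Int = numbers.sum

  def main(args: Array[String]): Unit = {
    println(sum(1, 2, 3))   // individual arguments: prints 6
    val xs = Seq(4, 5, 6)
    println(sum(xs: _*))    // spread an existing sequence: prints 15
  }
}
```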

use case 1:

// Scala program of varargs
object prwatech {
  // Driver code
    def main(args: Array[String])
    { 
              
        // Calling the function 
        println("Sum is: " + sum(2, 3, 100, 200, 30)); 
    } 
      
      
    // declaration and definition of function 
    def sum(a :Int, b :Int, args: Int *) : Int =
    {
        var result = a + b
      
        for(arg <- args)
        {
            result += arg
        }
      
        return result
    }
}

output:

Sum is: 335

use case 2:

// Scala program of varargs
object prwatech {
 // Driver code
    def main(args: Array[String]) 
    {
      
        // calling of function     
        printPrwa("Prwatech", "Data", "Science")
  
    }
  
    // declaration and definition of function
    def printPrwa(strings: String*) 
    {
        strings.map(println)
    }
}

output:

Prwatech
Data
Science

use case 3:

object prwatech {
def main(args: Array[String]) {
      printStrings("Hello", "Scala", "Python");
   }
   
   def printStrings( args:String* ) = {
      var i : Int = 0;
      
      for( arg <- args ){
         println("Arg value[" + i + "] = " + arg );
         i = i + 1;
      }
   }
}

output:

Arg value[0] = Hello
Arg value[1] = Scala
Arg value[2] = Python
0
0

Scala – Reduce

The reduce function is applied to Scala collections, including lists, sets, maps, sequences, and tuples. The parameter to reduce is a binary operation that merges the elements of the collection and returns a single value. The first two values are combined with the binary operation, the result of that operation is combined with the next value of the collection, and so on until a single value is obtained.

use case 1:

object prwatech {
// Main method
    def main(arg:Array[String])
    {
        // initialize a sequence of elements
        val seq_elements: Seq[Double] = Seq(1.5, 2.0, 0.5)
        println(s"Elements = $seq_elements") 
  
        // find the sum of the elements
        // using reduce function
        val sum: Double = seq_elements.reduce((a, b) => a + b)
        println(s"Sum of elements = $sum")
    }   
}

output:

Elements = List(1.5, 2.0, 0.5)
Sum of elements = 4.0

use case 2:

// Creating object
object prwatech {
 // Main method
    def main(arg:Array[String])
    {
        // initialize a sequence of elements
        val seq_elements : Seq[Double] = Seq(2.1, 2.0, 1.1)
        println(s"Elements = $seq_elements")
  
        // find the maximum element using reduce function
        val maximum : Double = seq_elements.reduce(_ max _)
        println(s"Maximum element = $maximum")
  
        // find the minimum element using reduce function
        val minimum : Double = seq_elements.reduce(_ min _)
        println(s"Minimum element = $minimum")
    }
}

output

Elements = List(2.1, 2.0, 1.1)
Maximum element = 2.1
Minimum element = 1.1

use case 3:

object prwatech {
// Main method
    def main(arg:Array[String])
    {
        // initialize a sequence of elements
        val seq_elements: Seq[Double] = Seq(1.5, 2.0, 0.5)
        println(s"Elements = $seq_elements") 
  
        // find the Multiplication of the elements
        // using reduce function
        val Mult: Double = seq_elements.reduce((a, b) => a * b)
        println(s"Multiplication of elements = $Mult")
    }   
}

output:

Elements = List(1.5, 2.0, 0.5)
Multiplication of elements = 1.5
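One caveat the use cases above do not show: calling reduce on an empty collection throws an exception. A small sketch using reduceOption as a safe alternative (the object name is illustrative):

```scala
object ReduceOptionDemo {
  def main(args: Array[String]): Unit = {
    // reduce on an empty collection throws UnsupportedOperationException;
    // reduceOption returns None instead of throwing
    val empty = Seq.empty[Double]
    println(empty.reduceOption(_ + _))          // None
    println(Seq(1.5, 2.0).reduceOption(_ + _))  // Some(3.5)
  }
}
```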

Scala – Placeholder Syntax

How to use placeholder syntax in Scala

 

The drive to keep code concise led Scala to introduce what is called the placeholder syntax. Scala allows the underscore (written as '_') to be used as a placeholder for one or more parameters; think of the underscore as something to be filled in with a value.

In Scala, the placeholder syntax is a concise way to define anonymous functions, especially useful when passing functions as arguments to higher-order functions like map, filter, or reduce. Placeholder syntax allows developers to write compact and readable code by omitting explicit function parameters and using underscores (_) to represent placeholders for parameters.

For example, when using map on a collection, instead of explicitly defining a named function with a single parameter, you can use placeholder syntax like collection.map(_ * 2) to double each element of the collection. Here, _ represents the parameter passed to the anonymous function.

Placeholder syntax is particularly handy for simple, one-line transformations where the function logic is concise and easily understood. It promotes functional programming principles by emphasizing the behavior of functions rather than their specific parameter names.

When using placeholder syntax, it's important to note that each underscore (_) represents a different parameter position in the function. For functions of multiple parameters, consecutive underscores (as in _ + _) represent each parameter in order.
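The rule about parameter positions can be sketched as follows (the object name is illustrative):

```scala
object PlaceholderDemo {
  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3, 4)
    // each _ stands for the next parameter in order:
    // _ + _ expands to (a, b) => a + b
    println(nums.reduce(_ + _))   // prints 10
    // a single _ stands for the one parameter of the function
    println(nums.map(_ * 2))      // prints List(2, 4, 6, 8)
  }
}
```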

Use case 1:

scala> val somenumbers = List(1,2,3,4,5)
scala> somenumbers.foreach(println _)

Use case 2:

Suppose we want to check if a number is greater than 5.

scala> val somenumbers = List(3,4,5,8,10,9)
scala> somenumbers.filter(_ > 5)

Use case 3:

Suppose we want to add two numbers.

scala> val f = (_: Int) + (_: Int)
scala> f(15, 10)

Scala – Lazy Values

Guide to lazy val in Scala

 

Scala has both vals and lazy vals. The lazy keyword changes a val so that it is lazily initialized. Lazy initialization means that when creating a value seems costly, the lazy keyword can be placed before val. This gives it the advantage of being initialized on first use, i.e. the bound expression is not evaluated immediately, but once, on first access.

 

In Scala, lazy val is a language feature that delays the initialization of a value until it is accessed for the first time. This allows for deferred computation of values, improving performance and memory usage in scenarios where the value might not always be needed.

When a lazy val is defined, its initialization expression is not evaluated immediately during object creation like a regular val. Instead, the initialization expression is evaluated only when the lazy val is first accessed, and the computed value is stored for subsequent accesses.

The use of lazy val is particularly beneficial for initializing values that are expensive to compute or involve complex initialization logic. By deferring initialization until the value is actually needed, unnecessary computations are avoided, leading to more efficient resource utilization.

It’s important to note that lazy val is thread-safe by default, ensuring that the initialization expression is evaluated exactly once, even in concurrent environments.

use case 1:

object prwatech {
// Main method 
    def main(args:Array[String]) 
    { 
        lazy val prwatech = {
               
            println ("Initialization for the first time")
            2
        }
        // Part 1
        println(prwatech)
           
        // Part 2
        print(prwatech)
    }   
}
Output 
Initialization for the first time
2
2

In the code above, 'prwatech' was a lazy val, so the first time it is accessed, it returned

Initialization for the first time

2

But the second time it is printed, it only returned

2

because the result is cached.
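This caching behaviour is what distinguishes lazy val from def: a lazy val body runs at most once, while a def body runs on every call. A minimal sketch (names are illustrative):

```scala
object LazyVsDef {
  def main(args: Array[String]): Unit = {
    lazy val cached = { println("evaluating lazy val"); 42 }  // body runs at most once
    def recomputed = { println("evaluating def"); 42 }        // body runs on every call

    println(cached + cached)          // "evaluating lazy val" printed once, then 84
    println(recomputed + recomputed)  // "evaluating def" printed twice, then 84
  }
}
```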

use case 2:

object prwatech {
 // Main method 
    def main(args:Array[String]) 
    { 
        var a = 3
           
        def fun() = { 
            a += 1; 
            a 
        }
           
        lazy val prwatech = Stream.continually( fun() )
           
        (prwatech take 5) foreach {
            x => println(x)
        }
    } 
}

output:

4
5
6
7
8

use case 3:

scala> var x = { println("x"); 15 }
scala> lazy val y = { println("y"); x + 1 }
scala> println("-----")
scala> x = 17
scala> println("y is: " + y)

 


Scala – Empty Values

Dealing with Null Values in Scala

 

Empty values in Scala are represented by Null, null, Nil, Nothing, None, and Unit. These empty values are explained as follows:

null: Reference types, such as Objects and Strings, can be null, while value types like Int, Double, Long, and so on cannot. null in Scala is analogous to null in Java.

Null: Null is a trait, which is a subtype of all reference types but is not at all a subtype of value types, and its single instance is null. Reference types can be assigned null, but value types cannot.

Use case 1:


object prwatech {
  // Main method
    def main(args: Array[String]) 
    {
        // Method that takes a parameter of type Null
        def usingnull(thing: Null): Unit = 
        { 
            println("Prwatech"); 
        }
  
        /*error: type mismatch;
  
        found   : java.lang.String("hey")
  
        required: Null*/
        //usingnull("hey")
  
        // passing null itself
        usingnull(null)
    }
}

output:

Prwatech

Here, the method usingnull takes a parameter of type Null, so we can only pass two things: null itself or a reference of type Null. When a string is passed as an argument, it does not compile and generates an error.

Nothing: Nothing is also a trait, which has no instances. It is a subtype of all other types. Its main purpose is to provide a return type for methods that always throw an exception, i.e. never return normally. It is also helpful in giving a type to Nil.

Unit: Unit in Scala is similar to void in Java; it is used as the return type of a function that does not return anything.

Use case 2:

object prwatech {
  // Main method
    def main(args: Array[String]) 
    {
        // Method return type is unit
        def printNumber(num: (Int) => Unit) = 
        {
  
            num(1); 
            num(2); 
            num(3);
        }
          
        printNumber(println)
    }
}
Output:

1
2
3

Here, the method printNumber takes a parameter called num, which has the type (Int) => Unit. This means num is a function that takes a single parameter of type Int and returns Unit, i.e. it does not return a meaningful value.

Nil: Nil is a List with zero elements in it. The type of Nil is List[Nothing] and, as stated above, Nothing has no instances, so we can have a list that is guaranteed to be empty.

Use case 3:

object prwatech {
 // Main method
    def main(args: Array[String]) 
    {
  
        // Displays empty list
        println(Nil)
    }
}

Output:

List()

Thus, we can see that an empty list is returned.

Scala – Options

Scala Option: A Gentle Introduction

 

Scala Option[T] is a container for zero or one element of a given type. An Option[T] can be either Some[T] or the None object, which represents a missing value. For instance, the get method of Scala's Map produces Some(value) if a value corresponding to a given key is present, or None if the given key is not present in the Map.

Scala’s Option type is a fundamental concept in functional programming that addresses the challenge of representing absence of a value in a type-safe manner. Option is used to encapsulate an optional value that may or may not exist, providing a safer alternative to null references commonly used in other languages.

The Option type in Scala is defined as a generic container that can hold either a Some value, representing the presence of a value, or None, representing the absence of a value. This approach eliminates the risk of null pointer exceptions and encourages safer and more predictable code.

Use case 1:

object prwatech {
   // Main method
    def main(args: Array[String])
    {
  
        // Creating a Map
        val name = Map("Abhi" -> "author", 
                        "Prabha" -> "coder")
  
        // Accessing keys of the map
        val x = name.get("Abhi")
        val y = name.get("Sandeep")
  
        // Displays Some if the key is
        // found else None 
        println(x)
        println(y)
    }
}	

Output:

Some(author)
None

Here, the key Abhi is present, so Some is returned for it, but the key Sandeep is not, so None is returned for it.

Use case 2:

Using Pattern Matching:
object prwatech {
   // Main method
    def main(args: Array[String])
    {
  
        // Creating a Map
        val name = Map("Abhi" -> "author", 
                        "Prabha" -> "coder")
  
        //Accessing keys of the map
        println(patrn(name.get("Abhi")))
        println(patrn(name.get("Rahul")))
    }
  
    // Using Option with Pattern 
    // matching
    def patrn(z: Option[String]) = z match 
    {
  
        // for 'Some' class the key for
        // the given value is displayed
        case Some(s) => (s)
  
        // for 'None' class the below string 
        // is displayed
        case None => ("key not found")
    }
}

Output:

author
key not found

Use case 3:

getOrElse() Method:

This method returns either the contained value, if one is present, or a default value when it is not. For the Some class the contained value is returned, and for the None class the default value is returned.

object prwatech {
   // Main method
    def main(args: Array[String])
    {
  
        // Using Some class
        val some:Option[Int] = Some(5)
  
        // Using None class
        val none:Option[Int] = None 
  
        // Applying getOrElse method
        val x = some.getOrElse(0)
        val y = none.getOrElse(1)
  
        // Displays the key in the 
        // class Some
        println(x)
  
        // Displays default value 
        println(y)
    }
}

Output:

5
1
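Beyond pattern matching and getOrElse, Option also supports the usual collection operations such as map, which transforms the contained value without unwrapping it. A short sketch (the object name is illustrative):

```scala
object OptionMapDemo {
  def main(args: Array[String]): Unit = {
    val some: Option[Int] = Some(5)
    val none: Option[Int] = None
    // map applies the function inside Some, and leaves None untouched
    println(some.map(_ * 2))  // prints Some(10)
    println(none.map(_ * 2))  // prints None
  }
}
```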
