If you’ve already signed up for Amazon Web Services (AWS) account, you can start using Amazon Athena immediately.
Sign Up for AWS
When you sign up for AWS, your account is automatically signed up for all services in AWS, including Amazon Athena. You are charged only for the AWS services that you use. When you use Amazon Athena, you use Amazon S3 to store your data. Amazon Athena has no AWS Free Tier pricing.
If you have an AWS account, skip to the next task. If you don’t have an AWS account, please use the following procedure to create one.
To create an AWS account
- Open http://aws.amazon.com/, and then choose Create an AWS Account.
- Follow the online instructions. Part of the sign-up procedure involves receiving a phone call & entering a PIN using the phone keypad.
Note your AWS Account ID, because you need it for the next task.
This tutorial guide you through using Amazon Athena to query data. You’ll create a table based on sample data stored or loaded data in Amazon Simple Storage Service, query the table, and check the results of the query.
The tutorial is using live resources, so you are charged for the queries which you run. You aren’t charged for the sample datasets available that you use, but if you upload your own data files to Amazon S3, charges do apply.
Step 1: Create a Database
You first need to create a database in Amazon Athena.
To create a database
- Open the Athena console.
- If this is your first time visiting the Amazon Athena console, you’ll go to a Getting Started page. Click on Get Startedto open the Query Editor. If it isn’t your first time, the Amazon Athena Query Editor opens.
- In an Athena Query Editor, you see a query pane with an example query. Start typing your query anywhere in the query pane.
4. To create a database named sacramento, enter the following CREATE DATABASE statement, and then choose Run Query:
CREATE DATABASE sacramento;
Confirm that the catalog displays refreshes and sacramento appears in the DATABASE list in the Catalog dashboard on the left side.
Step 2: Create a Table
Now that you have a database, you’re ready to create a table that’s based on the sample csv data file. You define columns that map to the data, specify how the data is delimited, & provide the location in Amazon S3 for the file.
To create a table
- Make sure that sacramento is selected for DATABASEand then choose New Query.
- In the query pane, enter the following CREATE TABLE statement to create table, and then choose Run Query:
You can query data in regions other than the region where you run Athena means it is not regions specific. Standard inter-region data transfer rates for Amazon Simple Storage Service S3 apply in addition to standard Athena charges. To reduce data transfer charges, replace myregion in s3://athena-examples-myregion/path/to/data/ with the region identifier where you run Athena, e.g. s3://athena-examples-us-east-1/path/to/data/.
The table crime_data is created and appears in the Catalog dashboard for your database.
Step 3: Query Data
Now that you have the crime_data table created in Athena based on the data in Amazon S3, you can run queries on the table and see the results in Athena.
To run a query
- Choose New Query, enter the following statement anywhere in the query pane, and then choose Run Query:
The next step, creating the table, is more interesting: not only does Athena create the table, but it also learns where and how to read the data from my S3 bucket. There’s plenty of work going on behind the hood on this one. I’ve decided to call the table crime_data.
By running above query table crime_data will create under sacramento database.
The EXTERNAL keyword on line one dictates that our data is originating from a source outwith the database, specified by the LOCATION parameter on line 15. This points to the crime-data folder from my S3 bucket. To assist Athena with its data parsing, we indicate how lines are separated, and how values are separated within a line, using the LINES DELIMITED BY and FIELDS DELIMITED BY phrases, respectively. Furthermore, we exclude the first line of our CSV file => consisting of header values =>using TBLPROPERTIES on lines 16–18.
Each attribute from lines 2–10 corresponds to one column in the CSV file, with the first, ts, being timestamps of form “MM/DD/YY HH:MI”. In cases where there are leading zeroes in MM, DD, or HH, these are omitted. In other words, 1am on January 21st would look like 1/21/06 1:00, whilst 10pm on January 21st would like like 1/21/06 22:00. Given that all data for this file is from January, all the MMs are “1”!
With both database sacramento and table crime_data created, it’s time to analyse the latter to our heart’s content with some SQL SELECT statements. For the sake of demonstrating how this works, let’s write a query to fetch all crimes reported between January 1st and January 2nd in 2006.
As ever with programming, timestamps and dates prove interesting to work with. Critical to dealing with this data is knowing which query engine Athena uses under the hood.
The answer? Presto, an open-source distributed query engine. Following Presto’s documentation on dates and times, we find ourselves parsing the provided timestamps with format specifiers in the form of %c/%e/%y %k: %i
- To pull the data which are available in s3 location you have to specify path
Choose the file type loaded in s3 and want to process in Athena
The next step is to run the select query once the data pulled by Athena
Go to the query editor of Amazon Athena
After heating Run Query result are following:
You can save the query to run again
You can also view the results of previous queries or queries that may take some time to complete. Choose History then either search for your query or choose View/Download to view or download the results of previous completed queries. This also displays the status of queries that are currently running. Query history is retained for 45 days for view the results.
Running this query produced the expected results within a couple of seconds:
You can save the output result in the other folder in s3
Accessing Amazon Athena
You can access Amazon Athena using the AWS Management Console, the Amazon Athena API and the AWS CLI.
Using the Console
You can use the AWS Management Console for Amazon Athena to do the following:
To create or select a database.
Create, view, and delete tables.
To Filter tables by starting to type their names.
To Preview tables and generate CREATE TABLE DDL for them.
Show table properties.
To Run queries on tables, save and format queries, and view query history.
To Create up to ten queries using different query tabs in the query editor. To open a new query tab, click the plus sign.
To Display query results, save, and export them.
Access the AWS Glue Data Catalog.
View and change settings, such as view the query result location, configure auto-complete, and encrypt query results.
In the right pane, the Query Editor displays an introductory screen that prompts you to create your first table. You can view your tables under Tables in the left pane.
Here’s a high-level overview of the actions available for each table:
To Preview tables– View the query syntax in the Query Editor on the right.
Show properties– Show a table’s name, its location in Amazon S3, input and output formats, the serialization (SerDe) library used, and whether the table has encrypted data.
Delete table– Delete a table.
Generate CREATE TABLE DDL– To Generate the query behind a table and view it in the query editor.
Using the API
Amazon Athena enables application programming for Athena. For more information, see Amazon Athena API Reference. The latest AWS SDKs include support for the Athena API.
For examples of using the AWS SDK for Java with Athena.
Using the CLI
You can access Amazon Athena using the AWS CLI.
#Last but not least, always ask for help!