This topic contains 6 replies, has 6 voices, and was last updated by  suchitra mohanty 2 years ago.

Viewing 7 posts - 1 through 7 (of 7 total)
  • Author
    Posts
  • #1873 Reply

    somu s
    Member

    1.What is Hive?
    2.What is Hive Metastore?
    3.What kind of datawarehouse application is suitable for Hive?
    4.How can the columns of a table in hive be written to a file?
    5.CONCAT function in Hive with Example?
    6.REPEAT function in Hive with example?
    7.TRIM function in Hive with example?
    8.REVERSE function in Hive with example?
    9.LOWER or LCASE function in Hive with example?
    10.UPPER or UCASE function in Hive with example?
    11.Double type in Hive – Important points?
    12.Rename a table in Hive – How to do it?
    13.How to change a column data type in Hive?
    14.Difference between order by and sort by in hive?
    15.RLIKE in Hive?
    16.Difference between external table and internal table in HIVE ?

    #1940 Reply

    sweety jain
    Participant

    please tell me the diffrence between hive and pig?if we have pig then why are we using hive?

    #1941 Reply

    sweety jain
    Participant

    ans1)hive is a product of facebook it is used to deal with the structured data.hive is basically a dataware house on which we can run query language ie hql(hive query language.

    ans2)metastore stores the metadata
    The metadata which metastore stores contains things like :
    IDs of Database
    IDs of Tables
    IDs of Index
    The time of creation of an Index
    The time of creation of a Table
    IDs of roles assigned to a particular user
    InputFormat used for a Table
    OutputFormat used for a Table etc.

    ans3)Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.

    ans5)CONCAT function will concat the input strings. You can specify any number of strings separated by comma.
    Example:
    CONCAT (‘Hive’,’-‘,’performs’,’-‘,’good’,’-‘,’in’,’-‘,’Hadoop’);
    Output:
    Hive-performs-good-in-Hadoop
    So, every time you delimit the strings by ‘-‘. If it is common for all the strings, then Hive provides another command CONCAT_WS. Here you have to specify the delimit operator first.
    CONCAT_WS (‘-‘,’Hive’,’performs’,’good’,’in’,’Hadoop’);
    Output: Hive-performs-good-in-Hadoop
    (refrence:haddop tutorial https://www.facebook.com/HadoopTutorial/posts/454851761311295)

    #1948 Reply

    faizan0607
    Participant

    Answers –
    1. It is a data Warehousing package built on top of Hadoop which is used for data analysis, targeted towards users comfortable with SQL. HIve was developed by Facebook. It is used to process structured data.

    2.The metastore is service and database that can be configured in different ways. The default Hive configuration is that Hive driver, metastore interface and the db (derby) all use the same JVM. Metastore has all the table schema and partition details.

    3. Data Warehouse application which supports web and JDBC clients is suitable for hive, and also the one which is similar to SQL languages.

    5. CONCAT( string str1, string str2… )

    The CONCAT function concatenates all the stings.
    Example: CONCAT(‘hadoop’,’-‘,’hive’) returns ‘hadoop-hive’

    6. REPEAT( string str, int n )

    The REPEAT function repeats the specified string n times.
    Example: REPEAT(‘hive’,2) returns ‘hivehive’

    7. TRIM( string str )

    The TRIM function removes both the trailing and leading spaces from the string.
    Example: LTRIM(‘ hive ‘) returns ‘hive’

    8. REVERSE( string str )

    The REVERSE function gives the reversed string
    Example: REVERSE(‘hive’) returns ‘evih’

    9. LOWER( string str ), LCASE( string str )

    The LOWER or LCASE function converts the string into lower case letters.
    Example: LOWER(‘HiVe’) returns ‘hive’

    10. UPPER( string str ), UCASE( string str )

    The UPPER or LCASE function converts the string into upper case letters.
    Example: UPPER(‘HiVe’) returns ‘HIVE’

    11. Double data type in hive – An 8-byte (double precision) floating-point data type used in CREATE TABLE and ALTER TABLE statements.
    Range: 4.94065645841246544e-324d .. 1.79769313486231570e+308, positive or negative, The data type REAL is an alias for DOUBLE

    12. Renaming a table in Hive –
    Syntax :
    ALTER TABLE name RENAME TO new_name
    For example if we have to rename a table from employee to emp,
    hive> ALTER TABLE employee RENAME TO emp;

    13. To change a cloumn data type in hive –
    Syntax –
    ALTER TABLE name CHANGE column_name new_name new_type

    14. Order by

    The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.
    SELECT [ALL | DISTINCT] select_expr, select_expr, …
    FROM table_reference
    [WHERE where_condition]
    [GROUP BY col_list]
    [HAVING having_condition]
    [ORDER BY col_list]]
    [LIMIT number]

    Sort by

    Hive uses the columns in SORT BY to sort the rows before feeding the rows to a reducer. The sort order will be dependent on the
    column types. If the column is of numeric type, then the sort order is also in numeric order. If the column is of string type,
    then the sort order will be lexicographical order

    SELECT key, value FROM src SORT BY key ASC, value DESC

    15. RLIKE (Right-Like) is a special function in Hive where if any substring of A matches with B then it evaluates to true.
    It also obeys Java regular expression pattern. Users don’t need to put % symbol for a simple match in RLIKE

    Reference – https://www.facebook.com/HadoopTutorial/posts/456214807841657

    16. The main difference is that when you drop an external table, the underlying data files stay intact. This is because
    the user is expected to manage the data files and directories. With a managed table, the underlying directories and
    data get wiped out when the table is dropped.

    Reference – http://docs.qubole.com/en/latest/faqs/hive/difference-external-table-managed-table.html

    #1962 Reply

    1> Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
    2> Metastore means specifying to Hive where the database is stored.

    3> Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.

    4> By using awk command in shell, the output from HiveQL (Describe) can be written to a file.
    hive -S -e “describe table_name;” | awk -F” ” ’{print 1}’ > ~/output.

    5> CONCAT function will concat the input strings. You can specify any number of strings separated by comma.
    CONCAT( string str1, string str2… )
    example
    CONCAT (‘Hive’,’-‘,’performs’,’-‘,’good’,’-‘,’in’,’-‘,’Hadoop’);

    Output:

    Hive-performs-good-in-Hadoop

    6>REPEAT( string str, int n )
    example
    REPEAT( ‘hadoop’,3 );
    output
    hadoophadoophadoop

    7> TRIM function will remove the spaces associated with a string.
    TRIM (String str);
    TRIM(‘ Hadoop ‘);
    output
    Hadoop

    8> REVERSE function will reverse the characters in a string.

    REVERSE (String str);

    Example
    REVERSE (‘Hadoop’);

    output
    poodaH

    9> LOWER or LCASE function will convert the input string to lower case characters.
    LOWER(String str);
    Example
    LOWER(‘Hadoop’);
    output
    hadoop

    10> UPPER or UCASE function will convert the input string to upper case characters.
    UPPER (String str);
    Example
    UPPER (‘Hadoop’);

    output
    HADOOP

    11> It is important to know about the double type in Hive. Double type in Hive will present the data differently unlike RDBMS.
    The maximum value for a IEEE 754 double is about 2.22E308.

    12> Using ALTER command, we can rename a table in Hive. ALTER TABLE hive_table_name RENAME TO new_name;

    13> ALTER TABLE table_name CHANGE column_name column_name new_datatype,Example: If you want to change the data type of ID column from integer to bigint in a table called employee. ALTER TABLE employee CHANGE id id BIGINT;

    14>SORT BY will sort the data within each reducer. You can use any number of reducers for SORT BY operation.
    ORDER BY will sort all of the data together, which has to pass through one reducer. Thus, ORDER BY in hive uses single reducer.

    15>

    #1973 Reply

    chinni
    Participant

    1a)hive:Apache Hive is an open-source data warehouse system for querying and analyzing large data sets stored in Hadoop files. Hadoop is a framework for handling large data-sets in a distributed computing environment.Hive has three main functions: data summarization, query and analysis.
    2a)The Hive meta store service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information via the meta-store service API.

    3a) Hive is not a full database. The design constraints and limitations of Hadoop and HDFS impose limits on what Hive can do.Hive is most suited for data warehouse applications, where1) Relatively static data is analyzed,2) Fast response times are not required, and 3) When the data is not changing rapidly.Hive doesn’t provide crucial features required for OLTP, Online Transaction Processing. It’s closer to being an OLAP tool, Online Analytic Processing.So, Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.

    4a)By using awk command in shell, the output from HiveQL (Describe) can be written to a file.
    hive -S -e “describe table_name;” | awk -F” ” ’{print 1}’ > ~/output.

    5a)1.CONCAT:CONCAT function will concat the input strings. You can specify any number of strings separated by comma.
    Example:CONCAT (‘Hive’,’-‘,’performs’,’-‘,’good’,’-‘,’in’,’-‘,’Hadoop’);
    Output:Hive-performs-good-in-Hadoop
    So, every time you delimit the strings by ‘-‘. If it is common for all the strings, then Hive provides another command CONCAT_WS. Here you have to specify the delimit operator first.
    CONCAT_WS (‘-‘,’Hive’,’performs’,’good’,’in’,’Hadoop’);
    Output: Hive-performs-good-in-Hadoop

    6a)REPEAT function will repeat the input string n times specified in the command.
    Example:REPEAT(‘Hadoop’,3);
    Output:HadoopHadoopHadoop.
    Note: You can add a space with the input string also.

    7a)TRIM function will remove the spaces associated with a string.
    Example:TRIM(‘ Hadoop ‘);
    Output:Hadoop.
    Note: If you want to remove only leading or trialing spaces then you can specify the below commands respectively.
    LTRIM(‘ Hadoop’);
    RTRIM(‘Hadoop ‘);

    8a)REVERSE function will reverse the characters in a string.
    Example:REVERSE(‘Hadoop’);
    Output:poodaH

    9a)LOWER or LCASE function will convert the input string to lower case characters.
    Example:
    LOWER(‘Hadoop’);
    LCASE(‘Hadoop’);
    Output:hadoop
    Note:If the characters are already in lower case then they will be preserved.

    10a)UPPER or UCASE function will convert the input string to upper case characters.
    Example:
    UPPER(‘Hadoop’);
    UCASE(‘Hadoop’);
    Output:HADOOP
    Note:If the characters are already in upper case then they will be preserved.

    11a)It is important to know about the double type in Hive. Double type in Hive will present the data differently unlike RDBMS.
    See the double type data below:
    24624.0
    32556.0
    3.99893E5
    4366.0
    E5 represents 10^5 here. So, the value 3.99893E5 represents 399893. All the calculations will be accurately performed using double type. The maximum value for a IEEE 754 double is about 2.22E308.

    12a)Using ALTER command, we can rename a table in Hive.
    ALTER TABLE hive_table_name RENAME TO new_name;
    There is another way to rename a table in Hive.
    or
    In that case, Import and export options can be utilized. Here you are saving the hive data into HDFS and importing back to new table like below.
    EXPORT TABLE tbl_name TO ‘HDFS_location’;
    IMPORT TABLE new_tbl_name FROM ‘HDFS_location’;

    13a)ALTER TABLE table_name CHANGE column_name column_name new_datatype;
    Example: If you want to change the data type of ID column from integer to bigint in a table called employee.
    ALTER TABLE employee CHANGE id id BIGINT;

    14a)SORT BY will sort the data within each reducer. You can use any number of reducers for SORT BY operation.
    ORDER BY will sort all of the data together, which has to pass through one reducer. Thus, ORDER BY in hive uses single reducer.
    ORDER BY guarantees total order in the output while SORT BY only guarantees ordering of the rows within a reducer. If there is more than one reducer, SORT BY may give partially ordered final results

    15a)RLIKE (Right-Like) is a special function in Hive where if any substring of A matches with B then it evaluates to true. It also obeys Java regular expression pattern. Users don’t need to put % symbol for a simple match in RLIKE.
    Examples:‘Express’ RLIKE ‘Exp’ –> True

    16a)The main difference is that when you drop an external table, the underlying data files stay intact. This is because the user is expected to manage the data files and directories. With a managed table, the underlying directories and data get wiped out when the table is dropped.

    #1977 Reply

    suchitra mohanty
    Participant

    1) Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for handling large datasets in a distributed computing environment.

    2) The metastore is the central repository of Hive metadata.
    The metastore is divided into two pieces: a service and the backing store for the data.The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients access to this information via the metastore service API.
    16)Internal table-
    a) Hive internal table, store meta data and actual data in internal table. If table lost, both meta data and schema both lost.
    b)When you drop an internal table, it drops the data, and it also drops the metadata.
    External table-
    a) In external table, just lost only metastore but not actual data. it means hive ignore the data, but not delete actual data so its best.
    b) When you drop an external table, it only drops the meta data. That means hive is ignorant of that data now. It does not touch the data itself.

Viewing 7 posts - 1 through 7 (of 7 total)
Reply To: Hive Questions
Your information:




cf22

Your Name (required)

Your Email (required)

Subject

Phone No

Your Message

Cart

  • No products in the cart.