This topic contains 0 replies, has 1 voice, and was last updated by  hena 8 months, 2 weeks ago.

Viewing 1 post (of 1 total)

    hena

    1)hive.optimize.bucketmapjoin.sortedmerge
    Answer) If the tables are bucketed on a particular column and are joined on that column, we can enable the bucketed map join to improve performance. Sort-merge-bucket (SMB) joins are used when the tables are both sorted and bucketed on the join key. The join then boils down to merging the already sorted buckets, which makes it faster than an ordinary map join.

    When the data is bucketed by the join keys, you can use the Bucket Map Join. For that, the number of buckets in one table must be a multiple of the number of buckets in the other table. It is activated by executing set hive.optimize.bucketmapjoin=true; before the query. If the tables don't meet the conditions, Hive simply performs the normal inner join.
    If both tables have the same number of buckets and the data is sorted by the bucket keys, Hive can perform the faster Sort-Merge Join. To activate it, execute the following commands:

    set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
    set hive.optimize.bucketmapjoin=true;
    set hive.optimize.bucketmapjoin.sortedmerge=true;
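
    As an illustration of the conditions above, here is a minimal sketch of two tables prepared for a Sort-Merge Join. The table and column names (customers_b, orders_b and their columns) and the bucket count of 32 are hypothetical. The hive.enforce.* settings are an assumption that applies to older Hive releases such as the ones shipped with CDH4, and the MAPJOIN hint is what those releases expect for bucketed map joins.

    ```sql
    -- Hypothetical tables: same bucket count, bucketed and sorted on the join key.
    CREATE TABLE customers_b (id INT, name STRING)
    CLUSTERED BY (id) SORTED BY (id) INTO 32 BUCKETS;

    CREATE TABLE orders_b (order_id INT, customer_id INT, amount DOUBLE)
    CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS;

    -- Assumption: older Hive needs these so that INSERTs honor the bucket/sort layout.
    SET hive.enforce.bucketing=true;
    SET hive.enforce.sorting=true;

    -- The three settings from the post, issued before the join query.
    SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
    SET hive.optimize.bucketmapjoin=true;
    SET hive.optimize.bucketmapjoin.sortedmerge=true;

    -- Join on the bucketing columns; the hint names the table to treat as the map-side input.
    SELECT /*+ MAPJOIN(c) */ o.order_id, c.name
    FROM orders_b o
    JOIN customers_b c ON o.customer_id = c.id;
    ```

    Running EXPLAIN on the SELECT is one way to check which join Hive actually planned.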

    2)hive.index.compact.query.max.entries
    Answer) The maximum number of index entries to read during a query that uses the compact index. A negative value is equivalent to infinity.

    3)hive.join.emit.interval
    Answer) How many rows in the right-most join operand Hive should buffer before emitting the join result.

    4)hive.test.mode

    5)hive.zookeeper.quorum (value: localhost.localdomain)
    List of ZooKeeper servers to talk to. Used in the connection string by JDBC/ODBC clients instead of the URI of a specific HiveServer2 instance.

    6)hive.input.format (value: org.apache.hadoop.hive.ql.io.CombineHiveInputFormat)
    Input formats play a very important role in Hive performance. The primary storage format choices are Text, SequenceFile, RCFile and ORC. The default value of hive.input.format is CombineHiveInputFormat.
    InputFormat in Hive
    There are two places where we can specify the InputFormat in Hive: when creating a table, and before executing an HQL query.

    For the first case, we can specify the InputFormat and OutputFormat when creating the Hive table:
    CREATE TABLE example_tbl
    (
    id int,
    name string
    )
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

    For the second case, we can set hive.input.format before invoking an HQL query:
    hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    hive> select * from example_tbl where id > 10000;
    If we set this parameter in hive-site.xml, it becomes the default Hive InputFormat, used whenever hive.input.format is not set explicitly before the query.
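
    To check which value is in effect in the current session, the set command can be issued with the property name only. A short sketch, reusing example_tbl from above; the choice of HiveInputFormat as the override is only an illustration:

    ```sql
    -- Print the current value (from hive-site.xml or an earlier set command).
    set hive.input.format;

    -- Override it for this session only, then run the query.
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    select * from example_tbl where id > 10000;
    ```

    A value set this way lasts only for the session, so the hive-site.xml default applies again in a new session.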

    7)hive.metastore.server.max.threads
    Maximum number of worker threads in the Hive Metastore Server's thread pool.
    8)hive.default.fileformat
    The hive.default.fileformat configuration parameter determines the format to use if it is not specified in a CREATE TABLE or ALTER TABLE statement. Text file is the parameter’s default value.
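
    A minimal sketch of how this default plays out, assuming a session where the property can be overridden. The table names default_fmt_demo and explicit_fmt_demo are hypothetical:

    ```sql
    -- Change the session default; TextFile, SequenceFile and RCFile are values accepted by older releases.
    SET hive.default.fileformat=SequenceFile;

    -- No STORED AS clause, so the table picks up the session default.
    CREATE TABLE default_fmt_demo (id INT, name STRING);

    -- The InputFormat/OutputFormat lines of the output show which format was applied.
    DESCRIBE FORMATTED default_fmt_demo;

    -- An explicit STORED AS clause takes precedence over the default.
    CREATE TABLE explicit_fmt_demo (id INT, name STRING) STORED AS TEXTFILE;
    ```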
    9)hive.metastore.server.min.threads

    Minimum number of worker threads in the Thrift server’s pool.

    10)hive.security.metastore.authenticator.manager
    The authenticator manager class name to be used in the metastore for authentication. The user-defined authenticator class should implement interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.

Reply To: CDH4 hive properties (40-50)