$HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in either interactive or batch mode. HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline, which is a JDBC client based on SQLLine.
Hive Command Line Options
To get help, run "hive -H" or "hive --help". Usage (as of Hive 0.9.0):
usage: hive
Option | Explanation |
-d,--define <key=value> | Variable substitution to apply to hive commands. e.g. -d A=B or --define A=B |
-e <quoted-query-string> | SQL from command line |
-f <filename> | SQL from files |
-H,--help | Print help information |
-h <hostname> | Connecting to Hive Server on remote host |
--hiveconf <property=value> | Use value for given property |
--hivevar <key=value> | Variable substitution to apply to hive commands. e.g. --hivevar A=B |
-i <filename> | Initialization SQL file |
-p <port> | Connecting to Hive Server on port number |
-S,--silent | Silent mode in interactive shell |
-v,--verbose | Verbose mode (echo executed SQL to the console) |
Examples
- Example of running a query from the command line
$HIVE_HOME/bin/hive -e 'select a.foo from pokes a'
- Example of setting Hive configuration variables
$HIVE_HOME/bin/hive -e 'select a.foo from pokes a' --hiveconf hive.exec.scratchdir=/opt/my/hive_scratch --hiveconf mapred.reduce.tasks=1
- Example of dumping data out from a query into a file using silent mode
$HIVE_HOME/bin/hive -S -e 'select a.foo from pokes a' > a.txt
- Example of running a script non-interactively from local disk
$HIVE_HOME/bin/hive -f /home/my/hive-script.sql
- Example of running a script non-interactively from a Hadoop supported filesystem (starting in Hive 0.14)
$HIVE_HOME/bin/hive -f hdfs://<namenode>:<port>/hive-script.sql
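The -d/--define and --hivevar options can be combined with -e and -S. The following is a minimal sketch, assuming a table named pokes as in the examples above; the variable name tbl, the output file, and the DRY_RUN switch are illustrative choices, not part of Hive. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch: pass a table name into a query with --hivevar.
# TABLE and OUT are hypothetical values chosen for this example.
TABLE="pokes"
OUT="a.txt"

# Single quotes stop the shell from expanding ${hivevar:tbl};
# Hive performs the substitution itself before running the query.
QUERY='select a.foo from ${hivevar:tbl} a'

# export_table prints the command by default; set DRY_RUN=0 to execute it.
export_table() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "hive -S --hivevar tbl=$TABLE -e '$QUERY' > $OUT"
  else
    "$HIVE_HOME/bin/hive" -S --hivevar tbl="$TABLE" -e "$QUERY" > "$OUT"
  fi
}

export_table
```

Writing the query in double quotes instead would make the shell try to expand ${hivevar:tbl} before Hive ever sees it, which is why the sketch keeps it in single quotes.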
Hive CLI is a legacy tool which had two main use cases. The first is that it served as a thick client for SQL on Hadoop and the second is that it served as a command line tool for Hive Server (the original Hive server, now often referred to as "HiveServer1"). Hive Server has been deprecated and removed from the Hive code base as of Hive 1.0.0 and replaced with HiveServer2, so the second use case no longer applies. For the first use case, Beeline is intended to provide equivalent functionality, although it is implemented differently from Hive CLI.
Ideally, Hive CLI should be deprecated as the Hive community has long recommended using the Beeline plus HiveServer2 configuration; however, because of the wide use of Hive CLI, we instead are replacing Hive CLI’s implementation with a new Hive CLI on top of Beeline plus embedded HiveServer2 so that the Hive community only needs to maintain a single code path. In this way, the new Hive CLI is just an alias to Beeline at both the shell script level and the high code level. The goal is that no or minimal changes are required from existing user scripts using Hive CLI.
The hiverc File
When invoked without the -i option, the CLI attempts to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.
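As an illustration of what an initialization file might contain, here is a minimal sketch that writes two common settings to a scratch file. The file location (HIVERC) is an assumption made so that the example does not overwrite a real $HOME/.hiverc; the two properties are standard Hive CLI settings.

```shell
#!/bin/sh
# Sketch: write a small initialization file for the Hive CLI.
# HIVERC defaults to a scratch path (hypothetical) rather than $HOME/.hiverc.
HIVERC="${HIVERC:-${TMPDIR:-/tmp}/hiverc.example}"

cat > "$HIVERC" <<'EOF'
-- Print column names above query results.
set hive.cli.print.header=true;
-- Show the current database in the prompt.
set hive.cli.print.current.db=true;
EOF

echo "wrote $HIVERC"
```

The file can be tried out explicitly with hive -i "$HIVERC" and, once it behaves as expected, copied to $HOME/.hiverc so that it is picked up automatically.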
Hive Batch Mode Commands
When $HIVE_HOME/bin/hive is run with the -e or -f option, it executes SQL commands in batch mode.
hive -e '<query-string>' executes the query string.
hive -f <filepath> executes one or more SQL queries from a file.
Hive Interactive Shell Commands
When $HIVE_HOME/bin/hive is run without either the -e or -f option, it enters interactive shell mode. Use ";" (semicolon) to terminate commands. Comments in scripts can be specified using the "--" prefix.
Command | Description |
quit exit | Use quit or exit to leave the interactive shell. |
reset | Resets the configuration to the default values (as of Hive 0.10). |
set <key>=<value> | Sets the value of a particular configuration variable (key). If you misspell the variable name, the CLI will not show an error. |
set | Prints a list of configuration variables that are overridden by the user or Hive. |
set -v | Prints all Hadoop and Hive configuration variables. |
add FILE[S] <filepath> <filepath>* add JAR[S] <filepath> <filepath>* add ARCHIVE[S] <filepath> <filepath>* | Adds one or more files, jars, or archives to the list of resources in the distributed cache. |
add FILE[S] <ivyurl> <ivyurl>* add JAR[S] <ivyurl> <ivyurl>* add ARCHIVE[S] <ivyurl> <ivyurl>* | As of Hive 1.2.0, adds one or more files, jars or archives to the list of resources in the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. |
list FILE[S] list JAR[S] list ARCHIVE[S] | Lists the resources already added to the distributed cache. |
list FILE[S] <filepath>* list JAR[S] <filepath>* list ARCHIVE[S] <filepath>* | Checks whether the given resources are already added to the distributed cache or not. |
delete FILE[S] <filepath>* delete JAR[S] <filepath>* delete ARCHIVE[S] <filepath>* | Removes the resource(s) from the distributed cache. |
delete FILE[S] <ivyurl> <ivyurl>* delete JAR[S] <ivyurl> <ivyurl>* delete ARCHIVE[S] <ivyurl> <ivyurl>* | As of Hive 1.2.0, removes the resource(s) which were added using the <ivyurl> from the distributed cache. |
! <command> | Executes a shell command from the Hive shell. |
dfs <dfs command> | Executes a dfs command from the Hive shell. |
<query string> | Executes a Hive query and prints results to standard output. |
source <filepath> | Executes a script file inside the CLI. |
Example
hive> set mapred.reduce.tasks=32;
hive> set;
hive> select a.* from tab1;
hive> !ls;
hive> dfs -ls;
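The resource commands from the table can be collected in a script file and replayed, either with hive -f or from inside the shell with source. The following is a rough sketch under stated assumptions: both paths are hypothetical, and the script only writes and displays the file rather than starting Hive.

```shell
#!/bin/sh
# Sketch: collect interactive-shell commands in a file that can be passed
# to `hive -f` or loaded with `source`. Both paths are hypothetical.
SESSION="${TMPDIR:-/tmp}/session.hql"
RESOURCE="/tmp/my_script.py"

# The heredoc is unquoted on purpose so that $RESOURCE is expanded by the shell.
cat > "$SESSION" <<EOF
-- Ship a local file to the distributed cache, then confirm it is listed.
add FILE $RESOURCE;
list FILES;
-- Remove it again once it is no longer needed.
delete FILE $RESOURCE;
EOF

cat "$SESSION"
```

From the interactive shell, the same file could then be run with a command such as source /tmp/session.hql; (adjusting the path to wherever the file was written).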
Beeline – New Command Line Shell
HiveServer2 comes with a new command shell, Beeline. It is a JDBC client based on the SQLLine CLI. The Beeline shell works in both embedded mode and remote mode. In embedded mode, it runs an embedded Hive (similar to Hive CLI), whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. Starting in Hive 0.14, when Beeline is used with HiveServer2, it also prints the log messages from HiveServer2 for queries it executes to STDERR.
Beeline Command Options
Option | Description |
-u <database URL> | The JDBC URL to connect to. Usage: beeline -u db_URL |
-n <username> | The username to connect as. Usage: beeline -n valid_user |
-p <password> | The password to connect as. Usage: beeline -p valid_password |
-d <driver class> | The driver class to use. Usage: beeline -d driver_class |
-e <query> | Query that should be executed. Double or single quotes enclose the query string. This option can be specified multiple times. Usage: beeline -e "query_string" |
-f <file> | Script file that should be executed. Usage: beeline -f filepath |
--hiveconf property=value | Use value for the given configuration property. Properties that are listed in hive.conf.restricted.list cannot be reset with hiveconf. Usage: beeline --hiveconf prop1=value1 |
--hivevar name=value | Hive variable name and value. This is a Hive-specific setting in which variables can be set at the session level and referenced in Hive commands or queries. Usage: beeline --hivevar var1=value1 |
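To show how these options fit together for a remote connection, here is a minimal sketch. The host, port, database, and user name are placeholder assumptions (10000 is the usual HiveServer2 default port), and the DRY_RUN switch is an illustrative device, so by default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: assemble a Beeline command line from the options above.
# Host, port, database and user are hypothetical placeholders.
HS2_HOST="${HS2_HOST:-localhost}"
HS2_PORT="${HS2_PORT:-10000}"
HS2_DB="${HS2_DB:-default}"
HS2_USER="${HS2_USER:-hiveuser}"

# jdbc_url prints the HiveServer2 JDBC URL used for remote mode.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$HS2_HOST" "$HS2_PORT" "$HS2_DB"
}

# run_query prints the command by default; set DRY_RUN=0 to execute it.
run_query() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "beeline -u $(jdbc_url) -n $HS2_USER -e '$1'"
  else
    beeline -u "$(jdbc_url)" -n "$HS2_USER" -e "$1"
  fi
}

run_query 'select a.foo from pokes a'
```

The sketch leaves out -p deliberately, since a password given on the command line may be visible to other users of the machine; whether that matters depends on the environment.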