Sqoop Official Documentation Study Notes 02: Sqoop Tools

6. Sqoop Tools

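Here is a rough summary of what each part covers, as an index for my own later review.
6.1. Using Command Aliases: how to use the command-line aliases
6.2. Controlling the Hadoop Installation: selecting which Hadoop installation Sqoop uses
6.3. Using Generic and Specific Arguments: usage examples for generic and tool-specific arguments
6.4. Using Options Files to Pass Arguments: passing arguments through a file that contains them
6.5. Using Tools: to be continued in the next post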

Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.

Sqoop is a collection of tools, and you need to learn the arguments of each one.

If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program. Users of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) will see this program installed as /usr/bin/sqoop. The remainder of this documentation will refer to this program as sqoop. For example:

$ sqoop tool-name [tool-arguments]

Note
The following examples that begin with a $ character indicate that the commands must be entered at a terminal prompt (such as bash). The $ character represents the prompt itself; you should not start these commands by typing a $. You can also enter commands inline in the text of a paragraph; for example, sqoop help. These examples do not show a $ prefix, but you should enter them the same way. Don’t confuse the $ shell prompt in the examples with the $ that precedes an environment variable name. For example, the string literal $HADOOP_HOME includes a “$”.

Don't confuse the terminal's $ prompt with the $ that precedes a variable name.

Sqoop ships with a help tool. To display a list of all available tools, type the following command:

$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
                     (note: this creates the table in Hive)
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import mainframe datasets to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

You can display help for a specific tool by entering: sqoop help (tool-name); for example, sqoop help import.

You can also add the --help argument to any command: sqoop import --help.

6.1. Using Command Aliases

In addition to typing the sqoop (toolname) syntax, you can use alias scripts that specify the sqoop-(toolname) syntax. For example, the scripts sqoop-import, sqoop-export, etc. each select a specific tool.
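The naming convention can be illustrated with a minimal sketch. The function `tool_from_alias` is a hypothetical helper, not part of Sqoop; it only shows how an alias script name such as sqoop-import corresponds to the tool name import.

```shell
# Hypothetical helper (not part of Sqoop): derive the tool name that an
# alias script selects by stripping the directory and the "sqoop-" prefix.
tool_from_alias() {
  name=${1##*/}                    # drop any leading directory
  printf '%s\n' "${name#sqoop-}"   # drop the "sqoop-" prefix
}

# The two invocation styles are described as equivalent:
#   sqoop import --arguments...
#   sqoop-import --arguments...
tool_from_alias /usr/bin/sqoop-import   # prints: import
tool_from_alias sqoop-export            # prints: export
```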

6.2. Controlling the Hadoop Installation

You invoke Sqoop through the program launch capability provided by Hadoop. The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_COMMON_HOME and $HADOOP_MAPRED_HOME environment variables.

For example:

$ HADOOP_COMMON_HOME=/path/to/some/hadoop \
  HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \
  sqoop import --arguments...

or:

$ export HADOOP_COMMON_HOME=/some/path/to/hadoop
$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce
$ sqoop import --arguments...

If either of these variables is not set, Sqoop will fall back to $HADOOP_HOME. If that is not set either, Sqoop will use the default installation locations for Apache Bigtop, /usr/lib/hadoop and /usr/lib/hadoop-mapreduce, respectively.

The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
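The fallback order for the two home directories can be sketched as follows. The function names are hypothetical and this is not Sqoop's actual startup script; it only mirrors the lookup order described in the text.

```shell
# Hypothetical helpers (not Sqoop's real code) that mirror the documented
# lookup order: explicit variable, then $HADOOP_HOME, then the Bigtop default.
resolve_hadoop_common_home() {
  if [ -n "${HADOOP_COMMON_HOME:-}" ]; then
    printf '%s\n' "$HADOOP_COMMON_HOME"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    printf '%s\n' "$HADOOP_HOME"
  else
    printf '%s\n' /usr/lib/hadoop
  fi
}

resolve_hadoop_mapred_home() {
  if [ -n "${HADOOP_MAPRED_HOME:-}" ]; then
    printf '%s\n' "$HADOOP_MAPRED_HOME"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    printf '%s\n' "$HADOOP_HOME"
  else
    printf '%s\n' /usr/lib/hadoop-mapreduce
  fi
}

# Show what would be picked up in the current environment.
echo "common home: $(resolve_hadoop_common_home)"
echo "mapred home: $(resolve_hadoop_mapred_home)"
```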

6.3. Using Generic and Specific Arguments

To control the operation of each Sqoop tool, you use generic and specific arguments.
To get a better command of Sqoop, you need to know more of its arguments, both generic and tool-specific.

For example, these are the arguments for import:

$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>             Specify JDBC connect string
   --connect-manager <class-name>   Specify connection manager class to use
   --driver <class-name>            Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>       Override $HADOOP_MAPRED_HOME
   --help                           Print usage instructions
   --password-file                  Set path for file containing authentication password
   -P                               Read password from console
   --password <password>            Set authentication password
   --username <username>            Set authentication username
   --verbose                        Print more information while working
   --hadoop-home <dir>              Deprecated. Override $HADOOP_HOME

[...]

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
                               (note: the namenode's host and port)
-jt <local|jobtracker:port>    specify a job tracker
                               (note: the job tracker's host and port)
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

You must supply the generic arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect). Note that generic Hadoop arguments are preceded by a single dash character (-), whereas tool-specific arguments start with two dashes (--), unless they are single character arguments such as -P.

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> sets the name of the MR job that Sqoop launches. If it is not specified, the name defaults to the jar name for the job, which is derived from the name of the table being used.

The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop’s internal argument-parsing system.
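The ordering rule (tool name, then generic Hadoop arguments, then tool-specific arguments) can be sketched as below. The function `build_import_command` is hypothetical, and the job name, JDBC URL and table are placeholders. The command is only printed, not executed, so the sketch runs without a Sqoop installation.

```shell
# Hypothetical helper: assemble an import command in the required order
# and print it instead of running it.
build_import_command() {
  job_name=$1
  table=$2
  # 1. tool name, 2. generic Hadoop args (single dash), 3. tool args (double dash)
  set -- sqoop import \
    -D "mapred.job.name=${job_name}" \
    --connect jdbc:mysql://localhost/db \
    --table "$table"
  printf '%s\n' "$*"
}

build_import_command nightly_test_import TEST
```

Placing -D after --connect would violate the rule that generic arguments come before any tool-specific ones.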

6.4. Using Options Files to Pass Arguments

When using Sqoop, the command line options that do not change from invocation to invocation can be put in an options file for convenience. An options file is a text file where each line identifies an option in the order that it appears otherwise on the command line. Option files allow specifying a single option on multiple lines by using the back-slash character at the end of intermediate lines. Also supported are comments within option files that begin with the hash character. Comments must be specified on a new line and may not be mixed with option text. All comments and empty lines are ignored when option files are expanded. Unless options appear as quoted strings, any leading or trailing spaces are ignored. Quoted strings if used must not extend beyond the line on which they are specified.

For convenience, you can put arguments that rarely change into an options file and reuse it across invocations.

Option files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering. For instance, regardless of where the options are loaded from, they must follow the ordering such that generic options appear first, tool specific options next, finally followed by options that are intended to be passed to child programs.

To specify an options file, simply create an options file in a convenient location and pass it to the command line via --options-file argument.

Whenever an options file is specified, it is expanded on the command line before the tool is invoked. You can specify more than one options file within the same invocation if needed.

For example, the following Sqoop invocation for import can be specified alternatively as shown below:

$ sqoop import --connect jdbc:mysql://localhost/db --username zhangsan --table TEST

$ sqoop --options-file /users/homer/work/import.txt --table TEST

where the options file /users/homer/work/import.txt contains the following:

import
--connect
jdbc:mysql://localhost/db
--username
zhangsan

The options file can have empty lines and comments for readability purposes. So the above example would work exactly the same if the options file /users/homer/work/import.txt contained the following:
For readability, such files usually contain plenty of blank lines and comments.

#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
zhangsan

#
# Remaining options should be specified in the command line.
#
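To see why the commented file is equivalent to the bare one, here is a simplified approximation of the expansion step. The function `expand_options_file` is hypothetical and is not Sqoop's real parser: it only drops comment lines and blank lines and trims surrounding whitespace, and it ignores backslash continuations and quoted strings.

```shell
# Hypothetical, simplified stand-in for options-file expansion:
# trim whitespace, then drop comment lines and empty lines.
expand_options_file() {
  sed -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^#/d' \
      -e '/^$/d' "$1"
}

# Write a commented options file to a temporary location.
options_file=$(mktemp)
cat > "$options_file" <<'EOF'
#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
zhangsan
EOF

# Prints the five remaining lines, one argument per line.
expand_options_file "$options_file"
rm -f "$options_file"
```

What remains after this step is the same sequence of arguments as in the file without comments, which is why both versions behave identically.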

6.5. Using Tools

The following sections will describe each tool’s operation. The tools are listed in the most likely order you will find them useful.

Follow-up posts will cover each tool's operation, with notes on the key points and on unfamiliar English terms.