Common Sqoop commands for importing and exporting data

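1. List all databases on a MySQL server
2. Connect to MySQL and list the tables in a database
3. Copy a relational table's structure into Hive
4. Import data from a relational database into a Hive table
5. Export data from a Hive table into a MySQL table
6. Import data from a relational database into a Hive table using --query
7. Import data from a relational database into a Hive table using --columns and --where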
8. sqoop help import : common arguments

List all databases on a MySQL server

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username dyh --password 000000

Connect to MySQL and list the tables in a database

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username dyh --password 000000

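Copy a relational table's structure into Hive

sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table users --username dyh --password 000000 --hive-table users --fields-terminated-by "\0001" --lines-terminated-by "\n"

Parameter notes:

--fields-terminated-by "\0001" sets the delimiter between columns. "\0001" is the octal escape for the character with ASCII code 1 (Ctrl-A), which is also Hive's default field delimiter; Sqoop's default field delimiter is ",".

--lines-terminated-by "\n" sets the delimiter between rows, here a newline, which is also the default.

Note: only the table structure is copied; the table's contents are not.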
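To make the "\0001" delimiter described above less abstract, here is a minimal sketch that needs neither Sqoop nor Hive. It builds one row with made-up values, separated by the byte 0x01, and then makes the invisible delimiter visible. Note that the shell's printf and tr write the same byte as \001; the row contents are assumptions for illustration only.

```shell
# Build one sample row the way a Ctrl-A delimited text file would store it:
# three fields (hypothetical values) separated by the byte 0x01.
row=$(printf '1\001alice\00130')

# Dump the delimiter byte in octal; od reports it as 001.
printf '\001' | od -An -to1

# Map the invisible delimiter to "|" so the field boundaries can be seen.
visible=$(printf '%s\n' "$row" | tr '\001' '|')
echo "$visible"
```

Because Ctrl-A almost never occurs in real text, it is a safer field delimiter than a comma for free-form string columns.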

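Import data from a relational database into a Hive table

sqoop import --connect jdbc:mysql://localhost:3306/test --username dyh --password 000000 --table users --hive-import --hive-table users -m 2 --fields-terminated-by "\0001"

Parameter notes: -m 2 runs the import with two map tasks. --fields-terminated-by "\0001" must match the delimiter used when the Hive table was created.
Import format: by default, Sqoop saves imported data as comma-delimited text files. If field values may contain the delimiter, you can specify the delimiter, the field-enclosing character, and the escape character. Command-line arguments also let you choose the delimiters, file format, and compression, and give finer-grained control over the import process.
Import control: Sqoop can import only some of a table's columns. You can also add a WHERE clause to restrict which rows are imported. A user-supplied WHERE clause is applied before the job is split into tasks.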
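The point about delimiters appearing inside field values can be sketched as follows. The first part shows, with a made-up row, how a naive comma split miscounts the fields. The second part prints (it does not run) an import command that uses Sqoop's --optionally-enclosed-by and --escaped-by options. The run helper and the target directory /user/dyh/users_quoted are hypothetical names introduced for this example.

```shell
# A sample row whose second field contains a comma (hypothetical data).
line='3,"Smith, John",88'

# Splitting on every comma sees 4 fields instead of the intended 3.
naive_fields=$(printf '%s\n' "$line" | awk -F, '{ print NF }')
echo "naive comma split: $naive_fields fields"

# Dry-run wrapper (hypothetical helper): prints the command instead of
# executing it. Replace its body with "$@" to run the command for real.
run() { printf '+ %s\n' "$*"; }

# Ask Sqoop to enclose fields in double quotes when needed and to escape
# special characters with a backslash.
run sqoop import --connect jdbc:mysql://localhost:3306/test \
  --username dyh --password 000000 --table users -m 1 \
  --target-dir /user/dyh/users_quoted \
  --fields-terminated-by ',' --optionally-enclosed-by '"' --escaped-by '\\'
```

Whether quoting or a rare delimiter such as "\0001" is the better choice depends on what will read the files afterwards; Hive's default text format, for instance, does not interpret enclosing quotes.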

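Export data from a Hive table into a MySQL table

sqoop export --connect jdbc:mysql://192.168.20.118:3306/test --username dyh --password 000000 --table users --export-dir /user/hive/warehouse/users/part-m-00000 --input-fields-terminated-by '\0001'

Notes:
1. The MySQL table users must already exist before the export is run.
2. Replacing the IP address in jdbc:mysql://192.168.20.118:3306/test with localhost causes an exception, most likely because the connect string is also used by the map tasks on the cluster nodes, where localhost does not point to the MySQL host.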

Import data from a relational database into a Hive table using --query

sqoop import --append --connect jdbc:mysql://192.168.20.118:3306/test --username dyh --password 000000 --query "select id,age,name from userinfos where \$CONDITIONS" -m 1 --target-dir /user/hive/warehouse/userinfos2 --fields-terminated-by ","
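The backslash in front of $CONDITIONS in the command above is easy to get wrong, so here is a small sketch of the quoting rules. It compares three ways of writing the query string in the shell, then prints (without running) a parallel variant. The parallel variant assumes that id is a suitable split column, since Sqoop requires --split-by when a free-form query is imported with more than one map task. The run helper is a hypothetical dry-run wrapper.

```shell
# Double quotes: escape the dollar sign so the shell does not expand it.
q_double="select id,age,name from userinfos where \$CONDITIONS"

# Single quotes: the shell leaves $CONDITIONS untouched.
q_single='select id,age,name from userinfos where $CONDITIONS'

# Unescaped inside double quotes: the shell substitutes its own variable
# CONDITIONS (empty here), so Sqoop never sees the placeholder.
CONDITIONS=""
q_broken="select id,age,name from userinfos where $CONDITIONS"

printf '%s\n' "$q_double" "$q_single" "$q_broken"

# Dry-run wrapper (hypothetical helper): prints instead of executing.
run() { printf '+ %s\n' "$*"; }

# Parallel variant: with -m 2, --split-by tells Sqoop which column to
# partition on (id is assumed to be suitable for that).
run sqoop import --append --connect jdbc:mysql://192.168.20.118:3306/test \
  --username dyh --password 000000 --query "$q_single" --split-by id -m 2 \
  --target-dir /user/hive/warehouse/userinfos2 --fields-terminated-by ','
```

The first two strings are identical; the third ends after "where", which is the form that makes the import fail.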

Import data from a relational database into a Hive table using --columns and --where

sqoop import --append --connect jdbc:mysql://192.168.20.118:3306/test --username dyh --password 000000 --table userinfos --columns "id,age,name" --where "id > 3 and (age = 88 or age = 80)" -m 1 --target-dir /user/hive/warehouse/userinfos2 --fields-terminated-by ","

Note: --target-dir /user/hive/warehouse/userinfos2 can be replaced with --hive-import --hive-table userinfos2.

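Imports and consistency

When importing data into HDFS, it is important to make sure that you read a consistent snapshot of the source data. The map tasks that read from a database in parallel run in separate processes, so they cannot share a single database transaction. The best way to guarantee consistency is to make sure that no process updates existing rows in the table while the import is running.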

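Direct-mode imports

Some databases provide tools designed for fast data extraction. For example, MySQL's mysqldump can read from a table with higher throughput than a JDBC connection. Sqoop's documentation refers to the use of such external tools as "direct mode".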
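As a rough sketch of what a direct-mode import could look like, the snippet below prints (without running) an import of the users table with Sqoop's --direct option. It assumes that mysqldump is installed on the nodes that run the map tasks; the run helper and the target directory /user/dyh/users_direct are hypothetical names for this example.

```shell
# Dry-run wrapper (hypothetical helper): prints the command instead of
# executing it. Replace its body with "$@" to run the command for real.
run() { printf '+ %s\n' "$*"; }

# --direct asks Sqoop to use the database's native dump tool
# (mysqldump for MySQL) instead of reading rows over JDBC.
run sqoop import --connect jdbc:mysql://localhost:3306/test \
  --username dyh --password 000000 --table users --direct -m 1 \
  --target-dir /user/dyh/users_direct
```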

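Imported data and Hive

If you want to load data from a database straight into Hive, the three steps (import the data into HDFS, create the Hive table, load the HDFS data into Hive) can be shortened to one. During an import, Sqoop can generate the Hive table definition and then load the data directly into the Hive table. If the import has not been run yet, the following command creates the widgets table in Hive directly from the data in MySQL.

sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets -m 1 --hive-import

There is no such convenient one-step operation for exports: although Sqoop can infer which Java type is suitable for holding a SQL data type, the mapping does not work in the other direction.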
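For comparison, the three separate steps that --hive-import replaces might look roughly like the sketch below. The commands are only printed, not executed. It assumes the default comma-delimited output and that the import lands in a widgets directory under the user's HDFS home directory; the run helper is a hypothetical dry-run wrapper.

```shell
# Dry-run wrapper (hypothetical helper): prints each command instead of
# executing it. Replace its body with "$@" to run the commands for real.
run() { printf '+ %s\n' "$*"; }

# Step 1: import the table into HDFS as delimited text.
run sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets -m 1

# Step 2: create a Hive table whose definition matches the source table.
run sqoop create-hive-table --connect jdbc:mysql://localhost/hadoopguide \
  --table widgets --fields-terminated-by ','

# Step 3: move the imported files into the Hive table.
run hive -e 'LOAD DATA INPATH "widgets" INTO TABLE widgets;'
```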

sqoop help import : common arguments

Argument	Description
--connect <jdbc-uri>	Specify JDBC connect string
--connection-manager <class-name>	Specify connection manager class to use
--driver <class-name>	Manually specify JDBC driver class to use
--hadoop-home <dir>	Override $HADOOP_HOME
--help	Print usage instructions
-P	Read password from console
--password <password>	Set authentication password
--username <username>	Set authentication username
--verbose	Print more information while working
--connection-param-file <filename>	Optional properties file that provides connection parameters
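As a small illustration of the arguments above, the sketch below prints (without running) the list-databases command with -P in place of --password, so that the password is typed at a prompt rather than stored in the shell history. The run helper is a hypothetical dry-run wrapper.

```shell
# Dry-run wrapper (hypothetical helper): prints the command instead of
# executing it. Replace its body with "$@" to run the command for real.
run() { printf '+ %s\n' "$*"; }

# -P makes Sqoop read the password from the console;
# --verbose prints more information while the tool works.
run sqoop list-databases --connect jdbc:mysql://localhost:3306/ \
  --username dyh -P --verbose
```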