Wednesday, November 13, 2013

Export more as 10k records with Sqoop

If you want to export more as 10k records out of a RDBMS via sqoop, you have some settings and properties you can tweak.

Parameter --num-mapper
Number of simultaneous connections that will be opened against database. Sqoop will use that many processes to export data (each process will export slice of the data). Here you have to take care about the max open connections to your RDBMS, since this can overwhelm the RDBMS easily.

Parameter --batch
Enabling batch mode on the JDBC driver. Here you use the JDBC batching mode, which queues the queries and deliver batched results

Property sqoop.export.records.per.statement
Number of rows that will be created for single insert statement, e.g. INSERT INTO xxx VALUES (), (), (), ...
Here you have to know which VALUES you want to catch, but this

Property export.statements.per.transaction
Number of insert statements per single transaction. e.g BEGIN; INSERT, INSERT, .... COMMIT

You can specify the properties (even both at the same time) in the HADOOP_ARGS section of the command line, for example: 
sqoop export -Dsqoop.export.records.per.statement=X --connect ...