Monday, May 25, 2015

Sqoop import failed java.io.CharConversionException

Problem:
Sqoop import job failed to read data from DB2 database which has UTF8 encoding. Essentially, even the data cannot be read at DB2 with select queries as there are some characters which are not in UTF8.
Sqoop job will throw an error similar to below:
Error: java.io.IOException: SQLException in nextKeyValue
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
..
..
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][1065][12306][4.19.26] Caught java.io.CharConversionException.  See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null
        at com.ibm.db2.jcc.am.kd.a(Unknown Source)
        at com.ibm.db2.jcc.am.kd.a(Unknown Source)
..
..
Caused by: java.nio.charset.MalformedInputException: Input length = 527
        at com.ibm.db2.jcc.am.s.a(Unknown Source)
        ... 22 more
Caused by: sun.io.MalformedInputException
        at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:105)
        ... 23 more

Workaround:
Refer to the article: http://www-01.ibm.com/support/docview.wss?uid=swg21684365 for the workaround suggested by IBM to avoid the issue.
To summarize, after setting the parameter db2.jcc.charsetDecoderEncoder=3, an exception will not be thrown when a non-UTF8 character will be encountered, but it will be substituted by its equivalent Unicode replacement character.

How to use it with Sqoop / similar applications:
This property must be passed to the driver, however if you pass it by supplying "-D db2.jcc.charsetDecoderEncoder=3" to the sqoop import command, this parameter will be appended to the job configuration for the mapreduce job which is triggered to import the data and will not reach the db2 driver, ultimately the job will fail. 
Incorrect way:
sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2host:13610/database --username xxxxx --password xxxxx --verbose --target-dir xxx/xxxx --query "select col1 from foo.t1 -m 1 --append
In order to use this property, you can include it in the parameters which are used to define the JVM parameters for invoking java calls. For instance, you can modify the value of parameter mapreduce.map.java.opts in mapred-site.xml like below:
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m -Ddb2.jcc.charsetDecoderEncoder=3</value>
</property>

No comments:

Post a Comment