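I'm new to the Cloudera environment. I'm trying to import data from an RDBMS with Sqoop, and I need to apply some transformations to the data during the import. Specifically, I need to encrypt some fields before they are stored on HDFS. To do this, I'm trying to use the codegen command, which generates an ORM Java class that I can modify.
Suppose I have a table 'products' in a MySQL database, and I want to import it into HDFS with Sqoop while encrypting the 'brand' field. First, I ran the following command: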
sqoop codegen \
--connect jdbc:mysql://localhost/test \
--username username --password password \
--table products
This generates the files products.java, products.jar and products.class in the folder /tmp/sqoop-training/compile/fc8868dda33ef703ad126583cf77477f.
I then modified the readFields method in products.java as follows:
// WARNING: This class is AUTO-GENERATED. Modify at your own risk.
//
// Debug information:
// Generated date: Thu Nov 16 06:55:13 PST 2017
// For connector: org.apache.sqoop.manager.MySQLManager
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import com.cloudera.sqoop.lib.JdbcWritableBridge;
import com.cloudera.sqoop.lib.DelimiterSet;
import com.cloudera.sqoop.lib.FieldFormatter;
import com.cloudera.sqoop.lib.RecordParser;
import com.cloudera.sqoop.lib.BooleanParser;
import com.cloudera.sqoop.lib.BlobRef;
import com.cloudera.sqoop.lib.ClobRef;
import com.cloudera.sqoop.lib.LargeObjectLoader;
import com.cloudera.sqoop.lib.SqoopRecord;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
public class products extends SqoopRecord implements DBWritable, Writable {
  // [...]
  public void readFields(ResultSet __dbResults) throws SQLException {
    this.__cur_result_set = __dbResults;
    this.prod_id = JdbcWritableBridge.readInteger(1, __dbResults);
    this.brand = encrypt(JdbcWritableBridge.readString(2, __dbResults));
    this.name = JdbcWritableBridge.readString(3, __dbResults);
    this.price = JdbcWritableBridge.readInteger(4, __dbResults);
    this.cost = JdbcWritableBridge.readInteger(5, __dbResults);
    this.shipping_wt = JdbcWritableBridge.readInteger(6, __dbResults);
  }
  // [...]
}
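The encrypt helper called in readFields is not shown in the post. As an illustration only, here is a minimal sketch of what such a helper could look like, assuming AES-GCM from the standard javax.crypto API with Base64-encoded output. The class name FieldEncryptor, the two-argument signature and the way the key is passed in are hypothetical and not part of the generated class or of Sqoop.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of Sqoop or of the generated products class.
public final class FieldEncryptor {
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private FieldEncryptor() {}

    // Encrypts one field value; returns Base64(iv || ciphertext), or null for SQL NULL.
    public static String encrypt(String plain, SecretKey key) {
        if (plain == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            // Unchecked, so it fits inside readFields, which only declares SQLException.
            throw new IllegalStateException("field encryption failed", e);
        }
    }

    // Reverses encrypt: splits off the IV, then decrypts and verifies the tag.
    public static String decrypt(String encoded, SecretKey key) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("field decryption failed", e);
        }
    }
}
```

With a helper like this, the brand line would call FieldEncryptor.encrypt(JdbcWritableBridge.readString(2, __dbResults), key). The Base64 alphabet contains no commas or newlines, so the result should not collide with the default field and record delimiters of a text import. How the key reaches the map tasks is deliberately left out of the sketch.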
I have two questions:
1) How do I recompile products.java to get updated versions of products.class and products.jar? I tried
javac products.java
but javac reports 82 errors; it apparently cannot find the packages from the Hadoop and Cloudera namespaces:
error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
^
products.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
^
products.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
^
products.java:10: error: package org.apache.hadoop.mapred.lib.db does not exist
import org.apache.hadoop.mapred.lib.db.DBWritable;
^
products.java:11: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.JdbcWritableBridge;
^
products.java:12: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.DelimiterSet;
^
products.java:13: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.FieldFormatter;
^
products.java:14: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.RecordParser;
^
products.java:15: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BooleanParser;
^
products.java:16: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BlobRef;
^
products.java:17: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.ClobRef;
^
products.java:18: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.LargeObjectLoader;
^
products.java:19: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.SqoopRecord;
2) Once I have successfully compiled products.java, how do I make Sqoop use my customized ORM class when importing the data into HDFS?
Thanks in advance!
1 Answer
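Regarding the first question:
Add the Hadoop and Sqoop JARs to the javac classpath (the "package ... does not exist" errors mean the compiler cannot find them),
then try again.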
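Also, a small comment on the architecture regarding "I need to encrypt some fields before they are stored on HDFS": why not use HDFS transparent encryption? https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hdfs_encryption.html You can achieve the same result without any coding.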