通过创建新表,在customer表上选择transform。新的目标表应该只有三列, c_custkey
(无变化), c_address
,和 c_city
.
对于 c_address
列,将其缩短为5个字符。
为了 c_city
,添加空格和 #
表示结尾的数字(例如。, UNITED KI2
=> UNITED KI #2
,或 INDONESIA4
=> INDONESIA #4
).
create table customer (
c_custkey int,
c_name varchar(25),
c_address varchar(25),
c_city varchar(10),
c_nation varchar(15),
c_region varchar(12),
c_phone varchar(15),
c_mktsegment varchar(10)
);
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
create table customer_ty (
c_custkey int,
c_address STRING,
c_city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
;
客户表数据
1|Customer#000000001|j5JsirBM9P|MOROCCO 0|MOROCCO|AFRICA|25-989-741-2988|BUILDING|
2|Customer#000000002|487LW1dovn6Q4dMVym|JORDAN 1|JORDAN|MIDDLE EAST|23-768-687-3665|AUTOMOBILE|
3|Customer#000000003|fkRGN8n|ARGENTINA7|ARGENTINA|AMERICA|11-719-748-3364|AUTOMOBILE|
4|Customer#000000004|4u58h f|EGYPT 4|EGYPT|MIDDLE EAST|14-128-190-5944|MACHINERY|
5|Customer#000000005|hwBtxkoBF qSW4KrI|CANADA 5|CANADA|AMERICA|13-750-942-6364|HOUSEHOLD|
6|Customer#000000006| g1s,pzDenUEBW3O,2 pxu|SAUDI ARA2|SAUDI ARABIA|MIDDLE EAST|30-114-968-4951|AUTOMOBILE|
7|Customer#000000007|8OkMVLQ1dK6Mbu6WG9|CHINA 0|CHINA|ASIA|28-190-982-9759|AUTOMOBILE|
8|Customer#000000008|j,pZ,Qp,qtFEo0r0c 92qo|PERU 6|PERU|AMERICA|27-147-574-9335|BUILDING|
9|Customer#000000009|vgIql8H6zoyuLMFN|INDIA 6|INDIA|ASIA|18-338-906-3675|FURNITURE|
10|Customer#000000010|Vf mQ6Ug9Ucf5OKGYq fs|ETHIOPIA 9|ETHIOPIA|AFRICA|15-741-346-9870|HOUSEHOLD|
1条答案
按热度按时间csga3l581#
c_city
使用regexp可以轻松完成转换,请参见配置单元示例:结果
目前还不清楚该如何缩短地址。请澄清规则,我或其他人也可以帮忙。