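I'm new to creating schemas for XML. In the past I parsed XML data using an XSD.
I'm now trying to use Spark's read.format method, but the seller id doesn't appear in the resulting schema. Is there a way to get both the seller id and the trade id into my data?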
df_trade_loan = spark.read.format("com.databricks.spark.xml").option("rowTag","trade").option("rootTag","loan").load("dbfs:/FileStore/shared_uploads/trades/*")
My XML file looks like this:
<loan>
<seller>
<id>11</id>
</seller>
<trade id="67" type="Standard">
<advance>
<date>2011-03-09</date>
<amount>16466.76</amount>
<amount_gbp>16466.76</amount_gbp>
<percentage>90.0</percentage>
</advance>
<discount>
<percentage>1.0</percentage>
<on>Facevalue</on>
</discount>
<expected_payment_date>2011-03-18 00:00:00 +0000</expected_payment_date>
<settlement_date>2011-03-25</settlement_date>
<arrears>
<in_arrears>No</in_arrears>
<in_arrears_on_date>nan</in_arrears_on_date>
</arrears>
<payment>
<state>Paid</state>
</payment>
<price_grade>6</price_grade>
<currency>GBP</currency>
<face_value>
<amount>18296.4</amount>
<amount_gbp>18296.4</amount_gbp>
</face_value>
<outstanding_principal>
<amount>0.0</amount>
<amount_gbp>0.0</amount_gbp>
</outstanding_principal>
<crystalised_loss>
<amount>nan</amount>
<date>nan</date>
</crystalised_loss>
<gross_yield>
<annualised>14.164038846995776</annualised>
</gross_yield>
</trade>
</loan>
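Why the seller id goes missing: with rowTag set to "trade", each <trade> element becomes one row, and only that element's attributes and children become columns. In this layout <seller> is a sibling of <trade> under <loan>, not a child, so nothing from it lands in the row. The standard-library sketch below roughly imitates that row selection on a trimmed copy of the file. It is only an approximation of spark-xml. The "_" prefix on attribute keys mirrors the _id/_type columns in the schema, and the helper name rows_for_tag is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the loan document above (only a few trade fields kept).
LOAN_XML = """<loan>
  <seller><id>11</id></seller>
  <trade id="67" type="Standard">
    <currency>GBP</currency>
    <price_grade>6</price_grade>
  </trade>
</loan>"""


def rows_for_tag(xml_text: str, row_tag: str) -> list:
    """Hypothetical helper: rough imitation of rowTag, one dict per <row_tag>.

    Attributes become keys prefixed with "_" (like spark-xml's default
    attribute prefix); only leaf child elements are kept, as text.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(row_tag):
        row = {f"_{name}": value for name, value in elem.attrib.items()}
        for child in elem:
            if len(child) == 0:  # leaf element: keep its text value
                row[child.tag] = child.text
        rows.append(row)
    return rows


rows = rows_for_tag(LOAN_XML, "trade")
print(rows)
# [{'_id': '67', '_type': 'Standard', 'currency': 'GBP', 'price_grade': '6'}]
```

Note that no seller field appears in the trade row, which matches the schema shown next.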
The current schema looks like this:
root
|-- _id: long (nullable = true)
|-- _type: string (nullable = true)
|-- advance: struct (nullable = true)
| |-- amount: double (nullable = true)
| |-- amount_gbp: double (nullable = true)
| |-- date: string (nullable = true)
| |-- percentage: double (nullable = true)
|-- arrears: struct (nullable = true)
| |-- in_arrears: string (nullable = true)
| |-- in_arrears_on_date: string (nullable = true)
|-- crystalised_loss: struct (nullable = true)
| |-- amount: string (nullable = true)
| |-- date: string (nullable = true)
|-- currency: string (nullable = true)
|-- discount: struct (nullable = true)
| |-- on: string (nullable = true)
| |-- percentage: double (nullable = true)
|-- expected_payment_date: string (nullable = true)
|-- face_value: struct (nullable = true)
| |-- amount: double (nullable = true)
| |-- amount_gbp: double (nullable = true)
|-- gross_yield: struct (nullable = true)
| |-- annualised: double (nullable = true)
|-- outstanding_principal: struct (nullable = true)
| |-- amount: double (nullable = true)
| |-- amount_gbp: double (nullable = true)
|-- payment: struct (nullable = true)
| |-- state: string (nullable = true)
|-- price_grade: long (nullable = true)
|-- settlement_date: string (nullable = true)
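One possible workaround treats <loan> as the row unit, so the seller and its trades are in scope together. It then emits one row per trade carrying the seller id. The sketch below does this with the standard library rather than Spark, on a trimmed copy of the file. In spark-xml terms this would roughly correspond to reading with rowTag set to "loan" and then selecting seller.id alongside trade._id, exploding trade first if a loan can contain several trades. That mapping is an assumption, not something confirmed in the question. The helper trades_with_seller and its output column names (seller_id, trade_id, trade_type) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the loan document from the question.
LOAN_XML = """<loan>
  <seller><id>11</id></seller>
  <trade id="67" type="Standard">
    <currency>GBP</currency>
    <price_grade>6</price_grade>
  </trade>
</loan>"""


def trades_with_seller(xml_text: str) -> list:
    """Hypothetical helper: one row per <trade>, with the loan's seller id copied in."""
    root = ET.fromstring(xml_text)
    # Accept either a single <loan> root or a wrapper containing several loans.
    loans = [root] if root.tag == "loan" else root.findall(".//loan")
    rows = []
    for loan in loans:
        seller_id = loan.findtext("seller/id")  # seller is a sibling of trade
        for trade in loan.findall("trade"):
            row = {
                "seller_id": seller_id,
                "trade_id": trade.get("id"),
                "trade_type": trade.get("type"),
            }
            for child in trade:
                if len(child) == 0:  # keep leaf fields only, as text
                    row[child.tag] = child.text
            rows.append(row)
    return rows


print(trades_with_seller(LOAN_XML))
```

Because the seller id is looked up once per loan and copied onto each trade, a loan with several <trade> elements yields several rows that share the same seller_id.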
1 answer
This code worked.