I wrote some code that reads data and takes the second element of each tuple. That second element happens to be JSON. The code that gets the JSON:
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.conf.Configuration
import com.amazon.traffic.emailautomation.cafe.purchasefilter.util.CodecAwareManifestFileSystem
import com.amazon.traffic.emailautomation.cafe.purchasefilter.util.CodecAwareManifestInputFormat
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import amazon.emr.utils.manifest.input.ManifestItemFileSystem
import amazon.emr.utils.manifest.input.ManifestInputFormat
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import scala.Tuple2

// Copy the Hadoop configuration and set up the manifest-based input on the copy.
val configuration = new Configuration(sc.hadoopConfiguration)
ManifestItemFileSystem.setImplementation(configuration)
ManifestInputFormat.setInputFormatImpl(configuration, classOf[TextInputFormat])

// Read (key, value) pairs and keep only the value, which holds the JSON text.
val linesRdd1 = sc.newAPIHadoopFile(
  "location",
  classOf[ManifestInputFormat[LongWritable, Text]],
  classOf[LongWritable],
  classOf[Text],
  configuration
).map(tuple2 => tuple2._2.toString)
Here is an example record:
{"data": {"marketplaceId":7,"customerId":123,"eventTime":1471206800000,"asin":"4567","type":"OWN","region":"NA"},"uploadedDate":1471338703958}
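Now I want to create a DataFrame in which the JSON keys (marketplaceId, customerId, and so on) are the columns and each row holds the corresponding values. I am not sure how to go about this. Could someone give me some pointers that would help me achieve it?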
1 answer
wqsoz72f #1
You can follow this link to create a Scala object for marshalling/unmarshalling JSON: https://coderwall.com/p/o--apg/easy-json-un-marshalling-in-scala-with-jackson
Then use that object to read the JSON data into a case class in Scala:
Output:
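As an alternative to the Jackson/case-class route, Spark's built-in JSON reader can infer the schema directly from the JSON strings. The following is only a minimal sketch, not the answer's original code. It assumes Spark 2.2 or later (for `spark.read.json` on a `Dataset[String]`), and it uses a local session and a hand-written one-element `lines` dataset as a stand-in for `linesRdd1` from the question. The names `spark`, `lines`, `raw` and `df` are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.2+. In spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("json-to-df").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for linesRdd1: one JSON document per element.
val lines = Seq(
  """{"data": {"marketplaceId":7,"customerId":123,"eventTime":1471206800000,"asin":"4567","type":"OWN","region":"NA"},"uploadedDate":1471338703958}"""
).toDS()

// Let Spark infer the schema; "data" becomes a nested struct column.
val raw = spark.read.json(lines)

// Expand the fields of the "data" struct into top-level columns.
val df = raw.select("data.*", "uploadedDate")

df.printSchema()
df.show(false)
```

With the RDD from the question, something like `spark.read.json(linesRdd1.toDS())` should take the place of the hand-written `lines`; on older Spark versions the equivalent call is `sqlContext.read.json(linesRdd1)`.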