How best to create an Intake catalog from a collection of CSV files?

gijlo24d  posted 11 months ago in Other

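I am trying to work out the best way to create an Intake catalog from a collection of CSV files, where each CSV file is a separate source.
I can create catalog.yml for a single CSV by doing the following: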

import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))

This produces a valid catalog:

sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}


But if I do the same for two sources:

import intake
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'
with open('catalog.yml', 'w') as f:
    f.write(str(source1.yaml()))
    f.write(str(source2.yaml()))


this of course fails, because the catalog ends up with a duplicate sources key:

sources:
  states1:
    args:
      urlpath: states_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
sources:
  states2:
    args:
      urlpath: states_2.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}


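I assume there must be a better way to do this, such as instantiating a catalog object, adding the source objects to it, and then writing the catalog out. But I could not find a way to do that.
What is the best practice for this?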

jfgube3f (answer #1)

Try using intake.Catalog() and adding your sources to it.

import intake
import yaml  # PyYAML, needed for yaml.dump below

# Write an empty catalog skeleton to disk first
description = "Simple catalog for multiple CSV sources"
catalog = {'metadata': {'version': 1, 'description': description}, 'sources': {}}
with open('catalog.yml', 'w') as f:
    yaml.dump(catalog, f)

# Create a catalog object
catalog = intake.open_catalog('catalog.yml')

# Define your CSV sources
source1 = intake.open_csv('states_1.csv')
source1.name = 'states1'
source2 = intake.open_csv('states_2.csv')
source2.name = 'states2'

# Add the sources to the catalog
catalog = catalog.add(source1)
catalog = catalog.add(source2)

catalog.save('catalog.yml')


im9ewurl (answer #2)

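I think your answer is along the lines of this thread:
Extract file name from read_csv - Python
Use the os module to assign the path and file name to variables. You can then put them into a Python dictionary and, at the end of the process, dump the whole thing to YAML, as detailed here:
How can I write data in YAML format in a file?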
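
A minimal sketch of that dictionary approach, assuming PyYAML is installed and that each source can be named after its file name without the extension. The helper names build_catalog and write_catalog are hypothetical, and the layout of each entry simply mirrors the YAML shown in the question:

```python
import os


def build_catalog(csv_paths):
    """Return a catalog dictionary with one source entry per CSV path."""
    sources = {}
    for path in sorted(csv_paths):
        # Source name = file name without directory or extension,
        # e.g. 'data/states_1.csv' -> 'states_1' (an assumed naming rule;
        # files with the same base name would overwrite each other).
        name = os.path.splitext(os.path.basename(path))[0]
        sources[name] = {
            "args": {"urlpath": path},
            "description": "",
            "driver": "intake.source.csv.CSVSource",
            "metadata": {},
        }
    # A single top-level 'sources' key avoids the duplicate-key problem.
    return {"sources": sources}


def write_catalog(catalog, target="catalog.yml"):
    """Dump the catalog dictionary to a YAML file."""
    import yaml  # PyYAML; imported here so build_catalog works without it

    with open(target, "w") as f:
        yaml.safe_dump(catalog, f, default_flow_style=False)
```

With `import glob`, calling `write_catalog(build_catalog(glob.glob("states_*.csv")))` would write every matching file into one catalog.yml under a single sources key, which should then be loadable with intake.open_catalog('catalog.yml').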
