numpy Xarray: do not add a dimension when concatenating datasets

ss2ws0br, posted on 2023-08-05 in Other

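I have two datasets that I want to concatenate. Each contains two arrays: a 2D *intensity* array (dimensions = time × wavelength) and an unrelated 1D array of *channel names*. The channel names are the same in both datasets. When I concatenate along the time dimension, an extra time dimension is added to the channel names array. That makes some sense and is mentioned in the documentation, but I want the channel names to stay unchanged in the result. How can I avoid this extra dimension?
The example below demonstrates what I want.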

import numpy as np
import xarray as xr

DIM_TIME = "time"
DIM_CHANNEL_NR = 'channel_number'
DIM_WAVELENGTH = "wavelength"

def create_ds(intensity, time_local):
    wavelength = np.linspace(700.0, 800.0, 8)

    da_wav = xr.DataArray(wavelength, dims=[DIM_WAVELENGTH])
    da_time = xr.DataArray(time_local, dims=[DIM_TIME])
    da_chan_nr = xr.DataArray(np.array([1, 2]), dims=[DIM_CHANNEL_NR])

    da_intensity = xr.DataArray(
        intensity, name='intensity',
        dims=[DIM_TIME, DIM_WAVELENGTH],
        coords={DIM_TIME: da_time, DIM_WAVELENGTH: da_wav})

    da_chan_name = xr.DataArray(
        data = np.array(['UV', 'VIS']),
        name = 'chan_name',
        dims = [DIM_CHANNEL_NR])

    ds = xr.Dataset(
        data_vars={da_intensity.name: da_intensity},
        coords={
            DIM_TIME: da_time,
            DIM_WAVELENGTH: da_wav,
            DIM_CHANNEL_NR: da_chan_nr})

    ds[da_chan_name.name] = da_chan_name
    return ds

def main():
    ds1 = create_ds(
        intensity=np.arange(24).reshape((3, 8)),
        time_local = np.array([1e17, 2e17, 3e17]).astype('datetime64[ns]'))

    ds2 = create_ds(
        intensity=np.arange(24).reshape((3, 8)) + 24,
        time_local = np.array([4e17, 5e17, 6e17]).astype('datetime64[ns]'))

    print("---- concat ----\n{}\n".format(xr.concat([ds1, ds2], dim=DIM_TIME)))
    print("---- merged ----\n{}\n".format(xr.merge([ds1, ds2])))

if __name__ == "__main__":
    main()

When I run this program, the concatenated dataset looks like this.

---- concat ----
<xarray.Dataset>
Dimensions:         (time: 6, wavelength: 8, channel_number: 2)
Coordinates:
  * time            (time) datetime64[ns] 1973-03-03T09:46:40 ... 1989-01-05T...
  * wavelength      (wavelength) float64 700.0 714.3 728.6 ... 771.4 785.7 800.0
  * channel_number  (channel_number) int32 1 2
Data variables:
    intensity       (time, wavelength) int32 0 1 2 3 4 5 6 ... 42 43 44 45 46 47
    chan_name       (time, channel_number) <U3 'UV' 'VIS' 'UV' ... 'UV' 'VIS'


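As you can see, the chan_name array is now two-dimensional; the time dimension has been prepended.
When I merge the datasets, the result is exactly what I want: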

---- merged ----
<xarray.Dataset>
Dimensions:         (time: 6, wavelength: 8, channel_number: 2)
Coordinates:
  * time            (time) datetime64[ns] 1973-03-03T09:46:40 ... 1989-01-05T...
  * wavelength      (wavelength) float64 700.0 714.3 728.6 ... 771.4 785.7 800.0
  * channel_number  (channel_number) int32 1 2
Data variables:
    intensity       (time, wavelength) float64 0.0 1.0 2.0 ... 45.0 46.0 47.0
    chan_name       (channel_number) <U3 'UV' 'VIS'


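Here the chan_name array is a 1D array, the same as in the original datasets.
Unfortunately, xr.merge is much slower than xr.concat. Is there a way to concatenate without adding the dimension to the unrelated array?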

7jmck4yq

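Try passing data_vars="minimal" to xr.concat, so that only data variables that already have the time dimension are concatenated: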

print("---- concat ----\n{}\n".format(xr.concat([ds1, ds2], dim=DIM_TIME, data_vars="minimal")))
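
To see the effect in isolation, here is a minimal sketch on a reduced dataset. The helper make_ds and its small shapes are hypothetical stand-ins for the question's create_ds, not part of the original code. It compares data_vars="all" (every data variable is concatenated) with data_vars="minimal" (only variables that already contain the concatenation dimension are concatenated).

```python
import numpy as np
import xarray as xr


def make_ds(start):
    """Hypothetical helper: one (time, wavelength) variable plus one time-independent variable."""
    return xr.Dataset(
        data_vars={
            "intensity": (("time", "wavelength"), np.arange(6).reshape(3, 2) + start),
            "chan_name": (("channel_number",), np.array(["UV", "VIS"])),
        },
        coords={
            "time": np.arange(3) + start,
            "wavelength": [700.0, 800.0],
            "channel_number": [1, 2],
        },
    )


ds_a, ds_b = make_ds(0), make_ds(3)

# data_vars="all": every data variable is concatenated, so chan_name gains a time dimension.
cat_all = xr.concat([ds_a, ds_b], dim="time", data_vars="all")

# data_vars="minimal": only variables that already have the time dimension are concatenated.
cat_min = xr.concat([ds_a, ds_b], dim="time", data_vars="minimal")

print(cat_all["chan_name"].dims)
print(cat_min["chan_name"].dims)
```

With data_vars="minimal", variables that are not concatenated are taken from the inputs unchanged, so this only suits variables such as chan_name that really are identical across the datasets.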
