Converting alphanumeric dates to numeric in shell

mwngjboj  posted on 2023-01-21  in  Shell

I have a data file in which the dates are alphanumeric, with one record every 10 minutes.

00 hour 00 minute (00:00H)
00 hour 10 minute (00:10H)
00 hour 20 minute (00:20H)
and so on

ifile.txt

00:00H01JUN2021 1.900
00:10H01JUN2021 2.400
00:20H01JUN2021 2.100
00:30H01JUN2021 2.300
00:40H01JUN2021 2.00
00:50H01JUN2021 2.300
01:00H01JUN2021 2.300
01:10H01JUN2021 0.000
01:20H01JUN2021 2.200
01:30H01JUN2021 0.100

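To understand the data:
Column 1 is the date; column 2 is the value at that time.
The first 6 characters, YY:XXH, mean YY -> hour and XX -> minute (as described at the start).
I want to convert it to a CSV file with numeric dates.
Desired output file: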

yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.000
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100

My script is:

#!/bin/sh
gawk '
    BEGIN {
        month["Jan"] = "01"; month["Feb"] = "02"; month["Mar"] = "03";
        month["Apr"] = "04"; month["May"] = "05"; month["Jun"] = "06";
        month["Jul"] = "07"; month["Aug"] = "08"; month["Sep"] = "09";
        month["Oct"] = "10"; month["Nov"] = "11"; month["Dec"] = "12";
    }
    function timestamp_to_numeric(s) {
        # 00:00H01JUN2021 => 2021-06-01 00:00:00
        return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"
    }
    NR==1 {next}
    END {
            printf "%s",timestamp_to_numeric($1),$2
            printf "\n"
        }
   ' ifile.txt

This script does not print the output I want.

Answer #1 (zphenhs4)

Change

return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2) substr(s,1,2)":"substr(s,4,2)":""00"

to

return substr(s,12,4)"-"month[substr(s,9,3)]"-"substr(s,7,2)" "substr(s,1,2)":"substr(s,4,2)":""00"
# .................................,........................^^^

This puts a space between the date and the time.
Perhaps more readable:

return sprintf("%4d-%02d-%02d %02d:%02d:00", substr(s,12,4), month[substr(s,9,3)], substr(s,7,2), substr(s,1,2), substr(s,4,2))
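
Beyond the missing space, the script in the question appears to have a few other problems: the month table is keyed by "Jan", "Feb", ... while the data uses "JUN"; the printing happens in END, so at most the last line is handled; `NR==1 {next}` skips the first record; and `printf "%s"` has only one format specifier for two arguments. A minimal sketch that addresses these, assuming the input has no header line and every record has exactly two whitespace-separated fields, might look like the following. The function name `convert_dates` is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch: convert "00:00H01JUN2021 1.900" into "2021-06-01 00:00:00,1.900".
# convert_dates is a hypothetical helper; it reads the file named by $1.
convert_dates() {
    awk '
    BEGIN {
        # Upper-case month table, matching the data: JAN -> 1 ... DEC -> 12
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = i
        print "yyyy-mm-dd hh-mn-sc,val"
    }
    function timestamp_to_numeric(s) {
        # 00:00H01JUN2021 => 2021-06-01 00:00:00
        return sprintf("%04d-%02d-%02d %02d:%02d:00",
                       substr(s, 12, 4), month[toupper(substr(s, 9, 3))],
                       substr(s, 7, 2), substr(s, 1, 2), substr(s, 4, 2))
    }
    # Runs for every record with two fields (not only once in END);
    # %.3f pads values such as 2.00 to 2.000, as in the desired output.
    NF >= 2 { printf "%s,%.3f\n", timestamp_to_numeric($1), $2 }
    ' "$1"
}
```

Calling `convert_dates ifile.txt` and redirecting the result to a file would produce the CSV. The sketch avoids GNU-only features, so it should also run under other POSIX awk implementations.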

Answer #2 (lpwwtiir)

Using GNU awk (since you are already using it) for the 4th argument to split():

$ cat tst.awk
function timestamp_to_numeric(s,        mthNr,t,m) {
    # 00:00H01JUN2021 => 2021-06-01 00:00:00
    split(s,t,/[[:alpha:]]+/,m)
    mthNr = index("  JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",m[2]) / 3
    return sprintf("%04d-%02d-%02d %s:00", t[3], mthNr, t[2], t[1])
}

BEGIN {
    OFS=","
    print "yyyy-mm-dd hh-mn-sc","val"
}
{ print timestamp_to_numeric($1), $2 }
$ awk -f tst.awk ifile.txt
yyyy-mm-dd hh-mn-sc,val
2021-06-01 00:00:00,1.900
2021-06-01 00:10:00,2.400
2021-06-01 00:20:00,2.100
2021-06-01 00:30:00,2.300
2021-06-01 00:40:00,2.00
2021-06-01 00:50:00,2.300
2021-06-01 01:00:00,2.300
2021-06-01 01:10:00,0.000
2021-06-01 01:20:00,2.200
2021-06-01 01:30:00,0.100
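
The month lookup in this answer relies on string position: the two leading spaces place JAN at position 3, FEB at 6, and so on up to DEC at 36, so dividing the match position by 3 gives 1 to 12. The sketch below isolates that step; `month_nr` is a hypothetical helper name, and the alignment check is an addition that is not in the answer's code.

```shell
#!/bin/sh
# month_nr (hypothetical helper): print the month number for a 3-letter
# abbreviation, or 0 when the input is not a recognised abbreviation.
month_nr() {
    awk -v m="$1" 'BEGIN {
        # JAN is found at position 3, FEB at 6, ..., DEC at 36.
        pos = index("  JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper(m))
        # Accept only 3-letter inputs that match on a 3-character boundary,
        # so a string such as "ANF" (spanning JAN and FEB) is rejected.
        nr = (length(m) == 3 && pos % 3 == 0) ? pos / 3 : 0
        print nr
    }'
}
```

For example, `month_nr JUN` prints 6. Note that the answer's tst.awk itself needs gawk because of the 4-argument split(); this lookup step alone works in any POSIX awk.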

Answer #3 (roejwanj)

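To map English month names (full or abbreviated) in any letter case to a month number, this *extremely* odd lookup string is enough:

  • It pre-separates the input by whether the second letter is A|a, i.e. Jan / March / May
  • It then looks up the position of the third letter in the reference string
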
function month_name_to_num(__,_) {
    return \
    index(substr("n_r_yb_r_nlgptvc",
    ((_+=++_)-+-++_)^(__!~"^.[Aa]")),
       tolower(substr(__,_--,--_) ) )
}

OCT 10
AUGUST 8
March 3
May 5
October 10
November 11
February 2
JUNE 6
NOV 11
JUL 7
December 12
OCTOBER 10
FEBRUARY 2
JANUARY 1
MARCH 3
APRIL 4
June 6
April 4
September 9
NOVEMBER 11
January 1
FEB 2
MAY 5
DEC 12
MAY 5
JAN 1
JULY 7
SEP 9
August 8
SEPTEMBER 9
July 7
DECEMBER 12
MAR 3
APR 4
JUN 6
AUG 8
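
The function above is deliberately terse. A more readable sketch of the same two-step lookup, under the assumption that the input is a valid English month name, could look like this. The shell wrapper and the variable names `start`, `ref`, and `third` are illustrative and not from the answer.

```shell
#!/bin/sh
# Readable sketch of the two-step lookup; prints the month number for $1.
month_name_to_num() {
    awk -v name="$1" 'BEGIN {
        # Step 1: names whose second letter is "a" (Jan, March, May) use the
        # whole reference string; all others use it from position 5 onward.
        start = (name ~ /^.[Aa]/) ? 1 : 5
        ref = substr("n_r_yb_r_nlgptvc", start)
        # Step 2: the position of the third letter in the chosen string
        # is the month number (e.g. "n" -> 1 for Jan, "b" -> 2 for Feb).
        third = tolower(substr(name, 3, 1))
        print index(ref, third)
    }'
}
```

For instance, `month_name_to_num October` selects the tail "yb_r_nlgptvc" and finds "t" at position 10.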

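If you don't want to use a regex, this function variant reuses the input variable to avoid allocating an extra temporary variable, which is quite convenient in a weakly typed language such as awk: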

function monthname2num(_) {
    return \
    index("=anebarprayunulugepctovec",
    tolower(substr(_ "",_+=_^=_,_)))/_
}
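
In this variant the expression `_+=_^=_` appears to evaluate to 2 (the month name is numerically 0, 0^0 is 1 in awk, and adding it to itself gives 2), so the function takes the two characters starting at position 2 and halves their position in the reference string. A plain sketch of that reading, assuming a valid month name as input, is shown below; the shell wrapper and the variable name `key` are illustrative.

```shell
#!/bin/sh
# Readable sketch of the regex-free variant; prints the month number for $1.
monthname2num() {
    awk -v name="$1" 'BEGIN {
        # Letters 2-3 of the name, lower-cased: "an", "eb", "ar", ..., "ec".
        key = tolower(substr(name, 2, 2))
        # The leading "=" shifts each pair to an even position 2, 4, ..., 24,
        # so halving the position yields 1..12. Invalid input is not checked.
        print index("=anebarprayunulugepctovec", key) / 2
    }'
}
```

For example, `monthname2num JUN` finds "un" at position 12 and prints 6.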
