Sorting on a date field in Unix shell


I have a text file containing hundreds of thousands of records. One of the fields is a date field. Is there any way to sort the file by that date field?

09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM

The output should be:

19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM


I tried sorting on the date with the sort command (treating the date field as a string), but it does not give the correct output.
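To illustrate why the string sort fails, here is a minimal sketch using two of the sample lines. A plain lexical sort compares characters from the left, so the day of the month decides the order before the month is even looked at. `LC_ALL=C` is an assumption added here to make the byte-wise comparison explicit.

```shell
# Plain lexical sort: characters are compared left to right,
# so the day of the month dominates the result.
printf '%s\n' \
  '19-MAR-12 03.53.38.189606000 PM' \
  '09-APR-12 04.08.43.632279000 AM' |
  LC_ALL=C sort
# The 09-APR line is printed first, although it is the later date.
```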

Answer 1 (nwsw7zdq)

Chronicle's solution is close, but it ignores the AM/PM distinction and sorts 27-MAR-12 07.28.02.828746000 PM before 27-MAR-12 10.28.14.797580000 AM. That can be amended:

sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24

But this is still very fragile. It would be better to convert the dates to epoch time and compare them numerically.
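As one illustration of that fragility (a sketch using two made-up timestamps that are not in the question's data): within the same day and the same AM/PM half, all four keys compare equal and sort falls back to comparing the whole line. An hour of 12 therefore sorts after 01, even though 12.30 AM is earlier than 01.00 AM. `LC_ALL=C` is an assumption added so that the month names are matched in English.

```shell
# Two timestamps on the same day, both AM; the hour 12 is really 00:30.
printf '%s\n' \
  '19-MAR-12 01.00.00.000000000 AM' \
  '19-MAR-12 12.30.00.000000000 AM' |
  LC_ALL=C sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24
# Prints the 01.00 AM line first; chronologically the 12.30 AM line should come first.
```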

Answer 2 (bogh5gae)

Try this:

Input.txt

09-APR-12 04.08.43.632279000 AM 
19-MAR-12 03.53.38.189606000 PM 
19-MAR-12 03.56.27.933365000 PM 
19-MAR-12 04.00.13.387316000 PM 
19-MAR-12 04.04.45.168361000 PM 
19-MAR-12 03.54.32.595348000 PM 
27-MAR-12 10.28.14.797580000 AM 
28-MAR-12 12.28.02.652969000 AM 
27-MAR-12 07.28.02.828746000 PM

Code

sort -t "-"  -k 3 -k 2M -nk 1 Input.txt

Output

19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 07.28.02.828746000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM

Answer 3 (koaltpgm)

The Decorate-Sort-Undecorate idiom works with any awk, any sort, and any cut:

$ awk -F',' -v OFS='\t' '{
    split($NF,t,/[- ]/)
    mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
    printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file | sort -k1,1 | cut -f2-
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM

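If you're not sure how this works, look at the output of the awk command on its own. It prepends (decorates) each input line with a sortable key before sort runs, and cut then removes that key again (undecorates):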

$ awk -F',' -v OFS='\t' '{
    split($NF,t,/[- ]/)
    mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
    printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file
120409AM04.08.43.632279000      09-APR-12 04.08.43.632279000 AM
120319PM03.53.38.189606000      19-MAR-12 03.53.38.189606000 PM
120319PM03.56.27.933365000      19-MAR-12 03.56.27.933365000 PM
120319PM04.00.13.387316000      19-MAR-12 04.00.13.387316000 PM
120319PM04.04.45.168361000      19-MAR-12 04.04.45.168361000 PM
120319PM03.54.32.595348000      19-MAR-12 03.54.32.595348000 PM
120327AM10.28.14.797580000      27-MAR-12 10.28.14.797580000 AM
120328AM12.28.02.652969000      28-MAR-12 12.28.02.652969000 AM
120327PM07.28.02.828746000      27-MAR-12 07.28.02.828746000 PM


Note that the decorated key sorts in the desired order.
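The decorated key above keeps the 12-hour clock, so an hour of 12 (for example 12.30 PM) would sort after 01 through 11 in the same half of the day. A possible variation, sketched here on the assumption that every line consists of exactly the three space-separated parts shown in the question and that all years fall in the same century, converts the hour to a 24-hour value before building the key. The function name sort_by_timestamp is made up for illustration.

```shell
# sort_by_timestamp (hypothetical helper): reads lines on stdin, writes them sorted to stdout.
sort_by_timestamp() {
  awk '{
    # $1 = DD-MON-YY, $2 = HH.MI.SS.FFFFFFFFF, $3 = AM or PM
    split($1, d, "-")
    split($2, h, ".")
    mon = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", d[2]) + 2) / 3
    hour = h[1] % 12                # hour 12 becomes 0
    if ($3 == "PM") hour += 12      # switch to a 24-hour clock
    # Decorate: fixed-width key YYMMDDHHMISSFFFFFFFFF, a tab, then the original line.
    printf "%02d%02d%02d%02d%s%s%s\t%s\n", d[3], mon, d[1], hour, h[2], h[3], h[4], $0
  }' | LC_ALL=C sort -k1,1 | cut -f2-   # sort on the key, then undecorate
}

# Example with made-up timestamps that exercise the hour-12 case.
printf '%s\n' \
  '19-MAR-12 01.00.00.000000000 AM' \
  '19-MAR-12 12.15.00.000000000 PM' \
  '19-MAR-12 12.30.00.000000000 AM' |
  sort_by_timestamp
```

Because the key is fixed-width and purely numeric, a plain string comparison on it is enough; no month or numeric sort options are needed.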

Answer 4 (s1ag04yj)

This script sorts by epoch time with nanosecond resolution:

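# Requires gawk (gensub) and GNU date (-d, %N).
awk '{
  # Rewrite "04.08.43.632279000" as "04:08:43.632279000" so that date can parse it.
  t = gensub(/\.([0-9]{2})\./, ":\\1:", 1, $0);
  command = "date +%s%N -d \"" t "\"";
  command | getline t;
  close(command);
  print t, $0;
}' unsorted.txt | sort -n -k 1 | cut -d ' ' -f 2- > sorted.txt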
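The gensub call is the step that makes each line parseable: it rewrites the two dots around the minutes into colons and leaves the dot before the fractional seconds alone. An equivalent substitution in sed, shown only to make the intermediate string visible:

```shell
# Same substitution as the gensub call, written as a POSIX sed expression.
printf '%s\n' '09-APR-12 04.08.43.632279000 AM' |
  sed 's/\.\([0-9][0-9]\)\./:\1:/'
# prints: 09-APR-12 04:08:43.632279000 AM
```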


Answer 5 (lnlaulya)

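You can use date, which is often a good idea, especially if you don't need to worry about microseconds. Otherwise you could cut off the microseconds and sort on them as a secondary sort field.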

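# Requires bash (process substitution) and GNU date; -u keeps lines sharing a second from printing twice.
while IFS= read -r a; do
  grep "^${a}" input.txt
done < <(sed 's/\./:/;s/\./:/' input.txt | xargs -I{} date -d"{}" +%s | sort -nu | xargs -I{} date -d @'{}' +'%d-%^h-%y %I.%M.%S')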
