Making Python sort/compare behave the same as GNU sort

7xllpg7q posted 12 months ago in Shell

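After some initial tests, it seems that Python uses the same sort order as Linux sort (GNU sort) when the locale is set to "C", i.e. the C sort order.
However, I would like to write Python code that sorts and compares the same way GNU sort does, depending on the locale.
A small code example to illustrate the issue: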

import os 
import subprocess

words = [
    "Abd",
    "éfg",
    "aBd",
    "aBd",
    "zzz",
    "ZZZ",
    "efg",
    "abd",
    "fff",
    ]

with open("tosort", "w") as fout:
    for word in words:
        fout.write(word + "\n")

os.environ["LC_ALL"] = "en_US.UTF-8" 
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_en_utf = proc.stdout.read().decode('utf-8').split()

os.environ["LC_ALL"] = "C" 
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE) 
sort_c = proc.stdout.read().decode('utf-8').split()

os.environ["LC_ALL"] = "en_US.UTF-8"
sort_py = sorted(words)

for row in zip(sort_en_utf, sort_c, sort_py):
    print(" ".join(row))

If I run the code above, I get the following output:

abd Abd Abd
aBd ZZZ ZZZ
aBd aBd aBd
Abd aBd aBd
efg abd abd
éfg efg efg
fff fff fff
zzz zzz zzz
ZZZ éfg éfg

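Column 1 is the sort/compare order I would like to have in my Python code when the locale is "en_US.UTF-8". Columns 2 and 3 show that Python sorts the same way as Linux sort when the locale is set to "C".
So I would also like to know whether there is a way to make
"éfg" < "fff" yield True. I don't insist on using comparison operators; calling a function is fine too. But the ordering result should take the current locale into account.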

Answer 1 (6ioyuze2)

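Hmmm, somehow I overlooked this:
The last section, "Odds and Ends", of the Python sorting HOWTO (https://docs.python.org/3.5/howto/sorting.html) mentions the function locale.strxfrm() (see https://docs.python.org/3.5/library/locale.html#locale.strxfrm) as a key function for sorting and locale.strcoll() as a comparison function.
So the following modified code is almost OK. The only catch is that the comparison function does not directly return True / False (it returns a negative, zero, or positive number), but that is fine in my context.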

import locale
import os
import subprocess

words = [
    "Abd",
    "éfg",
    "aBd",
    "aBd",
    "zzz",
    "ZZZ",
    "efg",
    "abd",
    "fff",
    "sra",
    "ssa",
    "ssb",
    "stb",
    "ßaa",
    ]

val1 = "ßaa"
val2 = "ssb"

with open("tosort", "w") as fout:
    for word in words:
        fout.write(word + "\n")

os.environ["LC_ALL"] = "en_US.UTF-8"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_en_utf = proc.stdout.read().decode('utf-8').split()

os.environ["LC_ALL"] = "C"
proc = subprocess.Popen(["sort", "tosort"], stdout=subprocess.PIPE)
sort_c = proc.stdout.read().decode('utf-8').split()

locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
sort_py1 = sorted(words, key=locale.strxfrm)
print("%r < %r = %s , but locale.strcoll(%r, %r) = %s for %s"
      % (val1, val2, val1 < val2, val1, val2,
         locale.strcoll(val1, val2), locale.getlocale())
      )

locale.setlocale(locale.LC_ALL, "C")
sort_py2 = sorted(words, key=locale.strxfrm)
print("%r < %r = %s , but locale.strcoll(%r, %r) = %s for %s"
      % (val1, val2, val1 < val2, val1, val2,
         locale.strcoll(val1, val2), locale.getlocale())
      )

for row in zip(sort_en_utf, sort_py1, sort_c, sort_py2):
    print(" ".join(row))

The output will be:

'ßaa' < 'ssb' = False , but locale.strcoll('ßaa', 'ssb') = -1 for ('en_US', 'UTF-8')
'ßaa' < 'ssb' = False , but locale.strcoll('ßaa', 'ssb') = 1 for (None, None)
abd abd Abd Abd
aBd aBd ZZZ ZZZ
aBd aBd aBd aBd
Abd Abd aBd aBd
efg efg abd abd
éfg éfg efg efg
fff fff fff fff
sra sra sra sra
ssa ssa ssa ssa
ßaa ßaa ssb ssb
ssb ssb stb stb
stb stb zzz zzz
zzz zzz ßaa ßaa
ZZZ ZZZ éfg éfg
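
If a plain True / False result is wanted after all, locale.strcoll() can be wrapped in a small helper. The following is only a minimal sketch: the names locale_lt, locale_sorted and try_setlocale are hypothetical helpers made up for illustration, and it assumes the "en_US.UTF-8" locale may or may not be installed on the machine, so that case is handled explicitly.

```python
import functools
import locale


def locale_lt(a, b):
    """Return True if a collates strictly before b in the current locale."""
    # strcoll returns a negative, zero, or positive number, like C's strcoll.
    return locale.strcoll(a, b) < 0


def locale_sorted(items):
    """Sort using locale.strcoll as a comparison function."""
    # cmp_to_key adapts an old-style comparison function to a key function.
    return sorted(items, key=functools.cmp_to_key(locale.strcoll))


def try_setlocale(name):
    """Switch the process locale; return False if it is not available."""
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False


# Assumption: en_US.UTF-8 might not be installed, so guard the demo.
if try_setlocale("en_US.UTF-8"):
    print(locale_lt("éfg", "fff"))
    print(locale_sorted(["fff", "éfg", "efg"]))

# The "C" locale is always available.
locale.setlocale(locale.LC_ALL, "C")
print(locale_lt("éfg", "fff"))
print(locale_sorted(["fff", "éfg", "efg"]))
```

For sorting, key=locale.strxfrm as used above should give the same order as cmp_to_key(locale.strcoll) and avoids repeated pairwise comparisons; the wrapper is mainly useful when two individual strings need to be compared. Note that locale.setlocale() changes the locale for the whole process, so it is not something to toggle back and forth in multi-threaded code.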
