Shell: sort URLs by considering parameters

Posted by ff29svar on 2023-06-30 in Shell

I have the following list of URLs:

https://example.com/a?one=1&two=2
https://example.com/a?two=2&one=1
https://example.com/b?two=2&one=1
https://example.com/b?one=1&two=2
https://example.com/b?one=1
https://example.net/a/x?two=2&one=1
https://example.net/a/x?two=2&one=1

The result I want:

https://example.com/a?one=1&two=2
https://example.com/b?one=1&two=2
https://example.com/b?one=1
https://example.net/a/x?two=2&one=1

The problem is that the following links are "identical"; the only difference is the order of the parameters:

https://example.com/a?one=1&two=2
https://example.com/a?two=2&one=1
https://example.com/b?two=2&one=1
https://example.com/b?one=1&two=2

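Is it possible to sort the URLs by considering the parameters? I am not sure which approach to take here; maybe someone already has a solution. I can only assume it is doable with awk.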

shell

Source: https://stackoverflow.com/questions/76574631/sort-urls-by-considering-parameters

2 answers

sg24os4d #1

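Solution in TXR Lisp.
I added these two lines to the sample data to show that URLs are ordered by their parameters, with the parameters themselves first sorted by key. Here they appear in naive lexicographic order: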

https://example.net/z?a=tiger&b=zebra
https://example.net/z?b=bear&a=aarvark

But the parameter a is considered more significant than b, so the line with a=aarvark must sort before the line with a=tiger. Observe:

$ txr sort.tl < urls
https://example.com/a?one=1&two=2
https://example.com/b?one=1
https://example.com/b?two=2&one=1
https://example.net/a/x?two=2&one=1
https://example.net/z?b=bear&a=aarvark
https://example.net/z?a=tiger&b=zebra
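
To see the two orderings side by side, here is a minimal Python sketch (it is not part of the TXR Lisp solution). The helper name param_key is hypothetical, and breaking ties between equal parameter names by value is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def param_key(url):
    """Build a sort key: (host, path, parameters sorted by name)."""
    parts = urlsplit(url)
    # "a=tiger".partition("=")[::2] yields ("a", "tiger")
    params = sorted(p.partition("=")[::2] for p in parts.query.split("&"))
    return (parts.netloc, parts.path, params)


urls = [
    "https://example.net/z?a=tiger&b=zebra",
    "https://example.net/z?b=bear&a=aarvark",
]

naive = sorted(urls)                     # plain string comparison
by_params = sorted(urls, key=param_key)  # compares a=... before b=...

print(naive)
print(by_params)
```

The naive sort keeps the a=tiger line first because the raw query string starts with "a", while the key-based sort puts the a=aarvark line first.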

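The approach in the code is to parse each URL into pieces and build an object from them. TXR Lisp has a concept called "equality substitution", by which a structure object can be programmed to have a substitute key stand in for it under the equal function. We do this by writing a one-argument method named equal. The method takes the object itself ("self") and is called to deliver the equality substitute. In our case, we have this: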

(:method equal (me) me.key)

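What this says is: "For the purposes of comparing me for equality with something else, don't actually compare me; use my key slot in my place."
We populate the key slot of the url structure with a canonicalized representation: a vector consisting of the domain, the path, and the association list of URL parameters, sorted by key.
With equality substitution in place, we can uniq the URL objects, sort them, and then print them as strings again.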

(defstruct url ()
  urlstr
  prot
  domain
  path
  paramstr
  params

  ;; slot computed in :postinit
  key

  (:postinit (me)
    (set me.params [nsort me.params : car]
         me.key (vec me.domain me.path me.params)))

  (:method equal (me) me.key)

  (:method print (me stream : pretty-p)
    (if pretty-p (put-string me.urlstr stream) :)))

(defun parse-url (str)
  (match `@prot://@domain/@path?@paramstr` str
    (let ((params (collect-each ((param (spl "&" paramstr)))
                    (match `@var=@val` param
                      (cons var val)))))
      (new url
           urlstr str
           prot prot
           domain domain
           path path
           paramstr paramstr
           params params))))

(flow (get-lines)
  (mapcar parse-url)
  uniq
  sort
  tprint)
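
For readers who do not know TXR Lisp, the same idea can be approximated in Python, where __eq__, __hash__ and __lt__ delegating to a key attribute play roughly the role of equality substitution. This is a sketch under assumptions, not a translation of the code above: the names Url and unique_sorted are hypothetical, parsing uses urllib.parse.urlsplit instead of pattern matching, and parameters are sorted by name and then by value.

```python
from urllib.parse import urlsplit


class Url:
    """A URL that compares, hashes and sorts by a canonical key."""

    def __init__(self, urlstr):
        parts = urlsplit(urlstr)
        params = sorted(p.partition("=")[::2] for p in parts.query.split("&"))
        self.urlstr = urlstr  # original text, used for printing
        self.key = (parts.netloc, parts.path, tuple(params))

    def __eq__(self, other):
        if not isinstance(other, Url):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return hash(self.key)

    def __lt__(self, other):
        return self.key < other.key

    def __str__(self):
        return self.urlstr


def unique_sorted(lines):
    """Drop URLs with an equal key (first occurrence wins), then sort by key."""
    return sorted(dict.fromkeys(Url(line) for line in lines))


sample = [
    "https://example.com/a?one=1&two=2",
    "https://example.com/a?two=2&one=1",
    "https://example.com/b?two=2&one=1",
    "https://example.com/b?one=1&two=2",
    "https://example.com/b?one=1",
    "https://example.net/a/x?two=2&one=1",
    "https://example.net/a/x?two=2&one=1",
    "https://example.net/z?a=tiger&b=zebra",
    "https://example.net/z?b=bear&a=aarvark",
]

for url in unique_sorted(sample):
    print(url)
```

Because each object prints its original string, the surviving URLs keep the parameter order of their first occurrence, as in the txr output shown earlier.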

2023-06-30

eyh26e7m #2

A Perl one-liner:

<url-list perl -lpE 's|\?\K.*|join"&",sort split/&/,$&|e' | sort -u

Once Perl has normalized the URLs (stored in the file url-list), sort -u sorts the list and discards duplicates.

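-l - chomp the line ending
-p - print each line of input after operating on it
-E - the program follows; it is run on each line of input
s|RE|CMD|e - replace whatever RE matches with the result of executing CMD
\?\K.* - regex meaning "everything after a literal ?"; the match is stored in $&
split RE, STRING - split the string ($&) on the regex (a literal &)
sort LIST - sort the pieces
join SEPARATOR, LIST - join them back together using & as the separator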
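
The same normalize-then-deduplicate pipeline can be sketched in Python for comparison. The function names normalize and sort_unique are hypothetical, and the sketch sorts by code point, which corresponds to running sort -u in the C locale.

```python
def normalize(url):
    """Sort the '&'-separated parameters that follow the first '?'."""
    base, sep, query = url.partition("?")
    # Without a '?', sep and query are empty and the URL is returned unchanged.
    return base + sep + "&".join(sorted(query.split("&")))


def sort_unique(urls):
    """Normalize every URL, drop duplicates, and sort the result."""
    return sorted(set(map(normalize, urls)))


url_list = [
    "https://example.com/a?one=1&two=2",
    "https://example.com/a?two=2&one=1",
    "https://example.com/b?two=2&one=1",
    "https://example.com/b?one=1&two=2",
    "https://example.com/b?one=1",
    "https://example.net/a/x?two=2&one=1",
    "https://example.net/a/x?two=2&one=1",
]

for url in sort_unique(url_list):
    print(url)
```

As with the one-liner, the URLs come out in normalized form, so https://example.net/a/x?two=2&one=1 is printed as https://example.net/a/x?one=1&two=2.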

It could be done with gawk, but it would probably be a bit longer.
POSIX awk doesn't have much support for sorting.

2023-06-30
