Splitting a string by length in Golang

tktrz96b posted on 2023-04-03 in Go
Follows (0) | Answers (7) | Views (201)

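Does anyone know how to split a string in Golang by length?
For example, to split "helloworld" after every 3 characters, so that it would ideally return an array "hel" "low" "orl" "d"?
Another possible solution would be to append a newline after every 3 characters.
All ideas are greatly appreciated!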


Answer #1 by ohtdti5x

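Make sure to convert the string into a slice of runes: see "Slice string into letters".
A for ... range loop over a string automatically decodes it into runes, so in that case no extra code is needed to convert the string to runes first: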

for i, r := range s {
    fmt.Printf("i%d r %c\n", i, r)
    // every 3 i, do something
}

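r[n:n+3] will work best with a slice of runes.
With a slice of runes the index increases by one for every rune, whereas when ranging over a string it may increase by more than one per character, because it is a byte offset: for "世界", i would be 0 and 3, since a character (rune) can be made of multiple bytes.
For example, consider s := "世a界世bcd界efg世": 12 runes (see play.golang.org).
If you try to parse it byte by byte, a naive "split every 3 characters" implementation will skip some of the positions where (i+1)%3 == 0 (here 2, 5, 8 and 14), because the index jumps past them: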

res := ""
for i, r := range s {
    res = res + string(r)
    fmt.Printf("i %d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

Output:

i  0 r 世
i  3 r a   <== miss i==2
i  4 r 界
i  7 r 世  <== miss i==5
i 10 r b  <== miss i==8
i 11 r c  ===============> would print '世a界世bc', not exactly '3 chars'!
i 12 r d
i 13 r 界
i 16 r e  <== miss i==14
i 17 r f  ===============> would print 'd界ef'
i 18 r g
i 19 r 世 <== miss the rest of the string

But if you iterate over runes (a := []rune(s)), you get what you expect, because the index increases one rune at a time, which makes it easy to group exactly 3 characters:

a := []rune(s)
res := ""
for i, r := range a {
    res = res + string(r)
    fmt.Printf("i%d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

Output:

i 0 r 世
i 1 r a
i 2 r 界 ===============> would print '世a界'
i 3 r 世
i 4 r b
i 5 r c ===============> would print '世bc'
i 6 r d
i 7 r 界
i 8 r e ===============> would print 'd界e'
i 9 r f
i10 r g
i11 r 世 ===============> would print 'fg世'
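The rune loop above only prints when a full group of three is complete, so for a string whose rune count is not a multiple of 3 (such as "helloworld") the last group would be lost. A minimal runnable sketch of the same accumulate-and-emit idea that also flushes the remainder; splitRunes is a hypothetical name, not part of the answer, and n > 0 is assumed:

```go
package main

import "fmt"

// splitRunes (hypothetical helper) follows the rune loop above:
// accumulate runes and emit every n, then emit any trailing partial group.
// It assumes n > 0.
func splitRunes(s string, n int) []string {
    var chunks []string
    res := ""
    for i, r := range []rune(s) { // i counts runes, not bytes
        res += string(r)
        if (i+1)%n == 0 { // a full group of n runes is ready
            chunks = append(chunks, res)
            res = ""
        }
    }
    if res != "" { // flush leftover runes, e.g. "d" for "helloworld"
        chunks = append(chunks, res)
    }
    return chunks
}

func main() {
    fmt.Printf("%q\n", splitRunes("世a界世bcd界efg世", 3)) // ["世a界" "世bc" "d界e" "fg世"]
    fmt.Printf("%q\n", splitRunes("helloworld", 3))     // ["hel" "low" "orl" "d"]
}
```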

Answer #2 by eqqqjvef

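Here is another variant: playground. It is by far more efficient than the other answers in terms of both speed and memory. If you want to run the benchmarks, here they are: benchmarks. In general it is about 5 times faster than the previous version, which was already the fastest answer.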

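// Chunks splits s into chunks of chunkSize runes without converting to []rune:
// it slices the original string at the rune boundaries reported by range.
func Chunks(s string, chunkSize int) []string {
    if len(s) == 0 {
        return nil
    }
    // A string of len(s) bytes has at most len(s) runes, so it fits in one chunk.
    if chunkSize >= len(s) {
        return []string{s}
    }
    // Capacity estimate based on the byte length (an upper bound on the chunk count).
    chunks := make([]string, 0, (len(s)-1)/chunkSize+1)
    currentLen := 0   // runes in the current chunk
    currentStart := 0 // byte offset where the current chunk starts
    for i := range s { // i is the byte offset of each rune
        if currentLen == chunkSize {
            chunks = append(chunks, s[currentStart:i])
            currentLen = 0
            currentStart = i
        }
        currentLen++
    }
    chunks = append(chunks, s[currentStart:])
    return chunks
}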

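Note that when ranging over a string, the index points to the first byte of each rune. A rune takes 1 to 4 bytes. Slicing also treats the string as an array of bytes, so s[currentStart:i] always cuts at a rune boundary.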
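A small sketch of those byte offsets in practice; runeStarts is a hypothetical helper for illustration, not part of the answer:

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// runeStarts (hypothetical helper) returns the byte offset at which each
// rune of s begins, i.e. the index values produced by `for i := range s`.
func runeStarts(s string) []int {
    var starts []int
    for i := range s {
        starts = append(starts, i)
    }
    return starts
}

func main() {
    s := "a世b"
    fmt.Println(runeStarts(s))                     // [0 1 4]
    fmt.Println(len(s), utf8.RuneCountInString(s)) // 5 3 (bytes vs runes)
    fmt.Println(utf8.RuneLen('世'))                 // 3
    fmt.Println(s[1:4])                            // 世 (slicing uses byte offsets)
}
```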
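Previous, slower algorithm
The code is here: playground. Converting from bytes to runes and then back to bytes takes a lot of time, so it is better to use the fast algorithm at the top of this answer.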

func ChunksSlower(s string, chunkSize int) []string {
    if chunkSize >= len(s) {
        return []string{s}
    }
    var chunks []string
    chunk := make([]rune, chunkSize) // reusable buffer for the current chunk
    n := 0                           // runes in chunk (avoids shadowing the builtin len)
    for _, r := range s {
        chunk[n] = r
        n++
        if n == chunkSize {
            // string() copies the runes, so the buffer can be reused
            chunks = append(chunks, string(chunk))
            n = 0
        }
    }
    if n > 0 {
        chunks = append(chunks, string(chunk[:n]))
    }
    return chunks
}

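Note that the two algorithms handle invalid UTF-8 bytes differently. The first one keeps them as-is, while the second replaces them with the utf8.RuneError symbol ('\uFFFD'), whose UTF-8 hex representation is efbfbd.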
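A quick way to observe this difference, as a sketch; viaRunes is a hypothetical helper that mimics how ChunksSlower rebuilds strings from runes:

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// viaRunes (hypothetical helper) rebuilds s rune by rune, like ChunksSlower.
// Invalid bytes decode as utf8.RuneError and are re-encoded as U+FFFD.
func viaRunes(s string) string {
    out := make([]rune, 0, len(s))
    for _, r := range s {
        out = append(out, r)
    }
    return string(out)
}

func main() {
    s := "a\xffb" // \xff on its own is not valid UTF-8

    // Byte slicing (as in Chunks) keeps the invalid byte unchanged.
    fmt.Printf("% x\n", s[1:2]) // ff

    // Decoding to runes (as in ChunksSlower) substitutes U+FFFD.
    fmt.Printf("% x\n", viaRunes(s))                        // 61 ef bf bd 62
    fmt.Println(viaRunes(s)[1:4] == string(utf8.RuneError)) // true
}
```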


Answer #3 by xkftehaa

I recently needed a function for this as well; see the example usage here.

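import "bytes"

// SplitSubN splits s into substrings of n runes each; the last one may be shorter.
func SplitSubN(s string, n int) []string {
    sub := ""
    subs := []string{}

    runes := bytes.Runes([]byte(s)) // decode s into runes
    l := len(runes)
    for i, r := range runes {
        sub = sub + string(r)
        if (i+1)%n == 0 {
            // a full chunk of n runes
            subs = append(subs, sub)
            sub = ""
        } else if (i + 1) == l {
            // end of input with a partial chunk
            subs = append(subs, sub)
        }
    }

    return subs
}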
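SplitSubN builds each chunk with sub + string(r), which allocates a new string for every rune. A sketch of the same logic using strings.Builder instead; SplitSubNBuilder is a hypothetical name, and n > 0 is assumed:

```go
package main

import (
    "fmt"
    "strings"
)

// SplitSubNBuilder (hypothetical) accumulates each chunk in a
// strings.Builder instead of concatenating strings. Assumes n > 0.
func SplitSubNBuilder(s string, n int) []string {
    var subs []string
    var sb strings.Builder
    count := 0 // runes written to sb for the current chunk
    for _, r := range s {
        sb.WriteRune(r)
        count++
        if count == n {
            subs = append(subs, sb.String())
            sb.Reset() // start a fresh buffer; the returned string stays valid
            count = 0
        }
    }
    if count > 0 { // trailing partial chunk
        subs = append(subs, sb.String())
    }
    return subs
}

func main() {
    fmt.Printf("%q\n", SplitSubNBuilder("helloworld", 3)) // ["hel" "low" "orl" "d"]
}
```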

Answer #4 by toiithl6

Here is another example (you can try it here):

package main

import (
    "fmt"
    "strings"
)

func ChunkString(s string, chunkSize int) []string {
    var chunks []string
    runes := []rune(s)

    if len(runes) == 0 {
        return []string{s}
    }

    for i := 0; i < len(runes); i += chunkSize {
        nn := i + chunkSize
        if nn > len(runes) {
            nn = len(runes)
        }
        chunks = append(chunks, string(runes[i:nn]))
    }
    return chunks
}

func main() {
    fmt.Println(ChunkString("helloworld", 3))
    fmt.Println(strings.Join(ChunkString("helloworld", 3), "\n"))
}
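The question also asks about inserting a newline after every 3 characters; strings.Join in main above already does that. A sketch that writes the separator directly while ranging over the runes, without building the intermediate slice; InsertEvery is a hypothetical name, and n > 0 is assumed:

```go
package main

import (
    "fmt"
    "strings"
)

// InsertEvery (hypothetical) returns s with sep inserted after every n runes.
// No separator is added after the last chunk. Assumes n > 0.
func InsertEvery(s string, n int, sep string) string {
    var sb strings.Builder
    sb.Grow(len(s) + len(s)/n*len(sep)) // capacity hint only
    count := 0
    for _, r := range s {
        if count == n { // previous chunk is full; separate it from the next one
            sb.WriteString(sep)
            count = 0
        }
        sb.WriteRune(r)
        count++
    }
    return sb.String()
}

func main() {
    fmt.Println(InsertEvery("helloworld", 3, "\n"))
    // hel
    // low
    // orl
    // d
}
```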

Answer #5 by w3nuxt5m

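A simple solution using a regex:

re := regexp.MustCompile(`(\S{3})`)
x := re.FindAllStringSubmatch("HelloWorld", -1)
fmt.Println(x)

https://play.golang.org/p/mfmaQlSRkHe
Note that \S{3} only matches runs of exactly three non-whitespace characters, so the trailing "d" is dropped, and FindAllStringSubmatch returns each match together with its capture group, printing [[Hel Hel] [loW loW] [orl orl]].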
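To keep a shorter trailing chunk and whitespace, a pattern such as (?s).{1,3} can be used with FindAllString. A sketch, where chunkRegex is a hypothetical wrapper and the chunk size of 3 is baked into the pattern; Go's regexp matches UTF-8 code points, so multi-byte characters are not cut:

```go
package main

import (
    "fmt"
    "regexp"
)

// chunkRe matches 1 to 3 characters; (?s) lets '.' also match '\n'.
var chunkRe = regexp.MustCompile(`(?s).{1,3}`)

// chunkRegex (hypothetical) returns the successive non-overlapping matches,
// i.e. 3-character chunks plus a possibly shorter last one.
func chunkRegex(s string) []string {
    return chunkRe.FindAllString(s, -1)
}

func main() {
    fmt.Printf("%q\n", chunkRegex("HelloWorld"))  // ["Hel" "loW" "orl" "d"]
    fmt.Printf("%q\n", chunkRegex("hello world")) // ["hel" "lo " "wor" "ld"]
}
```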


Answer #6 by xu3bshqb

I tried 3 versions of the function; the one named splitByWidthMake is the fastest.
These functions ignore Unicode and only work correctly for ASCII text.

package main

import (
    "fmt"
    "strings"
    "time"
    "math"
)

func splitByWidthMake(str string, size int) []string {
    strLength := len(str)
    splitedLength := int(math.Ceil(float64(strLength) / float64(size)))
    splited := make([]string, splitedLength)
    var start, stop int
    for i := 0; i < splitedLength; i += 1 {
        start = i * size
        stop = start + size
        if stop > strLength {
            stop = strLength
        }
        splited[i] = str[start : stop]
    }
    return splited
}


func splitByWidth(str string, size int) []string {
    strLength := len(str)
    var splited []string
    var stop int
    for i := 0; i < strLength; i += size {
        stop = i + size
        if stop > strLength {
            stop = strLength
        }
        splited = append(splited, str[i:stop])
    }
    return splited
}

func splitRecursive(str string, size int) []string {
    if len(str) <= size {
        return []string{str}
    }
    return append([]string{string(str[0:size])}, splitRecursive(str[size:], size)...)
}

func main() {
    /*
    testStrings := []string{
        "hello world",
        "",
        "1",
    }
    */

    testStrings := make([]string, 10)
    for i := range testStrings {
        testStrings[i] = strings.Repeat("#", int(math.Pow(2, float64(i))))
    }

    //fmt.Println(testStrings)

    t1 := time.Now()
    for i := range testStrings {
        _ = splitByWidthMake(testStrings[i], 2)
        //fmt.Println(t)
    }
    elapsed := time.Since(t1)
    fmt.Println("for loop version elapsed: ", elapsed)

    t1 = time.Now()
    for i := range testStrings {
        _ = splitByWidth(testStrings[i], 2)
    }
    elapsed = time.Since(t1)
    fmt.Println("for loop without make version elapsed: ", elapsed)



    t1 = time.Now()
    for i := range testStrings {
        _ = splitRecursive(testStrings[i], 2)
    }
    elapsed = time.Since(t1)
    fmt.Println("recursive version elapsed: ", elapsed)

}
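Because these functions slice by bytes, they can cut a multi-byte UTF-8 character in half. A small check, using an adapted copy of splitByWidth on the two 3-byte characters "世界":

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// splitByWidth is adapted from the answer above: it slices by bytes.
func splitByWidth(str string, size int) []string {
    var parts []string
    for i := 0; i < len(str); i += size {
        stop := i + size
        if stop > len(str) {
            stop = len(str)
        }
        parts = append(parts, str[i:stop])
    }
    return parts
}

func main() {
    for _, p := range splitByWidth("世界", 2) {
        // Each 3-byte character is split across chunks, so no chunk is valid UTF-8.
        fmt.Printf("% x valid=%v\n", p, utf8.ValidString(p))
    }
}
```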

Answer #7 by ujv3wf0j

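Not the most efficient, but it will work for most use cases. It slices by bytes, so it is only safe for ASCII input.
Go Playground: https://play.golang.org/p/0JSqv3OMdCR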

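package main

import "fmt"

// splitBy splits a string s into chunks of n bytes (not runes).
func splitBy(s string, n int) []string {
    var ss []string
    for i := 1; i < len(s); i++ {
        if i%n == 0 {
            ss = append(ss, s[:i]) // emit the first n bytes
            s = s[i:]              // continue with the rest of the string
            i = 0                  // the loop's i++ restarts counting at 1 (needed for n == 1)
        }
    }
    ss = append(ss, s) // remaining, possibly shorter, chunk
    return ss
}

// test
func main() {
    s := "helloworld"
    ss := splitBy(s, 3)
    fmt.Println(ss)
}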
# output
$ go run main.go
[hel low orl d]
