How do I get a substring between two patterns in Rust?

dldeef67, posted on 2023-01-17 in Other

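I want to create a substring in Rust. It should start at an occurrence of a given string and end either four characters before the end of the string or at a certain character.
My first approach was: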

string[string.find("pattern").unwrap()..string.len()-5]

This is wrong, because Rust strings are guaranteed to be valid UTF-8, so indexing them is byte-based rather than character-based.
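A small sketch of what "byte-based" means in practice. The sample string and the helper name `byte_prefix` are made up for illustration. `len()` counts bytes, `chars().count()` counts `char`s, and `str::get` returns `None` in exactly the cases where a slice such as `&s[..3]` would panic because the index falls inside a multi-byte character.

```rust
/// Hypothetical helper: the first `n` bytes of `s`, or `None` if byte `n`
/// is not on a character boundary (where `&s[..n]` would panic instead).
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

// In "ab老虎", 'a' and 'b' take 1 byte each, '老' and '虎' take 3 bytes each:
// "ab老虎".len() == 8, but "ab老虎".chars().count() == 4.
// Byte 3 lies inside '老' (bytes 2..5), so byte_prefix("ab老虎", 3) is None,
// while byte_prefix("ab老虎", 5) is Some("ab老").
```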
My second approach is correct, but far too verbose:

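let start_bytes = line.find("pattern").unwrap();
let mut char_byte_counter = 0;
let result = line
    .chars()
    .skip_while(|c| {
        // Skip chars that lie entirely before `start_bytes`; test before
        // counting the current char so the char at `start_bytes` is kept.
        let before_start = char_byte_counter < start_bytes;
        char_byte_counter += c.len_utf8();
        before_start
    })
    .take_while(|c| *c != '<')
    .collect::<String>();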

Is there a simpler way to create the substring? Is there some part of the standard library I haven't found?


Answer #1 (6pp0gazn)

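I can't recall a built-in library function in any language that does exactly what you want (give me the substring between two patterns, or between the first pattern and the end if the second pattern doesn't exist), so I think you'll have to write some custom logic either way.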
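The closest equivalent to a "substring" function is slicing. As you found out, it works on bytes, not Unicode characters, so you have to be careful with indices: (byte) index 4, not 3 (playground). You can still use it in your case, though, because you aren't computing the indices yourself: you use find to... find the indices you need, and find always returns a byte index that lies on a character boundary.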
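If a cut point really has to be counted in characters, such as "four characters before the end" in the question, one option is to turn the character count into a byte offset with `char_indices` and slice there. This is a sketch under the assumption that "character" means a `char` (Unicode scalar value), not a grapheme cluster; the function name `drop_last_chars` is hypothetical.

```rust
/// Hypothetical helper: `s` without its last `n` chars.
/// Counts `char`s (Unicode scalar values), not bytes or grapheme clusters.
fn drop_last_chars(s: &str, n: usize) -> &str {
    if n == 0 {
        return s;
    }
    // Walking backwards, the byte index of the n-th char from the end is
    // exactly where the kept prefix ends, and it is always a char boundary.
    match s.char_indices().rev().nth(n - 1) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => "", // fewer than n chars: nothing is left
    }
}
```

For example, `drop_last_chars("patterndf1老虎23", 4)` cuts off "老虎23" and keeps "patterndf1", even though those four chars occupy eight bytes.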
Here is how to do it with slicing (as a bonus, you don't need to allocate another String):

// Adding some Unicode to check that everything also works
// outside of ASCII
let line = "asdfapatterndf1老虎23<12";

// Byte index where "pattern" starts,
// or the beginning of the line if "pattern" is not found
let start_bytes = line.find("pattern").unwrap_or(0);
// Byte index where "<" is found, or the end of the line
let end_bytes = line.find("<").unwrap_or(line.len());

// Slicing `line` borrows from it; here result == "patterndf1老虎23"
let result = &line[start_bytes..end_bytes];
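One caveat: `line.find("<")` searches the whole line, so if a "<" appears before "pattern", `start_bytes > end_bytes` and the slice panics. A sketch of the same slicing idea that only searches for the end marker after the start pattern; the function name `slice_between` is hypothetical, and the fallbacks (beginning / end of line) mirror the snippet above.

```rust
/// Hypothetical helper: slice of `line` from the first `start` up to (but not
/// including) the first `end` that follows it. Falls back to the beginning /
/// end of `line` when a pattern is missing.
fn slice_between<'a>(line: &'a str, start: &str, end: &str) -> &'a str {
    // Where the slice starts, and where the search for `end` begins.
    let (start_bytes, search_from) = match line.find(start) {
        Some(i) => (i, i + start.len()),
        None => (0, 0),
    };
    // Search only after the start pattern, then shift back to a `line` index.
    let end_bytes = line[search_from..]
        .find(end)
        .map_or(line.len(), |i| search_from + i);
    &line[start_bytes..end_bytes]
}
```

Because every index comes from `find` (or is the start/end of the string), each slice boundary is a valid char boundary and `start_bytes <= end_bytes` always holds.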

Answer #2 (jjhzyzn0)

Try something like this:

// Returns the text between the first `start` and the next `end` after it.
// Returns an empty &str if `start` is not found, and also if `end` is not
// found after it (`unwrap_or_default()` then yields index 0).
fn between<'a>(source: &'a str, start: &str, end: &str) -> &'a str {
    if let Some(start_position) = source.find(start) {
        let start_position = start_position + start.len();
        let source = &source[start_position..];
        let end_position = source.find(end).unwrap_or_default();
        return &source[..end_position];
    }
    ""
}
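The standard library also has `str::split_once` (stable since Rust 1.52), which expresses the same idea more compactly. In this sketch (the name `between_or_end` is hypothetical) the result excludes the start pattern, like the function above, but it falls back to the end of the string when `end` is missing, which is the behavior the question asks for; it returns `None` only when `start` does not occur.

```rust
/// Hypothetical helper: text after the first `start`, up to the next `end`,
/// or up to the end of `source` if `end` does not occur after `start`.
/// Returns `None` if `start` does not occur at all.
fn between_or_end<'a>(source: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // split_once splits at the first occurrence and drops the delimiter.
    let (_, rest) = source.split_once(start)?;
    Some(rest.split_once(end).map_or(rest, |(inner, _)| inner))
}
```

Returning `Option` instead of an empty string lets callers tell "not found" apart from "found, but empty".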

Answer #3 (kqlmhetl)

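This method works on grapheme clusters rather than bytes (using the unicode-segmentation crate) and runs in roughly O(n) time. It works, but I'm not sure whether it has any bugs.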

use unicode_segmentation::UnicodeSegmentation;

// Compares two graphemes, optionally ignoring case.
fn grapheme_eq(a: &str, b: &str, ignore_case: bool) -> bool {
    a == b || (ignore_case && a.to_uppercase() == b.to_uppercase())
}

// Does `pat` occur in `text` starting at grapheme position `i`?
fn matches_at(text: &[&str], i: usize, pat: &[&str], ignore_case: bool) -> bool {
    i + pat.len() <= text.len()
        && text[i..i + pat.len()]
            .iter()
            .zip(pat)
            .all(|(&a, &b)| grapheme_eq(a, b, ignore_case))
}

// Returns every substring found between `start` and `end`, comparing grapheme
// by grapheme. With `limit_one`, stops after the first match.
fn between(text: &str, start: &str, end: &str, limit_one: bool, ignore_case: bool) -> Vec<String> {
    let text: Vec<&str> = text.graphemes(true).collect();
    let start: Vec<&str> = start.graphemes(true).collect();
    let end: Vec<&str> = end.graphemes(true).collect();

    let mut result = Vec::new();
    if start.is_empty() || end.is_empty() {
        return result; // empty patterns would match everywhere
    }

    let mut i = 0;
    while i < text.len() {
        if !matches_at(&text, i, &start, ignore_case) {
            i += 1;
            continue;
        }
        let content_start = i + start.len();
        // First occurrence of `end` after the start pattern, if any.
        let end_pos = (content_start..text.len())
            .find(|&j| matches_at(&text, j, &end, ignore_case));
        match end_pos {
            Some(j) => {
                result.push(text[content_start..j].concat());
                if limit_one {
                    break;
                }
                i = j + end.len(); // keep searching after this `end`
            }
            None => break, // no closing pattern left
        }
    }
    result
}
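Why grapheme clusters can matter: one user-perceived character may consist of several `char`s. In the made-up sample below, each "é" is written as "e" followed by U+0301 COMBINING ACUTE ACCENT, so a `find`/`split_once`-based cut is a valid `char` boundary yet still separates the accent from its base letter. This std-only sketch (the helper name `after_first` is hypothetical) only demonstrates the problem; detecting grapheme boundaries themselves needs a crate such as unicode-segmentation.

```rust
/// Hypothetical helper: text after the first `pat`, cut at the byte offset
/// found by `split_once`. The cut is always a valid `char` boundary, but
/// not necessarily a grapheme-cluster boundary.
fn after_first<'a>(s: &'a str, pat: &str) -> Option<&'a str> {
    s.split_once(pat).map(|(_, rest)| rest)
}

// "re\u{301}sume\u{301}" renders as "résumé", but it is 8 chars: each 'é'
// is 'e' followed by U+0301. after_first(s, "e") therefore returns
// "\u{301}sume\u{301}", whose first char is an accent without a base letter.
```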
