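I previously posted a (poorly formatted and lackluster) question asking how to pass an array as an input argument and return the modified array. After some tinkering, I found that the function works fine for single-line input, but it starts to misbehave when the file contains multiple lines separated by newlines.
I would appreciate any suggestions on how to improve my code (and my post). Thanks.
The code so far: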
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LEN 1000

const char delimiter[] = " \t\r\n\v\f";

void tokenize(char *string, char *ret[MAX_LINE_LEN]) {
    char *ptr;
    ptr = strtok(string, delimiter);
    int i = 0;
    while (ptr != NULL) {
        ret[i] = ptr;
        i++;
        ptr = strtok(NULL, delimiter);
    }
}

int main(void) {
    char line[MAX_LINE_LEN];
    static char *temparr[MAX_LINE_LEN] = {0};
    while (fgets(line, sizeof(line), stdin)) {
        tokenize(line, temparr);
    }
    int i = 0;
    while (temparr[i]) {
        printf("%s\n", temparr[i]);
        i++;
    }
}
Input (all on a single line):
There was nothing else to do. The deed had already been done and there was no going back. It now had been become a question of how they were going to be able to get out of this situation and escape.
The output appears to be correct:
There
was
nothing
else
to
do.
The
deed
had
already
been
done
and
there
was
no
going
back.
It
now
had
been
become
a
question
of
how
they
were
going
to
be
able
to
get
out
of
this
situation
and
escape.
But when the sentences are on separate lines, each terminated by a newline:
There was nothing else to do.
The deed had already been done and there was no going back.
It now had been become a question of how they were going to be able to get out of this situation and escape.
it returns only the tokenized array for the last line:
It
now
had
been
become
a
question
of
how
they
were
going
to
be
able
to
get
out
of
this
situation
and
escape.
And when the last line is short:
There was nothing else to do.
The deed had already been done and there was no going back.
It now had been become a question of how they were going to be able to
get out of this situation and escape.
the returned array is:
get
out
of
this
situation
and
escape.
pe.
they
were
going
to
be
able
to
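I assume I am making a mistake in how I loop over fgets(), but I am not sure why, or how to get the first output again. I tried adding "\n" as one of the delimiters, but it does not seem to have any effect. I have also been told that strtok() is unsafe (not thread-safe, modifies the original string, ...). I don't know what that is about, but are there alternatives?
(Test paragraph taken from https://randomwordgenerator.com/paragraph.php)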
2 answers
vfhzx4xs #1
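There are multiple problems:
The token array is sized with MAX_LINE_LEN, which is a bit confusing. Use a more telling name such as MAX_TOKEN_COUNT. Similarly, if MAX_LINE_LEN is the maximum line length, the input buffer should have at least 2 extra bytes, for the newline and the null terminator.
Here is a modified version: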
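The answer's own listing does not appear above, so what follows is only a minimal sketch of one way to apply that advice, not the answerer's code. The names MAX_TOKEN_COUNT (suggested above), copy_string and read_tokens, and the extra count parameter on tokenize, are illustrative assumptions. Each token is copied to the heap so that later fgets() calls cannot overwrite it, and the index carries over from one line to the next instead of restarting at 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LEN    1000
#define MAX_TOKEN_COUNT 1000    /* assumed limit; name taken from the advice above */

const char delimiters[] = " \t\r\n\v\f";

/* Portable stand-in for POSIX strdup(): heap copy of s, or NULL on failure. */
char *copy_string(const char *s) {
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}

/* Split line in place and append a heap copy of every token to tokens[],
   starting at index count. Returns the updated token count. */
size_t tokenize(char *line, char *tokens[], size_t count, size_t max_tokens) {
    assert(line != NULL && tokens != NULL);
    for (char *tok = strtok(line, delimiters);
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, delimiters)) {
        char *copy = copy_string(tok);
        if (copy == NULL)
            break;                      /* out of memory: keep what we have */
        tokens[count++] = copy;
    }
    return count;
}

/* Read lines from in until EOF or until tokens[] is full. */
size_t read_tokens(FILE *in, char *tokens[], size_t max_tokens) {
    char line[MAX_LINE_LEN + 2];        /* room for '\n' and '\0' */
    size_t count = 0;
    while (count < max_tokens && fgets(line, sizeof line, in) != NULL)
        count = tokenize(line, tokens, count, max_tokens);
    return count;
}
```

A main() built on this would call read_tokens(stdin, tokens, MAX_TOKEN_COUNT), print the first n entries, and free() each one when done.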
5uzkadbs #2
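You cannot use the same buffer for multiple lines of input. The buffer is overwritten by each new input line, which invalidates the pointers assigned earlier. You need to either define a finite set of buffers or use "heap storage".
New to C? Then perhaps you are not yet familiar with the concept of "dynamic memory".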
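The goal seems to be to "buffer" input from stdin until a maximum is reached. You can simply accumulate all of the input and tokenise it only at the end:
This should be self-explanatory. If anything is unclear, please ask in the comments below.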
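The accumulate-then-tokenise idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's listing: MAX_INPUT_LEN, slurp and tokenize_all are hypothetical names, and the cap on total input is an arbitrary choice. Because the whole input lives in one buffer that is never overwritten, the pointers returned by strtok() remain valid and no copying is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_INPUT_LEN   4096    /* assumed cap on the total input, in bytes */
#define MAX_TOKEN_COUNT 1000    /* assumed cap on the number of tokens */

/* Read up to cap - 1 bytes from in into buf and null-terminate it.
   Returns the number of bytes stored. */
size_t slurp(FILE *in, char *buf, size_t cap) {
    assert(in != NULL && buf != NULL && cap > 0);
    size_t len = fread(buf, 1, cap - 1, in);
    buf[len] = '\0';
    return len;
}

/* Tokenise the whole buffer in one pass. Every pointer stored in tokens[]
   points into buf, so buf must outlive the token array. */
size_t tokenize_all(char *buf, char *tokens[], size_t max_tokens) {
    static const char delimiters[] = " \t\r\n\v\f";
    size_t count = 0;
    for (char *tok = strtok(buf, delimiters);
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, delimiters))
        tokens[count++] = tok;
    return count;
}
```

In a main(), slurp(stdin, buffer, sizeof buffer) followed by tokenize_all(buffer, tokens, MAX_TOKEN_COUNT) would replace the fgets() loop. Input beyond the cap is silently ignored in this sketch.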
"What are the alternatives?"
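strtok() is fine for a simple application like this one. Its sibling strtok_r() does not keep the tokenising state internally; the caller supplies a pointer in which the state is stored. strtok_r() should be used in multithreaded applications, or when you want to "sub-tokenise" a string. For example, you could tokenise on '\n' to extract one sentence at a time from the buffer, then tokenise each word of that sentence using the other whitespace characters.
Another option is to develop your own version of strtok(), perhaps using strspn() and strcspn() to locate the tokens quickly. Doing so lets you decide whether or not to clobber the delimiters the way strtok() does.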
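As an illustration of the home-grown option, here is a minimal sketch of a tokeniser built on strspn() and strcspn() that leaves the input string untouched. The function name next_token and its signature are hypothetical; it reports each token as a pointer plus a length instead of writing '\0' over the delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next token at or after *cursor.
   Returns a pointer to its first character and stores its length in *len,
   then advances *cursor past the token. Returns NULL when no token is left.
   The string itself is never modified. */
const char *next_token(const char **cursor, const char *delims, size_t *len) {
    assert(cursor != NULL && *cursor != NULL && delims != NULL && len != NULL);
    const char *start = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims);      /* token ends at the next delimiter */
    *cursor = start + *len;
    return start;
}
```

Since the tokens are not null-terminated, print them with a precision, for example printf("%.*s\n", (int)len, tok). The scanning state lives in the caller's cursor, much like the save pointer of strtok_r(), so the function is reentrant.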
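Keep in mind that strtok() treats consecutive delimiters as a single delimiter. It never returns a pointer to an empty string: "a,,b,c" is tokenised exactly like "a,b,c".
Homework: refactor this function so that "input", "tokenising" and "printing" each live in their own function, with appropriate parameters and return values.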
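The remark about consecutive delimiters is easy to check with a small helper. The sketch below uses a hypothetical function named split that simply collects the pointers strtok() returns; empty fields between adjacent delimiters never show up in the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split s in place with strtok() and store up to max token pointers in out[].
   Returns the number of tokens found. */
size_t split(char *s, const char *delims, char *out[], size_t max) {
    assert(s != NULL && delims != NULL && out != NULL);
    size_t n = 0;
    for (char *tok = strtok(s, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

Calling split on a writable copy of "a,,b,,,c" with "," as the delimiter set yields three tokens, and a string made only of delimiters yields none. If empty fields matter (as in CSV data), strtok() is the wrong tool.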