Strange behavior with strtok

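Problem description

I'm having some trouble using the strtok function. As an exercise, I have to process a text file by removing whitespace, converting the first letter of each word to uppercase, and printing no more than 20 characters per line.

Here is part of my code: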

fgets(sentence,SIZE,f1_ptr);
    char *tok_ptr = strtok(sentence," \n"); //tokenazing each line read
    tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters

    int num = 0,i;

    while (!feof(f1_ptr)) {
        while (tok_ptr != NULL) {
            for (i = num; i < strlen(tok_ptr) + num; i++) {
                if (i % 20 == 0 && i != 0) //maximum of 20 char per line
                    fputc('\n',stdout);
                fputc(tok_ptr[i - num],stdout);
            }

            num = i;

            tok_ptr = strtok(NULL," \n");
            if (tok_ptr != NULL)
                tok_ptr[0] = toupper(tok_ptr[0]);
        }

        fgets(sentence,SIZE + 1,f1_ptr);
        tok_ptr = strtok(sentence," \n");
        if (tok_ptr != NULL)
            tok_ptr[0] = toupper(tok_ptr[0]);
    }

The text is just a short passage, given here for reference:

Watch your thoughts ; they become words .
Watch your words ; they become actions .
Watch your actions ; they become habits .
Watch your habits ; they become character .
Watch your character ; it becomes your destiny .

This is what I end up with:

WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacteR.Wat
chYourCharacter;ItBe
comesYourDEstiny.Lao
-Tze

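The final result is mostly correct, but sometimes a word is not tokenized correctly (for example "they" in "they become", and only in that one occurrence, or "destiny"). After my processing, "they" is split into "t" and "hey", which gives a wrongly capitalized word (DEstiny in the other case). Is this a bug, or am I missing something? My code is probably not very efficient, and some condition may end up being critical...

Thank you for your help. It's no big deal, I just don't understand why this behavior occurs.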

Solution

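You have a large number of errors in your code, and you are overcomplicating the problem. The most pressing error is the one covered by "Why is while ( !feof (file) ) always wrong?" Why? Trace the execution path through your loop. You attempt a read with fgets() and then use sentence, calling tok_ptr = strtok(sentence," \n");, without knowing whether EOF was reached, before you ever get around to checking feof(f1_ptr).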

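What happens when you actually reach EOF? That is why while ( !feof (file) ) is always wrong. Instead, you always want to control your read loop with the return of the read function you are using, e.g. while (fgets(sentence,SIZE,f1_ptr) != NULL)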
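As a minimal sketch of that read-loop pattern (the function name count_lines and the buffer size MAXC are illustrative assumptions, not part of the original code), the loop below is controlled entirely by the return of fgets(), so the buffer is only examined after a successful read:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXC 1024   /* assumed buffer size, chosen only for illustration */

/* Count the lines in fp (hypothetical helper). The loop condition is the
 * return of fgets() itself, so the body never sees a stale buffer. */
size_t count_lines (FILE *fp)
{
    char buf[MAXC];
    size_t n = 0;
    int at_line_start = 1;      /* does the next chunk begin a new line? */

    assert (fp != NULL);

    while (fgets (buf, sizeof buf, fp) != NULL) {   /* stops on EOF or error */
        if (at_line_start)
            n++;                                    /* first chunk of a line */
        at_line_start = strchr (buf, '\n') != NULL; /* did chunk end the line? */
    }
    return n;
}
```

Once the loop has ended, feof(fp) or ferror(fp) can be used to find out why reading stopped; that is the point at which testing feof() is meaningful.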

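What do you actually need your code to do?

The larger question is why you are overcomplicating the problem with strtok and arrays (and fgets(), for that matter). Think about what you need to do:

read each character in the file,
if it is whitespace, ignore it and set the in-word flag false,
if it is non-whitespace, capitalize it if it is the first character in a word, output it, set the in-word flag true and increment the number of characters output to the current line, and finally
if it was the 20th character output, output a newline and reset the counter to zero.

The bare-minimum tools you need from your C toolbox are fgetc(), isspace() and toupper() from ctype.h, a counter for the number of characters output, and a flag that tells you whether the character is the first non-whitespace character after whitespace.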

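Implementing the logic

That makes the problem very simple. Read a character. Is it whitespace? Then set your in-word flag false. Otherwise, if your in-word flag is false, capitalize the character; output it, set your in-word flag true and increment your character count. The last thing to do is check whether your character count has reached the limit, and if so, output a '\n' and reset the character count to zero. Repeat until you run out of characters.

You can turn that into code similar to the following: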

#include <stdio.h>
#include <ctype.h>

#define CPL 20      /* chars per-line, if you need a constant, #define one (or more) */

int main (int argc, char **argv) {

    int c, in = 0, n = 0;   /* char, in-word flag, no. of chars output in line */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while ((c = fgetc(fp)) != EOF) {            /* read / validate each char in file */
        if (isspace(c))                         /* char is whitespace? */
            in = 0;                             /* set in-word flag false */
        else {  /* otherwise, not whitespace */
            putchar (in ? c : toupper(c));      /* output char, capitalize 1st in word */
            in = 1;                             /* set in-word flag true */
            n++;                                /* increment character count */
        }
        if (n == CPL) {                         /* CPL limit reached? */
            putchar ('\n');                     /* output newline */
            n = 0;                              /* reset cpl counter */
        }
    }
    putchar ('\n');     /* tidy up with newline */

    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);
}

Example use/output

With your input file stored in dat/text220.txt on my computer, you can produce the desired output with:

$ ./bin/text220 dat/text220.txt
WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.

(The executable for the code was compiled to bin/text220. I usually keep separate dat, obj and bin directories for data, object files and executables, to keep my source code directory clean.)

Note: because the program reads from stdin by default when no filename is given as its first argument, you can also feed it input directly, e.g.

$ echo "my dog has fleas - bummer!" | ./bin/text220
MyDogHasFleas-Bummer
!

No fancy string functions are required, just a loop, a character, a flag and a counter; the rest is arithmetic. It is always worth trying to boil a programming problem down to its basic steps, then look around your C toolbox and find the right tool for each step.
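The same flag-and-counter loop can also be written against an in-memory string, which makes it easy to check in isolation. This is only a sketch: the function name squeeze_capitalize and its parameters are hypothetical, and the caller is assumed to supply the output buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy src to dst without whitespace, capitalizing the first character of
 * each word and inserting '\n' after every cpl characters written.
 * Returns the length of the result, or (size_t)-1 if dst is too small.
 * (hypothetical helper, not part of the original answer) */
size_t squeeze_capitalize (const char *src, char *dst, size_t dstsz, int cpl)
{
    size_t len = 0;     /* characters stored in dst so far */
    int in = 0, n = 0;  /* in-word flag, chars output in current line */

    assert (src != NULL && dst != NULL && cpl > 0);

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (isspace (c)) {              /* whitespace: skip, leave the word */
            in = 0;
            continue;
        }
        if (len + 3 > dstsz)            /* room for char, '\n' and '\0'? */
            return (size_t)-1;
        dst[len++] = (char)(in ? c : toupper (c));  /* capitalize 1st in word */
        in = 1;
        if (++n == cpl) {               /* line limit reached? */
            dst[len++] = '\n';
            n = 0;
        }
    }
    if (len + 1 > dstsz)                /* room for the terminator? */
        return (size_t)-1;
    dst[len] = '\0';
    return len;
}
```

Called with the string "my dog has fleas - bummer!", a 64-byte buffer and cpl set to 20, it stores "MyDogHasFleas-Bummer", a newline and "!", matching what the program prints for the echo example.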

Using strtok

Don't get me wrong, there is nothing wrong with using strtok, and it makes for a fairly simple solution in this case. My point is that for simple character-oriented string processing, it is often just as simple to loop over the characters in the line. You gain no efficiency by using fgets() with an array and strtok(); the data read from the file is already placed in a buffer of BUFSIZ¹ bytes.

If you do want to use strtok(), control your read loop with the return of fgets(), then tokenize with strtok(), checking its return at each point as well. That gives you a read loop with fgets() and a tokenization loop with strtok(). Inside it you handle first-character capitalization and limit the output to 20 characters per line.

You could do something like the following:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define CPL 20      /* chars per-line, #define one (or more) */
#define MAXC 1024
#define DELIM " \t\r\n"

void putcharCPL (int c, int *n)
{
    if (*n == CPL) {            /* if n == limit */
        putchar ('\n');         /* output '\n' */
        *n = 0;                 /* reset value at mem address to 0 */
    }
    putchar (c);                /* output character */
    (*n)++;                     /* increment value at mem address */
}

int main (int argc, char **argv) {

    char line[MAXC];    /* buffer to hold each line */
    int n = 0;          /* no. of chars output in line */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (fgets (line, MAXC, fp))  /* read each line and tokenize line */
        for (char *tok = strtok (line, DELIM); tok; tok = strtok (NULL, DELIM)) {
            putcharCPL (toupper((unsigned char)*tok), &n);  /* convert 1st char to upper */
            for (int i = 1; tok[i]; i++)                    /* output rest unchanged */
                putcharCPL (tok[i], &n);
        }
    putchar ('\n');     /* tidy up with newline */

    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);
}

(same output)

The putcharCPL() function is just a helper that checks whether 20 characters have been output and, if so, outputs a '\n' and resets the counter. It then outputs the current character and increases the counter by one. A pointer to the counter is passed so that the counter can be updated inside the function, making the updated value available back in main().

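Look things over and let me know if you have further questions.

Footnotes:

1. Depending on your version of gcc, the constant in the source that sets the read-buffer size may be _IO_BUFSIZ. _IO_BUFSIZ was changed to BUFSIZ here: glibc commit 9964a14579e5eef9. On Linux, BUFSIZ is defined as 8192 (512 on Windows).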

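From a professional point of view this is actually an interesting question, contrary to what some comments suggest; the "newbie" side of a question can sometimes raise fairly deep, underestimated issues.
Interestingly, on my platform (W10, MSYS2, gcc v.10.2), your code runs fine and gives the correct result: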

WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.

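So first of all, congratulations, newbie: your coding is not that bad.
This illustrates how different compilers may or may not guard against minor coding mistakes or misuse of the specification, and may or may not protect the stack or the heap.
That said, the comment by @Andrew Henle pointing to an illuminating answer about feof is very much to the point.
If you follow it and keep your feof test, simply moving it to after the read check rather than before it (as below), your code should give better results (note: I make only minimal changes to your code and deliberately ignore the lesser issues):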

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>
#define SIZE 100 // add some leeway to avoid off-by-one issues

int main()
{
    FILE* f1_ptr = fopen("C:\\Users\\Public\\Dev\\test_strtok","r");
    if (! f1_ptr)
    {
             perror("Open issue");
             exit(EXIT_FAILURE);
    }

char sentence[SIZE] = {0};

if (NULL == fgets(sentence,SIZE,f1_ptr))
{
     perror("fgets issue"); // implementation-dependent
     exit(EXIT_FAILURE);
}
errno = 0;
char *tok_ptr = strtok(sentence," \n"); //tokenizing each line read

if (tok_ptr == NULL || errno)
{
     perror("first strtok parse issue");
     exit(EXIT_FAILURE);
}

tok_ptr[0] = toupper(tok_ptr[0]); //initials to capital letters

int num = 0;
size_t i = 0;

while (1) {
    while (1) {
        for (i = num; i < strlen(tok_ptr) + num; i++) {
            if (i % 20 == 0 && i != 0) //maximum of 20 char per line
                fputc('\n',stdout);
            fputc(tok_ptr[i - num],stdout);
        }

        num = i;

        tok_ptr = strtok(NULL," \n");
        if (tok_ptr == NULL) break;
        tok_ptr[0] = toupper(tok_ptr[0]);
    }

    if (NULL == fgets(sentence,SIZE,f1_ptr)) // no need for the annoying +1; we have enough headroom
    {
       if (feof(f1_ptr))
       {
           fprintf(stderr,"\n%s\n","Found EOF");
           break;
       }
       else
       {
           perror("Unexpected fgets issue in loop"); // implementation-dependent
           exit(EXIT_FAILURE);
       }
    }

    errno = 0;
    tok_ptr = strtok(sentence," \n");
    if (tok_ptr == NULL)
    {
        if (errno)
        {
          perror("strtok issue in loop");
          exit(EXIT_FAILURE);
        }

        break;
    }

    tok_ptr[0] = toupper(tok_ptr[0]);
}

return 0;

}

$ ./test

WatchYourThoughts;Th
eyBecomeWords.WatchY
ourWords;TheyBecomeA
ctions.WatchYourActi
ons;TheyBecomeHabits
.WatchYourHabits;The
yBecomeCharacter.Wat
chYourCharacter;ItBe
comesYourDestiny.
Found EOF
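One way a word can end up split in two, as with "t" and "hey" in the question, is a buffer passed to fgets() that is shorter than the line: fgets() then returns the line in pieces, and strtok() treats the end of each piece as the end of a token. Whether this is what happened in the original program depends on the value of SIZE, which the question does not show, so the following is only a sketch of the effect; count_tokens and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the tokens strtok() finds when fp is read through a caller-supplied
 * buffer of the given size (hypothetical helper, for illustration only). */
size_t count_tokens (FILE *fp, char *buf, int size)
{
    size_t count = 0;

    assert (fp != NULL && buf != NULL && size > 1);

    while (fgets (buf, size, fp) != NULL)       /* reads at most size-1 chars */
        for (char *tok = strtok (buf, " \n"); tok; tok = strtok (NULL, " \n"))
            count++;                            /* a cut-off word counts twice */

    return count;
}
```

For a file containing the single line "; they", a 64-byte buffer yields the two tokens ";" and "they". With a 4-byte buffer, fgets() returns "; t", then "hey", then the newline alone, and strtok() finds three tokens: ";", "t" and "hey".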
