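Uniqueness of characters in a string in C

Problem description

Is there an efficient and simple way to check that the characters of a string are unique in C? In my case, I have to check that an ISBN entered by the user has unique characters and is 13 characters long.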

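Solutions

Given that the string should be only 13 bytes long, a brute-force approach seems fast enough, if not faster than more elaborate methods, and it is easier to write and to verify.

Here is an example: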

#include <ctype.h>
#include <stdio.h>
#include <string.h>

int check_ISBN(const char *p) {
    if (strlen(p) != 13)
        return -1;
    while (*p) {
        unsigned char c = *p++;
        // characters must be digits or lowercase letters
        if (!isdigit(c) && !islower(c))
            return 2;
        // characters must not be duplicated
        if (strchr(p,c))
            return 1;
    }
    return 0;
}

int main(int argc,char *argv[]) {
    int status = 0;
    if (argc < 2) {
        fprintf(stderr,"Usage: %s ISBN ...\n",argv[0]);
        return 2;
    }
    for (int i = 1; i < argc; i++) {
        switch (check_ISBN(argv[i])) {
          case 0:
            printf("%s: valid ISBN.\n",argv[i]);
            break;
          case 1:
            printf("%s: invalid ISBN: duplicate digit.\n",argv[i]);
            status = 1;
            break;
          case 2:
            printf("%s: invalid ISBN: invalid character.\n",argv[i]);
            status = 1;
            break;
          default:
            printf("%s: invalid ISBN: bad character count.\n",argv[i]);
            status = 1;
            break;
        }
    }
    return status;
}

Just iterate over the string, recording each character seen in a lookup table. If you see a duplicate, abort. If the length is not 13, fail.

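The lookup-table idea described above could be sketched roughly as follows. This is only an illustration, not the original answerer's code: the function name `check_unique_13` and its return codes are assumptions made for this example, and unlike the brute-force version it does not restrict which characters are allowed.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (name and return codes are assumptions).
 * Returns  0 if s has exactly 13 characters, all distinct,
 *          1 if some character appears more than once,
 *         -1 if the length is not 13. */
int check_unique_13(const char *s) {
    /* one flag per possible unsigned char value */
    bool seen[UCHAR_MAX + 1] = { false };

    if (strlen(s) != 13)
        return -1;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (seen[c])
            return 1;      /* abort on the first duplicate */
        seen[c] = true;
    }
    return 0;
}
```

Each character is examined once, so the loop does at most 13 table lookups, at the cost of a small table on the stack.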

To add to @WilliamPursell's answer, here is a way to use a bitmask instead of a char-array lookup table:

#include <stdio.h>
// Check that the string is an ISBN and whether its 13 digits are unique.
// Parameters:
//    pStr : the ISBN, as an ASCIIZ string
// Returns: [int]
//     -1 :  Not an ISBN
//      0 :  ISBN with only unique digits
//      1 :  ISBN with duplicated digits
int isISBNDigitsUnique(char *pStr) {
    if (pStr == NULL) return -1;
    unsigned long long mask = 0,refmask; 
    unsigned char bit;
    int expected_length = 0,ret = 0;    
    while((bit = *pStr++) && expected_length++ < 14) {
      if (bit < '0' || (bit > '9' && bit < 'A')) return -1;
      // fold case; '`' (0x60) folds to '@' (0x40), so reject that as well
      if ((bit &= 0xDFu) > 'Z' || bit == '@') return -1;
      if (ret==0) {
        refmask = mask; 
        if ((bit -= 0x10u) > 0x30u) bit -= 0x20u; 
        // shift a 64-bit one: bit can be as large as 42
        if ((mask |= (1ULL << bit)) == refmask) ret = 1;
      }
    }
    if (expected_length != 13) return -1;
    return ret;
}
int main(int nbargs,char *args[]) {
    if (nbargs != 2) {
       printf("Usage : %s ISBN\n",args[0]);
       return 1;
    }
    switch (isISBNDigitsUnique(args[1])) {
      case 1:
        printf("duplicate digit!\n");
        break;
      case 0:
        printf("all digits are unique.\n");
        break;
      default:
        printf("%s is not an ISBN.\n",args[1]);
   }
}

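Boolean logic applied to each digit:

We consider 0-9, a-z and A-Z to be the only valid digits of an ISBN. Since the ISBN is held in an ASCIIZ buffer, we can expect the following binary values for the ASCII code of each digit.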

0011 0000 to 0011 1001 => digits 0-9
0100 0001 to 0101 1010 => digits A-Z
0110 0001 to 0111 1010 => digits a-z

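We first check that each digit meets the following requirements:

If the digit code is lower than 48, it is not a valid digit.
If the digit code is greater than 57 and lower than 65, it is not a valid digit.

We fold case by clearing bit 5 (value 0x20), hence the AND mask with 0xDF. The digit ranges then have the following values: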

0001 0000 to 0001 1001 => digits 0-9
0100 0001 to 0101 1010 => digits A-Z
0100 0001 to 0101 1010 => digits a-z

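If the digit code is now greater than 90, it is not a valid digit.

We then subtract 0x10 from the digit code, with the goal of fitting every value into a 64-bit mask. The digit ranges then have the following values: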
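The case-folding step can be tried in isolation. The helper below is a small sketch whose name `fold_case` is made up for this example; it assumes an ASCII character set, as the answer does.

```c
/* Hypothetical helper: clear bit 5 (value 0x20), which is what
 * the AND mask 0xDF does. Assumes ASCII. */
unsigned char fold_case(unsigned char c) {
    return (unsigned char)(c & 0xDFu);
}
```

With ASCII input, 'a' becomes 'A' and '0' becomes 0x10. Note that the backtick (0x60) becomes '@' (0x40), which is not greater than 90, so a validity check built on this fold may need to reject that value separately.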

0000 0000 to  0000 1001 => digits 0-9,decimal values 0-9
0011 0001 to  0100 1010 => digits A-Z,decimal values 49-74
0011 0001 to  0100 1010 => digits a-z,decimal values 49-74

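One could try to compress the range further by clearing a single bit, but the letter codes now run from 49 to 74 and cross the 64 boundary, so it is more effective to subtract 0x20 whenever the digit code is greater than 48. The digit ranges then have the following values:

0000 0000 to 0000 1001 => digits 0-9,decimal values 0-9
0001 0001 to 0010 1010 => digits A-Z,decimal values 17-42
0001 0001 to 0010 1010 => digits a-z,decimal values 17-42

Our digit codes now range over 0-9 and 17-42, which easily fits in a 64-bit mask. So we just set one bit per digit, and if a digit code does not change the mask, we have a duplicate digit.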
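The whole mapping described above can be condensed into a pair of small functions. This is a sketch under the same ASCII assumption; the names `isbn_char_to_bit` and `mark_char` are hypothetical and do not appear in the answer's code.

```c
#include <stdint.h>

/* Hypothetical helper: map an ASCII character to its bit position
 * (0-9 for digits, 17-42 for letters of either case), or return -1
 * if the character is not 0-9, A-Z or a-z. */
int isbn_char_to_bit(unsigned char c) {
    if (c < '0' || (c > '9' && c < 'A'))
        return -1;
    c &= 0xDFu;                     /* fold case */
    if (c > 'Z' || c == '@')        /* '@' is what '`' folds to */
        return -1;
    c -= 0x10u;                     /* digits -> 0-9, letters -> 49-74 */
    if (c > 0x30u)
        c -= 0x20u;                 /* letters -> 17-42 */
    return c;
}

/* Hypothetical helper: record c in *mask.
 * Returns 0 if c was new, 1 if it was already recorded,
 * -1 if c is not a valid character. */
int mark_char(uint64_t *mask, unsigned char c) {
    int bit = isbn_char_to_bit(c);
    if (bit < 0)
        return -1;
    uint64_t flag = UINT64_C(1) << bit;   /* 64-bit shift, bit <= 42 */
    if (*mask & flag)
        return 1;
    *mask |= flag;
    return 0;
}
```

Because upper and lower case share a bit position, calling `mark_char` with 'a' and then 'A' on the same mask reports a duplicate on the second call.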

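P.S.: @interjay, for a mathematical approach I would consider checking the sum and the product of the digits, but that would need a mathematical proof...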
