How to avoid ambiguity in TypeScript template literal type inference?

Problem description

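I'm trying to write a type that validates that a given input string contains valid class names separated by one or more whitespace characters. The input may also have leading or trailing whitespace.

The type I have now is very close, but the TS compiler can infer the template literal in multiple ways, which means the grammar is ambiguous. This leads to unwanted results.

First we define the primitive types: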

// To avoid recursion as much as possible
type Spaces = (
  | "     "
  | "    "
  | "   "
  | "  "
  | " "
);
type Whitespace = Spaces | "\n" | "\t";
type ValidClass = 'a-class' | 'b-class' | 'c-class';

Then the utility types:

// Utility type to provide nicer error messages
type Err<Message extends string> = `Error: ${Message}`;

type TrimEnd<T extends string> = (
  T extends `${infer Rest}${Whitespace}`
  ? TrimEnd<Rest>
  : T
);
type TrimStart<T extends string> = (
  T extends `${Whitespace}${infer Rest}`
  ? TrimStart<Rest>
  : T
);
type Trim<T extends string> = TrimEnd<TrimStart<T>>;
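As a small sanity check of the trimming helpers, here is a sketch that applies `Trim` to a made-up input (the sample string and the `Trimmed` / `trimmed` names are only for illustration). Because `Whitespace` is a union, `Rest` can be inferred as a union of candidates at one step; the conditional types are distributive, so each candidate is trimmed further on its own.

```typescript
type Spaces = "     " | "    " | "   " | "  " | " ";
type Whitespace = Spaces | "\n" | "\t";

type TrimEnd<T extends string> =
  T extends `${infer Rest}${Whitespace}` ? TrimEnd<Rest> : T;
type TrimStart<T extends string> =
  T extends `${Whitespace}${infer Rest}` ? TrimStart<Rest> : T;
type Trim<T extends string> = TrimEnd<TrimStart<T>>;

// Hypothetical sample input with mixed leading and trailing whitespace.
type Trimmed = Trim<"  \n a-class \t ">;

// This assignment type-checks only if "a-class" is assignable to `Trimmed`.
const trimmed: Trimmed = "a-class";
console.log(trimmed);
```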

And finally the actual type that checks the input string:

// Forces the string to be trimmed before starting recursive loop.
type SplitToValidClasses<T extends string> = SplitToValidClassesInner<Trim<T>>;

// Splits the input string into an array of `Array<Token | 'Error: ...'>`
// strings. The input is converted to an array format mostly because I found it
// easier to work with arrays in other TS generics, instead of e.g. space-separated
// values.
type SplitToValidClassesInner<T extends string> =
  // Does `T` contain more than one string? For example 'aaaa\n\n  bbbb'
  T extends `${infer Head}${Whitespace}${infer Tail}`
    // Yes, `T` could be inferred into three parts.
    // Is `Head` a valid class name?
    ? Trim<Head> extends ValidClass
        // Yes, it's a valid name. Continue recursively with the rest of the string
        // but trim whitespace from both sides.
        ? [Trim<Head>, ...SplitToValidClassesInner<Trim<Tail>>]
        : [Err<`'${Head}' is not a valid class`>]
    : T extends `${infer Tail}`
      ? Tail extends ValidClass
        ? [Tail]
        : [Err<`'${Tail}' is not a valid class`>]
      : [never];

// This works
type CorrectResult = SplitToValidClasses<'a-class b-class c-class'>

But when testing with different inputs, we notice incorrect results:

// Should be ["a-class", "b-class", "c-class"]
type Input1 = `a-class b-class  c-class`;
type Result = SplitToValidClasses<Input1>;

// Should be ["a-class", "b-class", "c-class", "a-class"]
type Result2 = SplitToValidClasses<`

  a-class    b-class
c-class

    a-class
`>;

// Should be ["a-class", "Error: 'wrong-class' is not a valid class"]
type Result3 = SplitToValidClasses<`
  a-class
  wrong-class
  c-class
`>;

The problem occurs in the template inference:

type SplitToValidClassesInnerFirstLevelDebug<T extends string> =
  T extends `${infer Head}${Whitespace}${infer Tail}`
    ? [Head,Whitespace,Tail]
    : never

// The grammar is ambiguous, leading to
// Head = "a-class b-class" | "a-class" and Tail = "c-class" | "b-class  c-class".
// Removing the ambiguity should fix the issue
type Result4 = SplitToValidClassesInnerFirstLevelDebug<Input1>

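I couldn't find much documentation on the details of how template literals are inferred, apart from what Anders Hejlsberg explained in his PR:

For inference to succeed the starting and ending literal character spans (if any) of the target must exactly match the starting and ending spans of the source. Inference proceeds by matching each placeholder to a substring in the source from left to right: A placeholder followed by a literal character span is matched by inferring zero or more characters from the source until the first occurrence of that literal character span in the source. A placeholder immediately followed by another placeholder is matched by inferring a single character from the source.

How can this typing be achieved without ambiguous results? One approach I thought of is to parse the input recursively, character by character, but that quickly hits the recursion limit in TS.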
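The rules quoted above can be illustrated with a few tiny types. This is only a sketch: the type names (`SplitOnSpace`, `FirstChar`, `SplitOnOneOrTwo`) and inputs are made up for the example. The third case reproduces the ambiguity from the question on a smaller scale: when the separator is a union, each member is matched on its own and the candidates end up combined, which is what the debug type above reports.

```typescript
// A placeholder followed by a literal span consumes characters up to the
// FIRST occurrence of that span.
type SplitOnSpace<T extends string> =
  T extends `${infer Head} ${infer Tail}` ? [Head, Tail] : never;
type R1 = SplitOnSpace<"a b c">; // ["a", "b c"]

// A placeholder immediately followed by another placeholder consumes
// exactly one character.
type FirstChar<T extends string> =
  T extends `${infer First}${infer Rest}` ? [First, Rest] : never;
type R2 = FirstChar<"abc">; // ["a", "bc"]

// With a union separator, both the one-space and the two-space pattern match,
// so both ways of splitting are accepted by the resulting type.
type SplitOnOneOrTwo<T extends string> =
  T extends `${infer Head}${" " | "  "}${infer Tail}` ? [Head, Tail] : never;
type R3 = SplitOnOneOrTwo<"a b  c">;

// These assignments type-check only if the types accept the listed values.
const r1: R1 = ["a", "b c"];
const r2: R2 = ["a", "bc"];
const r3a: R3 = ["a", "b  c"]; // split at the single space
const r3b: R3 = ["a b", "c"];  // split at the double space
console.log(r1, r2, r3a, r3b);
```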

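Solution

I came up with two solutions, but neither solves the original problem, because the types become too complex or too deeply recursive. The second solution is definitely more scalable than the first.

Solution 1: Recursive parsing

This solution parses the input string recursively. The type Split splits the input string on whitespace and returns an array of tokens (or words).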

type EndOfInput = '';

// Validates the given `UnprocessedInput` input string.
// It recursively iterates through each character in the string,
// and appends characters to the second type parameter `Current` until the
// token has been consumed. When the token is fully consumed, it is added to
// `Result` and the `Current` memory is cleared.
//
// NOTE: Do not pass anything other than the first type parameter. The other type
//       parameters are for internal tracking during the recursive loop.
//
// See https://github.com/microsoft/TypeScript/pull/40336 for more template literal
// examples.
type Split<UnprocessedInput extends string, Current extends string = '', Result extends string[] = []> =
  // Have we reached the end of the input string?
  UnprocessedInput extends EndOfInput
    // Yes. Is `Current` empty?
    ? Current extends EndOfInput
      // Yes, we're at the end of processing and there is no need to add new items to the result
      ? Result
      // No, add the last item to the result and return it
      : [...Result, Current]
    // No, use template literal inference to get the first char and the rest of the string
    : UnprocessedInput extends `${infer Head}${infer Rest}`
      // Is the next character whitespace?
      ? Head extends Whitespace
        // Yes, and is `Current` empty?
        ? Current extends EndOfInput
          // Yes, continue "eating" whitespace
          ? Split<Rest, Current, Result>
          // No, it means we went from a token to whitespace, meaning the token
          // is fully parsed and can be added to the result
          : Split<Rest, '', [...Result, Current]>
        // No, add the character to Current
        : Split<Rest, `${Current}${Head}`, Result>
      // This shouldn't happen since UnprocessedInput is restricted with
      // the `extends string` constraint.
      // For example ValidCssClassName<null> would be a `never` type if it didn't
      // already fail with "Type 'null' does not satisfy the constraint 'string'"
      : [never]

This works for smaller inputs, but not for larger strings, because of the TS recursion limit:

type Result5 = Split<`
a   


b 

c`>

// Fails for larger string values because of the recursion limit
type Result6 = Split<`aaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbb`>
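To make the state machine behind `Split` easier to follow, here is a runtime sketch of the same loop. The function name `splitTokens` and the helper `isWhitespace` are hypothetical; this is not a replacement for the type-level check, only an illustration of how `Current` and `Result` evolve. Since the type reads one character at a time, only the single-character members of `Whitespace` (`" "`, `"\n"`, `"\t"`) can ever match `Head`, so those are the only ones checked here.

```typescript
// Single-character whitespace, mirroring what `Head extends Whitespace`
// can match when `Head` is exactly one character.
const isWhitespace = (ch: string): boolean =>
  ch === " " || ch === "\n" || ch === "\t";

// Hypothetical runtime counterpart of the `Split` type.
function splitTokens(input: string): string[] {
  const result: string[] = []; // plays the role of `Result`
  let current = "";            // plays the role of `Current`

  for (let i = 0; i < input.length; i++) {
    const head = input.charAt(i);
    if (isWhitespace(head)) {
      // Token boundary: flush `current` if a token was in progress,
      // otherwise keep "eating" whitespace.
      if (current !== "") {
        result.push(current);
        current = "";
      }
    } else {
      current += head;
    }
  }

  // End of input: add the last token, if any.
  if (current !== "") {
    result.push(current);
  }
  return result;
}

console.log(splitTokens("\na   \n\n\nb \n\nc")); // ["a", "b", "c"]
```

One step per character is what makes the type-level version expensive. The TypeScript 4.5 release notes describe tail-recursion elimination for conditional types, and an accumulator-style type like `Split` appears to fit that shape, so newer compilers may accept longer inputs than the ones that failed here.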

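Solution 2: Use the classes as tokens

Since we actually have the valid class names as a string union, we can use it as part of a template literal type to consume whole class names.

To understand this solution, let's build it up piece by piece. First, let's use ValidClass in a template literal: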

type SplitDebug1<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
  ? [ValidClass,Whitespace,Tail]
  : never

// The grammar is not ambiguous anymore!
// [ValidClass, Whitespace, "b-class c-class"]
type Result1 = SplitDebug1<"a-class b-class c-class">

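This solves the ambiguity, but now we can no longer access the parsed Head, because ValidClass just refers to the type type ValidClass = "a-class" | "b-class" | "c-class". Unfortunately, TypeScript (at least at the time of writing) doesn't allow inferring and constraining a token at the same time, so this is not possible: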

type SplitDebug2<T extends string> =
  T extends `${infer Head extends ValidClass ? infer Head : never}${Whitespace}${infer Tail}`
  ? [Head,Tail]
  : never

// Still just [ValidClass,"b-class c-class"]
type Result2 = SplitDebug1<"a-class b-class c-class">

But here comes the hack. We can use the now-known Tail to reverse the match and get access to Head:

type SplitDebug3<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}` 
      ? [Head,Tail]
      : never
    : never

// Now we know the first valid token, aka class name!
// ["a-class", "b-class c-class"]
type Result3 = SplitDebug3<"a-class b-class c-class">
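As a quick check that the trick also removes the ambiguity on the problematic input from the question (two spaces before the last class), here is a self-contained sketch; the `Step` and `step` names are only for illustration. The first match pins `Tail` down because the string must start with a known class name followed by whitespace, and the second match can then only succeed with the one `Head` that leaves exactly that `Tail`. I only traced this one input by hand; inputs with several whitespace characters right after the first class can still yield a union for `Tail`, which the full solution handles by trimming.

```typescript
type Spaces = "     " | "    " | "   " | "  " | " ";
type Whitespace = Spaces | "\n" | "\t";
type ValidClass = "a-class" | "b-class" | "c-class";

type SplitDebug3<T extends string> =
  // Step 1: anchor on a known class name to find an unambiguous Tail.
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    // Step 2: match again with Tail fixed to recover the concrete Head.
    ? T extends `${infer Head}${Whitespace}${Tail}`
      ? [Head, Tail]
      : never
    : never;

// The input that produced a union of candidates with the naive pattern.
type Step = SplitDebug3<"a-class b-class  c-class">;

// This assignment type-checks only if `Step` accepts this exact split.
const step: Step = ["a-class", "b-class  c-class"];
console.log(step);
```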

This trick can be used to parse the valid class names. The full solution:


// Demonstrating with a large number of class names
// Breaks with "too complex union type" at 20k class names
type Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
type ValidClass1000 = `class-${Digit}${Digit}${Digit}`;

type SplitToValidClasses<T extends string> = SplitToValidClassesInner<Trim<T>>;
type SplitToValidClassesInner<T extends string> =
  T extends `${ValidClass1000}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}` 
      ? Trim<Head> extends ValidClass1000
          ? [Trim<Head>,...SplitToValidClassesInner<Trim<Tail>>]
          : [Err<`'${Head}' is not a valid class`>]
      : never
    : T extends `${infer Tail}`
      ? Tail extends ValidClass1000
        ? [Tail]
        : [Err<`'${Tail}' is not a valid class`>]
      : [never];

// ["class-001", "class-002", "class-003", "class-004", "class-000"]
type Result4 = SplitToValidClasses<`

    class-001 class-002 
  class-003
      class-004 class-000

  `>

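This is the best solution I could come up with, and it also works with fairly large union types. The error message could be improved, but it still points to the right place.

Although it supports a large number of alternatives in the union type, it doesn't work for our real use case, where we have about 40k Tailwind class names in a single type union. That type represents all possible class names that might be added during development (unused ones are purged in production).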
