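AWS EKS managed node group created with CloudFormation fails

Problem description

I used CloudFormation to create a VPC and an EKS cluster. When I try to create an AWS managed node group through CloudFormation, creation fails with the error message: Nodegroup test-ng Failed to stabilize: [{Code: NodeCreationFailure,Message: Unhealthy nodes in the kubernetes cluster. I can't pinpoint the exact problem, but my setup is based on the AWS documentation. For reference, I want a VPC with 3 public subnets and 3 private subnets, with the managed node group deployed into the private subnets. Here are the templates I use to deploy everything:

VPC template: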

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS VPC - Private and Public subnets'

Parameters:

  VpcName:
    Type: String
    Default: EKS-VPC
    Description: The name of the VPC

  VpcBlock:
    Type: String
    Default: 10.0.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  Privatesubnet01Block:
    Type: String
    Default: 10.0.0.0/19
    Description: CidrBlock for private subnet 01 within the VPC

  Privatesubnet02Block:
    Type: String
    Default: 10.0.32.0/19
    Description: CidrBlock for private subnet 02 within the VPC
  
  Privatesubnet03Block:
    Type: String
    Default: 10.0.64.0/19
    Description: CidrBlock for private subnet 03 within the VPC

  Publicsubnet01Block:
    Type: String
    Default: 10.0.128.0/20
    Description: CidrBlock for public subnet 01 within the VPC

  Publicsubnet02Block:
    Type: String
    Default: 10.0.144.0/20
    Description: CidrBlock for public subnet 02 within the VPC

  Publicsubnet03Block:
    Type: String
    Default: 10.0.160.0/20
    Description: CidrBlock for public subnet 03 within the VPC

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      -
        Label:
          default: "Main"
        Parameters:
          - VpcName
      -
        Label:
          default: "Network Configuration"
        Parameters:
          - VpcBlock
          - Publicsubnet01Block
          - Publicsubnet02Block
          - Publicsubnet03Block
          - Privatesubnet01Block
          - Privatesubnet02Block
          - Privatesubnet03Block

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
      - Key: Name
        Value: !Ref VpcName

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Public subnets RT
      - Key: Network
        Value: Public

  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private subnet 01 RT
      - Key: Network
        Value: Private

  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private subnet 02 RT
      - Key: Network
        Value: Private

  PrivateRouteTable03:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: Private subnet 03 RT
      - Key: Network
        Value: Private

  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PrivateRoute01:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01

  PrivateRoute02:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02

  PrivateRoute03:
    DependsOn:
    - VPCGatewayAttachment
    - NatGateway03
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable03
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway03

  NatGateway01:
    DependsOn:
    - NatGatewayEIP1
    - Publicsubnet01
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref Publicsubnet01
      Tags:
      - Key: Name
        Value: !Sub '${VpcName}-NatGateway01'

  NatGateway02:
    DependsOn:
    - NatGatewayEIP2
    - Publicsubnet02
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref Publicsubnet02
      Tags:
      - Key: Name
        Value: !Sub '${VpcName}-NatGateway02'

  NatGateway03:
    DependsOn:
    - NatGatewayEIP3
    - Publicsubnet03
    - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP3.AllocationId'
      SubnetId: !Ref Publicsubnet03
      Tags:
      - Key: Name
        Value: !Sub '${VpcName}-NatGateway03'

  NatGatewayEIP1:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  NatGatewayEIP2:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  NatGatewayEIP3:
    DependsOn:
    - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  Publicsubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Public subnet 01
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Publicsubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Publicsubnet01"
      - Key: kubernetes.io/role/elb
        Value: 1

  Publicsubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Public subnet 02
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Publicsubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Publicsubnet02"
      - Key: kubernetes.io/role/elb
        Value: 1

  Publicsubnet03:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Public subnet 03
    Properties:
      MapPublicIpOnLaunch: true
      AvailabilityZone:
        Fn::Select:
        - '2'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Publicsubnet03Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Publicsubnet03"
      - Key: kubernetes.io/role/elb
        Value: 1

  Privatesubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private subnet 01
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '0'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Privatesubnet01Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Privatesubnet01"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  Privatesubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '1'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Privatesubnet02Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Privatesubnet02"
      - Key: kubernetes.io/role/internal-elb
        Value: 1
  
  Privatesubnet03:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
        - '2'
        - Fn::GetAZs:
            Ref: AWS::Region
      CidrBlock:
        Ref: Privatesubnet03Block
      VpcId:
        Ref: VPC
      Tags:
      - Key: Name
        Value: !Sub "${VpcName}-Privatesubnet03"
      - Key: kubernetes.io/role/internal-elb
        Value: 1

  Publicsubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Publicsubnet01
      RouteTableId: !Ref PublicRouteTable

  Publicsubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Publicsubnet02
      RouteTableId: !Ref PublicRouteTable

  Publicsubnet03RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Publicsubnet03
      RouteTableId: !Ref PublicRouteTable

  Privatesubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Privatesubnet01
      RouteTableId: !Ref PrivateRouteTable01

  Privatesubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Privatesubnet02
      RouteTableId: !Ref PrivateRouteTable02

  Privatesubnet03RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Privatesubnet03
      RouteTableId: !Ref PrivateRouteTable03

  ControlPlanesecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

  WorkerNodeSshSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: SG for ssh access to worker nodes in managed nodegroup
      VpcId: !Ref VPC

Outputs:

  PublicsubnetIds:
    Description: Public subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref Publicsubnet01, !Ref Publicsubnet02, !Ref Publicsubnet03 ] ]

  PrivatesubnetIds:
    Description: Private subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref Privatesubnet01, !Ref Privatesubnet02, !Ref Privatesubnet03 ] ]

  ControlPlanesecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlanesecurityGroup ] ]

  WorkerNodeSshSecurityGroup:
    Description: SG for ssh access to worker nodes in managed nodegroup
    Value: !Ref WorkerNodeSshSecurityGroup

  VpcId:
    Description: The VPC Id
    Value: !Ref VPC

IAM role template:

Mappings:
  ServicePrincipals:
    aws-cn:
      ec2: ec2.amazonaws.com.cn
    aws-us-gov:
      ec2: ec2.amazonaws.com
    aws:
      ec2: ec2.amazonaws.com

Resources:

  eksClusterRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - eks.amazonaws.com
          Action:
          - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy

  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !FindInMap [ServicePrincipals,!Ref "AWS::Partition",ec2]
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      Path: /

Outputs:

  eksClusterRoleArn:
    Description: The role that Amazon EKS will use to create AWS resources for Kubernetes clusters
    Value: !GetAtt eksClusterRole.Arn

  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn
  

EKS cluster template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Production-Grade Cluster'

Parameters:

  Kubernetesversion:
    Type: String
    Description: The EKS supported Kubernetes version for your cluster
    Default: '1.19'
    AllowedValues:
      - '1.19'
      - '1.18'
      - '1.17'

  ClusterName:
    Type: String
    Description: The name of the cluster

  ControlPlaneClusterRoleArn:
    Type: String
    Description: The eksClusterRole Arn to use for the eks cluster (control plane)

  ControlPlanesubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Register your public and private subnets with your EKS managed control plane

  SecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>
    Description: The security group(s) for the cross-account elastic network interfaces that Amazon EKS creates to use to allow communication between your nodes and the Kubernetes control plane.

Resources:
  
  EKSManagedControlPlane:
    Type: AWS::EKS::Cluster
    Properties:
      KubernetesNetworkConfig: 
        ServiceIpv4Cidr: 172.16.0.0/12
      Name: !Ref ClusterName
      ResourcesVpcConfig:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref ControlPlanesubnetIds
      RoleArn: !Ref ControlPlaneClusterRoleArn
      Version: !Ref Kubernetesversion

Managed node group template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Production-Grade Nodegroup'

Parameters:

  Kubernetesversion:
    Type: String
    Description: The EKS supported Kubernetes version for your cluster
    Default: '1.19'
    AllowedValues:
      - '1.19'
      - '1.18'
      - '1.17'

  ClusterName:
    Type: String
    Description: The name of the cluster

  NodeInstanceRoleArn:
    Type: String
    Description: The NodeInstanceRole Arn to use for the eks nodegroup (managed data plane)

  DataPlanePrivatesubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Private subnets for your Amazon EKS data plane nodes

  WorkerNodeGroupName:
    Type: String
    Description: The name of the node group for the worker nodes in the data plane. Right now we only support 1 node group per cluster.

  WorkerNodesInstanceType:
    Type: String
    Description: The instance type for the worker nodes in the data plane. Right now we only support 1 instance type for all worker nodes.
    Default: t3.medium
    AllowedValues:
      - t3.small
      - t3.medium
      - t3.large
      - m5.large
      - m5.xlarge
      - c5.large
      - c5.xlarge

  WorkerNodesEc2SshKey:
    Type: AWS::EC2::KeyPair::KeyName
    Description: The Amazon EC2 SSH key that provides access for SSH communication with the nodes in the managed node group

  SourceSecurityGroupsForWorkerNodes:
    Type: List<AWS::EC2::SecurityGroup::Id>
    Description: The security groups that are allowed SSH access (port 22) to the nodes. If you specify an Amazon EC2 SSH key but do not specify a source security group when you create a managed node group, then port 22 on the nodes is opened to the internet.

Resources:

  EKSManagedDataPlane:
    Type: AWS::EKS::Nodegroup
    Properties:
      AmiType: AL2_x86_64
      CapacityType: ON_DEMAND
      ClusterName: !Ref ClusterName
      ForceUpdateEnabled: false
      InstanceTypes:
        - !Ref WorkerNodesInstanceType
      NodegroupName: !Ref WorkerNodeGroupName
      NodeRole: !Ref NodeInstanceRoleArn
      RemoteAccess: 
        Ec2SshKey: !Ref WorkerNodesEc2SshKey
        SourceSecurityGroups: !Ref SourceSecurityGroupsForWorkerNodes
      ScalingConfig:
        DesiredSize: 3
        MaxSize: 4
        MinSize: 3
      Subnets: !Ref DataPlanePrivatesubnetIds
      Version: !Ref Kubernetesversion

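For the EKS cluster template, I pass the eksClusterRole ARN as the cluster role parameter, and when creating the cluster I pass in all 6 subnet IDs (public and private) from the VPC template outputs. I also pass in the SG ID from the VPC ControlPlanesecurityGroups output field.

For the managed node group template, I only pass it the private subnet IDs from the VPC outputs, the SSH security group ID from the VPC outputs, and the NodeInstanceRole ARN. I have made sure the cluster name matches the name I gave the EKS cluster template at creation time.

I plan to configure the VPC CNI plugin to use IAM roles for service accounts once the cluster and managed node group are set up, which is why I removed the CNI policy from the node IAM role.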
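
One thing that may be worth checking, given the paragraph above: until IAM roles for service accounts is configured for the VPC CNI plugin (the aws-node DaemonSet), the plugin falls back to the node instance role's credentials. Without AmazonEKS_CNI_Policy on that role, nodes may fail to become Ready, which could surface as NodeCreationFailure. This is a hypothesis rather than a confirmed diagnosis; missing outbound connectivity from the private subnets is another common cause. A minimal sketch of the NodeInstanceRole with the policy attached, assuming it is removed again once the service-account role is in place:

```yaml
# Sketch only: NodeInstanceRole from the IAM role template above,
# with the CNI managed policy attached to the node role.
Resources:
  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                # Relies on the ServicePrincipals mapping defined in the IAM role template.
                - !FindInMap [ServicePrincipals, !Ref "AWS::Partition", ec2]
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        # Assumption: kept here only until IRSA is configured for the VPC CNI plugin.
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
      Path: /
```

If the node group still fails with this policy attached, the node role is probably not the cause and the networking side (NAT routes, security groups) would be the next place to look.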
