EKS Installer

Fury Kubernetes Installer - Managed Services - EKS - oss project.

EKS vs EC2 (managed vs self-managed)

Before continuing, you should understand the benefits and drawbacks of creating an EKS cluster instead of running your own Kubernetes control plane on AWS EC2 instances.

Price

EKS currently costs $0.10 per hour for an HA control plane.

An m5.large EC2 instance currently costs $0.096 per hour. Having an HA cluster with 3 x m5.large instances will cost: $0.096 x 3 instances = $0.288 per hour.

EKS is cheaper in most scenarios.

You can reduce the control-plane cost by running these instances as reserved instances, but you have to pay for them upfront for months.

This cost analysis was done in May 2020; all prices were taken from the official Amazon pricing lists.

Management

EKS is a fully managed service provided by AWS, meaning that you don't need to worry about backups, recovery, availability, scalability, certificates… even authentication to the cluster is handled by the AWS IAM service.

You'll have to set up these features yourself if you choose to host your own control plane. On the other hand, a self-managed setup lets you customize things that EKS does not: audit logs, Kubernetes API server feature flags, your own authentication provider, and other platform services.

So, if you need a non-default cluster setup, you should consider going with a self-managed cluster. Otherwise, EKS is a good option.

Day two operations

As mentioned before, EKS is responsible for keeping the Kubernetes control plane fully operational, with a monthly uptime percentage of at least 99.95%.

source: https://aws.amazon.com/eks/sla

On the other hand, in a self-managed setup you have to worry about backups, disaster recovery strategies, HA setup, certificate rotation, and control-plane and worker updates.

Requirements

As mentioned in the common requirements, the operator responsible for creating the EKS cluster needs connectivity from their machine (bastion host, laptop with a configured VPN…) to the network where the cluster will be placed.

The machine used to create the cluster should have the following installed (a quick check is sketched after the list):

  • OS tooling such as git, ssh, curl and unzip.
  • terraform version > 0.12.
  • the latest aws CLI version.
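
A minimal sanity check of the local tooling could look like this (the expected versions in the comments are only indicative):

$ git --version
$ curl --version | head -n 1
$ unzip -v | head -n 1
$ terraform version    # expect a 0.12.x release
$ aws --version        # any recent aws CLI release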

Cloud requirements

This installer has two main requirements:

  • Enough permissions to create all resources surrounding the EKS cluster.
  • Read and make sure to be compliant with: Cluster VPC considerations.
    • Pay special attention to the <cluster-name> placeholder: its value has to match the cluster_name input variable of the installer (a tagging example follows this list).
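
For illustration only (the authoritative list of tags is in the linked document): assuming the cluster will be named my-cluster and the subnets used later in my-cluster.tfvars, the kubernetes.io/cluster/<cluster-name> tag could be applied with the aws CLI along these lines:

$ aws ec2 create-tags \
    --resources subnet-id1 subnet-id2 subnet-id3 \
    --tags Key=kubernetes.io/cluster/my-cluster,Value=shared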

Gather all input values

Before starting to use this installer, you should know the values of the following input variables (the aws CLI snippet after the list can help you look some of them up):

  • cluster_name: Unique cluster name.
  • cluster_version: EKS version to use. Example: 1.14 or 1.15. Take a look at the available Amazon EKS Kubernetes versions.
  • network: VPC id where the cluster will be created.
  • subnetworks: Private subnet ids where the cluster will be created. They should belong to network.
  • ssh_public_key: Cluster administrator public ssh key. Used to access cluster nodes with the operator_ssh_user.
  • dmz_cidr_range: Network CIDR range from which the cluster's control plane will be accessible.
  • node_pools: List of worker node pools; each entry defines name, version, min_size, max_size, instance_type, volume_size, labels and taints (see the example tfvars below).
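
If you are unsure about the network and subnetworks values, the aws CLI can list the candidates; for example (the vpc-id0 filter below is a placeholder):

$ aws ec2 describe-vpcs --query "Vpcs[].{id:VpcId,cidr:CidrBlock}"
$ aws ec2 describe-subnets \
    --filters Name=vpc-id,Values=vpc-id0 \
    --query "Subnets[].{id:SubnetId,cidr:CidrBlock,az:AvailabilityZone}"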

Getting started

Make sure all the prerequisites are in place before continuing, including cloud credentials, VPN/bastion/network configuration, and the input values gathered above.
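
How you provide cloud credentials depends on your organization; a minimal sketch using environment variables (all values are placeholders, and the region is only an assumption based on the examples below) would be:

$ export AWS_ACCESS_KEY_ID="<access-key-id>"
$ export AWS_SECRET_ACCESS_KEY="<secret-access-key>"
$ export AWS_DEFAULT_REGION="eu-west-1"   # region where the VPC and subnets live
$ aws sts get-caller-identity             # verify that the credentials work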

Create a new directory to save all terraform files:

$ mkdir /home/operator/sighup/my-cluster-at-eks
$ cd /home/operator/sighup/my-cluster-at-eks

Create the following files:

main.tf

variable "cluster_name" {}
variable "cluster_version" {}
variable "network" {}
variable "subnetworks" { type = list }
variable "dmz_cidr_range" {}
variable "ssh_public_key" {}
variable "node_pools" { type = list }

module "my-cluster" {
  source  = "github.com/sighupio/fury-eks-installer//modules/eks?ref=v1.0.0"

  cluster_version = var.cluster_version
  cluster_name    = var.cluster_name
  network         = var.network
  subnetworks     = var.subnetworks
  ssh_public_key  = var.ssh_public_key
  dmz_cidr_range  = var.dmz_cidr_range
  node_pools      = var.node_pools
}

output "kube_config" {
  value = <<EOT
apiVersion: v1
clusters:
- cluster:
    server: ${module.my-cluster.cluster_endpoint}
    certificate-authority-data: ${module.my-cluster.cluster_certificate_authority}
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws
      args:
        - "eks"
        - "get-token"
        - "--cluster-name"
        - "${var.cluster_name}"
EOT
}

Create my-cluster.tfvars including your environment values:

cluster_name    = "my-cluster"
cluster_version = "1.14"
network         = "vpc-id0"
subnetworks = [
  "subnet-id1",
  "subnet-id2",
  "subnet-id3",
]
ssh_public_key = "ssh-rsa example"
dmz_cidr_range = "10.0.4.0/24"
node_pools = [
  {
    name : "m5-node-pool"
    version : null # To use same value as cluster_version
    min_size : 1
    max_size : 2
    instance_type : "m5.large"
    volume_size : 100
    labels : {
      "node.kubernetes.io/role" : "app"
      "sighup.io/fury-release" : "v1.2.0-rc1"
    }
    taints: [
      "sighup.io/role=app:NoSchedule"
    ]
  },
  {
    name : "t3-node-pool"
    version : "1.14" # To use the cluster_version
    min_size : 1
    max_size : 1
    instance_type : "t3.micro"
    volume_size : 50
    labels : {}
    taints: []
  }
]

With these two files in place, the installer is ready to create everything needed to set up a Kubernetes 1.14 EKS cluster with two different node pools (assuming you keep the example node_pools value).

$ ls -lrt
total 16
-rw-r--r--  1 sighup  staff  1171 27 abr 16:35 my-cluster.tfvars
-rw-r--r--  1 sighup  staff  1128 27 abr 16:36 main.tf
$ terraform init
Initializing modules...
Downloading github.com/sighupio/fury-eks-installer?ref=v1.0.0 for my-cluster...
- my-cluster in .terraform/modules/my-cluster/modules/eks
Downloading terraform-aws-modules/eks/aws 11.0.0 for my-cluster.cluster...
- my-cluster.cluster in .terraform/modules/my-cluster.cluster/terraform-aws-eks-11.0.0
- my-cluster.cluster.node_groups in .terraform/modules/my-cluster.cluster/terraform-aws-eks-11.0.0/modules/node_groups

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "null" (hashicorp/null) 2.1.2...
- Downloading plugin for provider "template" (hashicorp/template) 2.1.2...
- Downloading plugin for provider "random" (hashicorp/random) 2.2.1...
- Downloading plugin for provider "kubernetes" (hashicorp/kubernetes) 1.11.1...
- Downloading plugin for provider "aws" (hashicorp/aws) 2.59.0...
- Downloading plugin for provider "local" (hashicorp/local) 1.4.0...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
$ terraform plan --var-file my-cluster.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 30 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: my-cluster.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "my-cluster.plan"

Review the plan carefully before applying anything. It should create 30 resources.

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 30 added, 0 changed, 0 destroyed.

Outputs:

kubeconfig = <sensitive>

To get your kubeconfig file, follow these simple commands. Note that kubectl will use the aws CLI to authenticate against the cluster:

$ terraform output kubeconfig > kube.config
$ kubectl cluster-info --kubeconfig kube.config
Kubernetes master is running at https://eks-id.yl4.eu-west-1.eks.amazonaws.com
CoreDNS is running at https://eks-id.yl4.eu-west-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get nodes --kubeconfig kube.config
NAME                                       STATUS   ROLES    AGE     VERSION
ip-10-0-3-130.eu-west-1.compute.internal   Ready    <none>   7m6s    v1.14.9-eks-1f0ca9
ip-10-0-3-236.eu-west-1.compute.internal   Ready    <none>   7m33s   v1.14.9-eks-1f0ca9
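
Under the hood, the exec section of the kubeconfig runs aws eks get-token. If authentication fails, you can run the same command manually to verify that your credentials can generate a token for the cluster (cluster name taken from the example above):

$ aws eks get-token --cluster-name my-cluster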

Update control plane

To update the control plane, just set cluster_version to the next available version:

$ diff my-cluster.tfvars my-cluster-updated.tfvars
2c2
< cluster_version = "1.14"
---
> cluster_version = "1.15"

After modifying cluster_version, execute:

$ terraform plan --var-file my-cluster-updated.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 4 to add, 3 to change, 4 to destroy.

Please read the output plan carefully. Once you understand the changes, apply it:

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 4 added, 3 changed, 4 destroyed.

The update can take around 25-30 minutes.

After updating the control plane, you end up with the following (a quick way to verify the new version is shown after the list):

  • The EKS control plane updated from Kubernetes version 1.14 to 1.15.
  • The m5-node-pool ready to roll out new nodes with version 1.15 (updated automatically, since it follows cluster_version).
  • The t3-node-pool still on Kubernetes version 1.14.
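
If you want to double check the result, a couple of read-only commands (using the names from this example) can confirm the new control plane version:

$ aws eks describe-cluster --name my-cluster --query cluster.version
$ kubectl version --short --kubeconfig kube.config   # the server version should now report v1.15.x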

Update node pools

To update a node pool, set its version attribute to the same version as the control plane.

If you set version to null, you don't need to do anything else: node pools with a null version are updated alongside the control plane update procedure. For pinned versions, bump the value explicitly:

$ diff my-cluster.tfvars my-cluster-updated.tfvars
26c26
<     version : "1.14"
---
>     version : "1.15"

After that, run:

$ terraform plan --var-file my-cluster-updated.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 2 to add, 1 to change, 2 to destroy.

Review the plan before applying anything:

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 2 added, 1 changed, 2 destroyed.

You will need to roll your nodes to get new ones with the updated version installed.

Consider increasing the number of nodes, migrating workloads to the updated nodes, and then scaling back down to the original number of nodes. A sketch of the workload migration step follows.
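
For example, once the new nodes are Ready, the old nodes could be drained like this (node names are taken from the kubectl get nodes output above; the flags match kubectl releases of that era):

$ kubectl get nodes --kubeconfig kube.config   # identify the old nodes by their version
$ kubectl drain ip-10-0-3-130.eu-west-1.compute.internal \
    --ignore-daemonsets --delete-local-data --kubeconfig kube.config
$ kubectl drain ip-10-0-3-236.eu-west-1.compute.internal \
    --ignore-daemonsets --delete-local-data --kubeconfig kube.config

Once the workloads have been rescheduled, scale min_size/max_size back down in my-cluster.tfvars and apply again.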

Lift and Shift node pool update

You can also apply another node pool update strategy, called lift and shift: create a new node pool with the updated version, move all workloads to the new nodes, and then remove the old node pool (or set its number of instances to 0).

Tear down the environment

If you no longer need the cluster, go to the terraform directory from which you created it (cd /home/operator/sighup/my-cluster-at-eks) and type:

$ terraform destroy --var-file my-cluster.tfvars
<TRUNCATED OUTPUT>
Plan: 0 to add, 0 to change, 30 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

Type yes and press enter to continue with the destruction. It will take around 15 minutes.
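
Once the destroy finishes, you can optionally double check that the cluster is gone:

$ aws eks list-clusters   # my-cluster should no longer appear in the output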