AKS Installer

Fury Kubernetes Installer - Managed Services - AKS - open-source project.

AKS vs Compute Instances (managed vs self-managed)

Before continuing, you should understand the benefits and drawbacks of creating an AKS cluster instead of running your own Kubernetes control plane on Microsoft Azure compute instances.

Price

AKS currently provides an HA control plane for free.

A DS2 v2 compute instance currently costs $0.1147 per hour, so an HA cluster with 3 x DS2 v2 instances costs: $0.1147 x 3 instances = $0.344 per hour.

AKS is cheaper in most scenarios.

You can reduce the control-plane cost by hosting these instances as reserved instances, but you have to pay for them upfront for months.

This cost analysis was done in May 2020; all prices were taken from the official Microsoft Azure pricing lists.

Management

AKS is a fully managed service provided by Microsoft Azure, meaning that you don't need to worry about backups, recovery, availability, scalability, certificates… even authentication can be managed by Azure Active Directory.

You'll have to set up these features yourself if you choose to host your own control plane. On the other hand, a self-managed setup lets you customize other features: audit logs, Kubernetes API server feature flags, your own authentication provider, and other platform services.

So, if you need a non-default cluster setup, you should consider going with a self-managed cluster. Otherwise, AKS is a good option.

Day two operations

As mentioned before, AKS is responsible for keeping the Kubernetes control plane fully operational, with a monthly uptime percentage of at least 99.95%.

source: https://azure.microsoft.com/en-us/support/legal/sla/kubernetes-service/v1_1/

On the other hand, in a self-managed setup you have to take care of backups, disaster recovery strategies, the HA setup, certificate rotation, and control-plane and worker updates.

Requirements

As mentioned in the common requirements, the operator responsible for creating an AKS cluster needs connectivity from their machine (bastion host, laptop with a configured VPN…) to the network where the cluster will be placed.

The machine used to create the cluster should have the following installed (a quick check is sketched after the list):

  • OS tooling like: git, ssh, curl, jq and unzip.
  • terraform version > 0.12.
  • latest az CLI version.
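
A quick sanity check from a shell could look like the sketch below; the exact version output will differ on your machine:

```bash
# Verify that the required tooling is available.
for tool in git ssh curl jq unzip terraform az; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
terraform version   # should report a version newer than 0.12
az --version        # should report a recent az CLI release
```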

Cloud requirements

This installer has mainly one simple requirement:

  • Enough permissions to create all resources surrounding the AKS cluster.
    • Ensure you are logged in (`az login`) before continuing; see the sketch below.
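
For example, you can log in and double-check which subscription is active before running any Terraform command ("my-subscription" below is a placeholder):

```bash
# Log in interactively and inspect the active subscription.
az login
az account show --output table

# Optionally switch to the subscription that should host the cluster.
az account set --subscription "my-subscription"
```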

Gather all input values

Before starting to use this installer, you should know the values of the following input variables:

  • cluster_name: Unique cluster name.
  • cluster_version: AKS version to use. Example: 1.15.11 or 1.16.9. Take a look to discover the available AKS Kubernetes versions (a CLI query is sketched after this list).
  • network: Name of the network where the cluster will be created.
  • subnetworks: List with one subnetwork name:
    • index 0: The subnetwork that will host the cluster. It must belong to network.
  • ssh_public_key: Cluster administrator public ssh key. Used to access cluster nodes with the operator_ssh_user.
  • dmz_cidr_range: Network CIDR range from which the cluster's control plane will be accessible.
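
Some of these values can be gathered directly with the az CLI; the following is a sketch where "aks" and "aks-local" are placeholder resource group and network names taken from the example later in this guide:

```bash
# Discover the AKS Kubernetes versions available in your region.
az aks get-versions --location westeurope --output table

# List the virtual networks and the subnets of the one you plan to use.
az network vnet list --output table
az network vnet subnet list --resource-group aks --vnet-name aks-local --output table
```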

Getting started

Make sure to set up all the prerequisites before continuing, including cloud credentials, VPN/bastion/network configuration, and gathering all the required input values.

Create a new directory to save all terraform files:

$ mkdir /home/operator/sighup/my-cluster-at-aks
$ cd /home/operator/sighup/my-cluster-at-aks

Create the following files:

main.tf

variable "cluster_name" {}
variable "cluster_version" {}
variable "network" {}
variable "subnetworks" { type = list }
variable "dmz_cidr_range" {}
variable "ssh_public_key" {}
variable "node_pools" { type = list }
variable "resource_group_name" {}

module "my-cluster" {
  source = "github.com/sighupio/fury-aks-installer//modules/aks?ref=v1.0.0"

  cluster_version     = var.cluster_version
  cluster_name        = var.cluster_name
  network             = var.network
  subnetworks         = var.subnetworks
  ssh_public_key      = var.ssh_public_key
  dmz_cidr_range      = var.dmz_cidr_range
  node_pools          = var.node_pools
  resource_group_name = var.resource_group_name
}

data "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  resource_group_name = var.resource_group_name

  depends_on = [
    module.my-cluster,
  ]
}

output "kubeconfig" {
  sensitive = true
  value     = <<EOT
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${module.my-cluster.cluster_certificate_authority}
    server: ${module.my-cluster.cluster_endpoint}
  name: aks
contexts:
- context:
    cluster: aks
    user: aks
  name: aks
current-context: aks
kind: Config
preferences: {}
users:
- name: aks
  user:
    client-certificate-data: ${data.azurerm_kubernetes_cluster.aks.kube_config.0.client_certificate}
    client-key-data: ${data.azurerm_kubernetes_cluster.aks.kube_config.0.client_key}
EOT
}
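
The terraform init output further below assumes that a remote azurerm backend has been configured. If you also want to store the state remotely, a minimal sketch of a backend.tf file could look like the following; all names are placeholders for pre-existing resources in your subscription:

```hcl
# backend.tf - optional remote state; names below are placeholders for
# a storage account and container that must already exist.
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state"
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "my-cluster-at-aks.tfstate"
  }
}
```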

Create my-cluster.tfvars including your environment values:

resource_group_name = "aks"
cluster_name        = "aks"
cluster_version     = "1.15.11"
network             = "aks-local"
subnetworks         = ["aks-main"]
ssh_public_key      = "ssh-rsa example"
dmz_cidr_range      = "11.11.0.0/16"
node_pools = [
  {
    name : "nodepool1"
    version : null
    min_size : 1
    max_size : 1
    instance_type : "Standard_DS2_v2"
    volume_size : 100
    labels : {
      "sighup.io/role" : "app"
      "sighup.io/fury-release" : "v1.3.0"
    }
    taints : [
      "sighup.io/role=app:NoSchedule"
    ]
  },
  {
    name : "nodepool2"
    version : null
    min_size : 1
    max_size : 1
    instance_type : "Standard_DS2_v2"
    volume_size : 50
    labels : {}
    taints : []
  }
]

With these two files, the installer is ready to create everything needed to set up an AKS cluster running Kubernetes 1.15 with two different node pools (if you don't modify the example node_pools value).

$ ls -lrt
total 16
-rw-r--r--  1 sighup  staff  1171 27 Apr 16:35 my-cluster.tfvars
-rw-r--r--  1 sighup  staff  1128 27 Apr 16:36 main.tf
$ terraform init
Initializing modules...
- my-cluster in modules/aks

Initializing the backend...

Successfully configured the backend "azurerm"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "azuread" (hashicorp/azuread) 0.8.0...
- Downloading plugin for provider "azurerm" (hashicorp/azurerm) 2.16.0...
- Downloading plugin for provider "random" (hashicorp/random) 2.2.1...
- Downloading plugin for provider "null" (hashicorp/null) 2.1.2...
- Downloading plugin for provider "kubernetes" (hashicorp/kubernetes) 1.11.1...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
$ terraform plan --var-file my-cluster.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 18 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: my-cluster.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "my-cluster.plan"

Carefully review the plan before applying anything. It should create 18 resources.

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Error: creating Managed Kubernetes Cluster "<cluster_name>" (Resource Group "<resource_group_name>"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The access token requested for audience https://graph.microsoft.com by application xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx in tenant xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx is missing the required claim role Directory.Read.All." Target="aadProfile.serverAppID"
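
This error indicates that the Azure AD server application created for the cluster does not yet have an effective Directory.Read.All permission: either the grant is still propagating, or an Azure AD administrator still has to grant admin consent. If it is the latter and you have those rights, one way to grant consent from the CLI is sketched below (the application ID is the one reported in the error message); otherwise, ask your administrator to grant it from the Azure portal:

```bash
# Grant admin consent to the AAD server application reported in the error
# (replace the placeholder with the application ID from the error message).
az ad app permission admin-consent --id <application-id>
```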

Once the permission is effective, you can compute your Terraform plan again:

$ terraform plan --var-file my-cluster.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 3 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: my-cluster.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "my-cluster.plan"

Carefully review the plan before applying anything. It should create 3 additional resources.

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>

To get your `kubeconfig` file, run the following commands:

```bash
$ terraform output kubeconfig > kube.config
$ kubectl cluster-info --kubeconfig kube.config
Kubernetes master is running at https://aks-installer-44f514c5.53bb3242-f52f-4b7a-a187-7c345f5373a5.privatelink.westeurope.azmk8s.io:443
CoreDNS is running at https://aks-installer-44f514c5.53bb3242-f52f-4b7a-a187-7c345f5373a5.privatelink.westeurope.azmk8s.io:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://aks-installer-44f514c5.53bb3242-f52f-4b7a-a187-7c345f5373a5.privatelink.westeurope.azmk8s.io:443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get nodes --kubeconfig kube.config
NAME                                STATUS   ROLES   AGE    VERSION
aks-nodepool1-37845356-vmss000000   Ready    agent   5m6s   v1.15.11
aks-nodepool2-37845356-vmss000000   Ready    agent   98s    v1.15.11
```

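As an optional cross-check from the Azure side, you can also list the node pools that were created; this sketch assumes the example resource_group_name and cluster_name values (both aks) used in this guide:

```bash
# Cross-check the node pools from the Azure side.
az aks nodepool list --resource-group aks --cluster-name aks --output table
```
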
Update the cluster

To update the cluster, just change cluster_version to the next available version:

$ diff my-cluster.tfvars my-cluster-updated.tfvars
2c2
< cluster_version = "1.15.11"
---
> cluster_version = "1.16.9"
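
If you are not sure which version to pick, the control-plane upgrade paths can be queried from the az CLI; a quick check, again assuming the example resource group and cluster name aks:

```bash
# Show which Kubernetes versions this cluster can be upgraded to.
az aks get-upgrades --resource-group aks --name aks --output table
```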

After modifying cluster_version, execute:

$ terraform plan --var-file my-cluster-updated.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 1 to add, 1 to change, 1 to destroy.

Please read the plan output carefully. Once you understand the changes, apply it:

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 1 added, 1 changed, 1 destroyed.

Outputs:

kubeconfig = <sensitive>

The update can take around 25-30 minutes.

After the update, the entire cluster, including the defined node pools, will be running the new version.

Update node pools

Currently AKS does not support updating the node pools separately. Node pools are updated alongside the control plane in one shot.

Tear down the environment

If you no longer need the cluster, go to the Terraform directory where you created it (cd /home/operator/sighup/my-cluster-at-aks) and type:

$ terraform destroy --var-file my-cluster.tfvars
<TRUNCATED OUTPUT>
Plan: 0 to add, 0 to change, 10 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

Type yes and press Enter to continue with the destruction. It will take around 15 minutes.
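
After the destroy completes, you can optionally verify from the az CLI that no AKS cluster is left behind:

```bash
# Double-check that the cluster is gone.
az aks list --output table
```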