vSphere Cluster

furyctl vSphere cluster provisioner

The vSphere provisioner has been developed to make it simple to deploy Kubernetes clusters on your VMware vSphere cluster. It enables the creation and basic operations of Kubernetes clusters, together with all the infrastructural components, from a simple YAML file. It provides some nice features:

  • Private Kubernetes Control Plane.
    • Requires network connectivity from the machine running furyctl to the network where the cluster will be placed.
  • Load Balancer for both the nodes and the control plane based on HAProxy and Keepalived.
    • Able to run in cluster mode.
  • Sets an operator SSH key to enable troubleshooting of issues on the cluster nodes.
    • Enables Boundary target auto-configuration on first boot
  • Configures multiple node pools
    • Taints
    • Labels
    • Tags
    • Lift and Shift updates
    • Many more…

Configuration

kind: Cluster
metadata:
  name: # The name of the resources. Used to name the control plane and other resources...
provisioner: vsphere # set to `vsphere` to use this provisioner
spec:
  version: 1.20.5 # Place here the Kubernetes version you want to use
  etcd: # OPTIONAL
    version: v3.4.15 # OPTIONAL. Place here the etcd version you want to use
  oidc: # OPTIONAL
    issuerURL: https://dex.internal.example.com/ # OPTIONAL. Place here the issuer URL of your OIDC provider
    clientID: oidc-auth-client # OPTIONAL. Place here the client ID
    caFile: /etc/pki/ca-trust/source/anchors/example.com.cer # OPTIONAL. The CA certificate to use
  cri: # OPTIONAL
    version: 18.06.2.ce # OPTIONAL. This is the default value for the Oracle Linux Docker CRI
    dns: # OPTIONAL. Set here your DNS servers
    - 1.1.1.1
    - 8.8.8.8
    proxy: '"HTTP_PROXY=http://systems.example.com:8080" "NO_PROXY=.example.com,.group.example.com"'
    mirrors: # OPTIONAL. Set here your Docker Hub mirrors
    - https://mirror.gcr.io
  environmentName: production # The environment name of the cluster
  config:
    datacenterName: westeros # Get the name of the datacenter from the vSphere dashboard
    datastore: main # Get the name of datastore from vSphere dashboard
    esxiHosts: # Names of the hosts where the VMs are going to be created
    - host1
    - host2
    - host3
  networkConfig:
    name: main-network # The name of the vSphere network
    gateway: 10.0.0.1 # The IP of the network gateway
    nameservers: # Nameservers
    - 8.8.4.4
    - 1.1.1.1
    domain: localdomain # Domain name
    ipOffset: 0 # Number added to every IP calculation. Enables deploying multiple clusters in the same network
  boundary: true # Enable boundary target auto-configuration
  lbNode:
    count: 1 # Number of load balancer nodes
    template: ubuntu-20.04 # The name of the base image to use for the VMs
    customScriptPath: /home/user/do-something.sh # A script that you want to run after first boot
  masterNode:
    count: 1 # Number of master nodes
    cpu: 1 # Number of CPUs (cores)
    memSize: 4096 # Amount of memory (MB)
    diskSize: 100 # Amount of disk (GB)
    template: ubuntu-20.04 # The name of the base image to use for the VMs
    labels: # Node labels. Use them to tag nodes and select them in Kubernetes
      environment: production
    taints: # Kubernetes taints
    - key1=value1:NoSchedule
    customScriptPath: /home/user/do-something.sh # A script that you want to run after first boot
  infraNode:
    count: 1 # Number of infra nodes
    cpu: 1 # Number of CPUs (cores)
    memSize: 8192 # Amount of memory (MB)
    diskSize: 100 # Amount of disk (GB)
    template: ubuntu-20.04 # The name of the base image to use for the VMs
    labels: # Node labels. Use them to tag nodes and select them in Kubernetes
      environment: production
    taints: # Kubernetes taints
    - key1=value1:NoSchedule # Example taint
    customScriptPath: /home/user/do-something.sh # A script that you want to run after first boot
  nodePools: # List of objects containing the node pool definitions
  - role: applications # Role of the node pool
    count: 1 # Number of nodes
    cpu: 1 # Number of CPUs (cores)
    memSize: 8192 # Amount of memory (MB)
    diskSize: 100 # Amount of disk (GB)
    template: ubuntu-20.04 # The name of the base image to use for the VMs
    labels: # Node labels. Use them to tag nodes and select them in Kubernetes
      environment: production
    taints: # Kubernetes taints
    - key1=value1:NoSchedule
    customScriptPath: /home/user/do-something.sh # A script that you want to run after first boot
  clusterPODCIDR: 172.21.0.0/16
  clusterSVCCIDR: 172.23.0.0/16
  clusterCIDR: 10.4.0.0/16
  sshPublicKeys:
  - /home/user/.ssh/id_rsa.pub
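
Once the cluster is up, the labels and taints defined in the node pools can be used directly from Kubernetes. A usage sketch, assuming the example environment: production label and key1=value1:NoSchedule taint shown above (the node name is a placeholder):

$ kubectl get nodes -l environment=production           # list only the nodes carrying the configured label
$ kubectl describe node <node-name> | grep -i taints    # check the taints applied to a node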

Important notes

To properly deploy on vSphere using this provisioner, you first have to solve the connectivity problem.

This provisioner requires interacting with the vSphere API. As it is a private endpoint, furyctl has to be able to reach the vSphere API either via a VPN or by running the CLI from a bastion host inside the vSphere network.
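
As a quick sanity check, you can verify that the vSphere API endpoint is reachable from the machine running furyctl. A sketch, assuming curl is installed; replace the hostname with your vCenter server:

$ curl -k -s -o /dev/null -w "%{http_code}\n" https://vcenter.example.com/sdk

Any HTTP status code in the response means the endpoint is reachable; a connection error or timeout means you still need a VPN or a bastion host.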

This provisioner has been tested against two different Linux distributions:

  • Ubuntu 20
  • Oracle Linux 7.9

The provisioner requires VM Templates based on one of the supported Linux distributions.

If you have any doubt about how to prepare them, contact us.

Some VM Template requirements are listed below:

  • cloud-init support
  • VMware guest tools installed
  • SELinux disabled
  • firewalld / ufw disabled

PKI warning!

In addition, take care with cluster re-initialization: the PKI used to deploy the cluster is initialized during the furyctl cluster init command. It is highly recommended to save the project in a version control system like git. furyctl automatically configures a few files to encrypt the sensitive directories in git using git-crypt.
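
A minimal sketch of how you could version the project while keeping the PKI encrypted, assuming git and git-crypt are installed and that you have a GPG key (the e-mail address below is a placeholder):

$ git init
$ git-crypt init                            # relies on the .gitattributes entries prepared by furyctl
$ git-crypt add-gpg-user user@example.com   # grant your GPG identity access to the encrypted files
$ git add .
$ git commit -m "Add vSphere cluster project"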

Current status

This provisioner is currently at an early stage. This means that many actions are not properly supported yet.

  • Cluster upgrades: they could cause downtime and have not been thoroughly tested.
  • Control plane scale events: the control plane cannot be scaled (neither up nor down).
  • There is no cluster autoscaler.
  • Many warnings

We recommend using the provisioner to create the cluster and then managing the whole lifecycle as with any other on-premises Kubernetes cluster.

Resources

This provisioner creates and configures:

  • vSphere folder: To keep all resources well organized.
  • LoadBalancer: A load balancer instance (or cluster) based on HAProxy and Keepalived.
  • K8S Control Plane: From one to n instances to deploy the Kubernetes Control Plane.
  • K8S Infra Nodes: From 0 to n instances to deploy infrastructural Kubernetes components.
  • K8S Worker Nodes: Based on the configuration of the node pools.

Diagram

Requirements

This provisioner requires the following to be configured beforehand:

  • A VM template based on one of the Linux distributions mentioned above.
  • A valid vSphere license.
  • vSphere credentials in the form of environment variables (see the export example after this list):
    • VSPHERE_USER: This is the username for vSphere API operations.
    • VSPHERE_PASSWORD: This is the password for vSphere API operations.
    • VSPHERE_SERVER: This is the vCenter server name for vSphere API operations.
    • VSPHERE_ALLOW_UNVERIFIED_SSL: Boolean that can be set to true to disable SSL certificate verification.
    • Enough permissions to deploy the above resources.
  • furyctl.
  • kubectl.
  • wget and/or curl.
  • ansible.
  • A configuration file with all the values in place. Please review it very carefully.
  • Network connectivity to the target network.
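
For example, the vSphere credentials listed above can be exported before running furyctl (the values below are placeholders for your own environment):

$ export VSPHERE_USER="administrator@vsphere.local"
$ export VSPHERE_PASSWORD="your-password"
$ export VSPHERE_SERVER="vcenter.example.com"
$ export VSPHERE_ALLOW_UNVERIFIED_SSL=true # only if your vCenter uses a self-signed certificate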

vSphere role and permissions

The vSphere user requires the following set of permissions to properly run this provisioner.

Datastore
    Allocate space
    Browse datastore
    Low level file operations
    Remove file
    Update virtual machine files
    Update virtual machine metadata
Folder (all)
    Create folder
    Delete folder
    Move folder
    Rename folder
Network
    Assign network
Profile-driven storage
    Profile-driven storage view
Resource
    Apply recommendation
    Assign virtual machine to resource pool
Virtual Machine
    Configuration (all) - for now
    Guest Operations (all) - for now
    Interaction (all)
    Inventory (all)
    Provisioning (all)

Example execution

In the following lines, you will find an example execution to deploy a cluster.

First, ensure you meet the requirements listed above.
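
A quick way to check that the required tools are available on your PATH (a sketch):

$ for bin in furyctl kubectl ansible curl; do command -v "$bin" >/dev/null || echo "missing: $bin"; done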

Then, create a new directory:

$ mkdir demo
$ cd demo

Inside it, create a configuration file with the structure described above. Take this cluster.yml file as an example; we recommend using a non-default backend configuration:

kind: Cluster
metadata:
  name: furyctl
provisioner: vsphere
spec:
  version: "1.20.5"
  environmentName: "demo"
  config:
    datacenterName: "MYLAB"
    datastore: "datastore2"
    esxiHosts:
    - "esx2.your-server.de"
  networkConfig:
    name: "E2E"
    nameservers:
    - 1.1.1.1
    - 8.8.8.8
    domain: localdomain
    ipOffset: 2550
  boundary: true
  lbNode:
    count: 1
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  masterNode:
    count: 1
    cpu: 2
    memSize: 8192
    diskSize: 100
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  infraNode:
    count: 1
    cpu: 2
    memSize: 8192
    diskSize: 100
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  nodePools: []
  clusterPODCIDR: 172.21.0.0/16
  clusterSVCCIDR: 172.23.0.0/16
  clusterCIDR: 10.4.0.0/16
  sshPublicKeys:
    - /Users/user/.ssh/id_rsa.pub

$ ls
cluster.yml

Init the cluster project

$ furyctl cluster init --reset
WARN[0000] Cleaning up the workdir
WARN[0000] Removing demo/./cluster directory
WARN[0000] could not find terraform executable
INFO[0005] Download furyagent: demo/cluster/furyagent
⣟ Preparing the provisioner environment INFO[0009] Configuring the NETRC environment variable: demo/cluster/configuration/.netrc
INFO[0009] Ansible roles download path: demo/cluster/provision/roles
INFO[0022] [INFO] running Terraform command: demo/cluster/bin/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
[VSphere] Fury

This provisioner creates a battle-tested Kubernetes vSphere Cluster
with a private and production-grade setup.

It will deploy all the components required to run a Kubernetes Cluster:
- Load Balancer (Control Plane & Infrastructure components)
- Kubernetes Control Plane
- Dedicated infrastructure nodes
- General node pools

Requires to connect to a VPN server to deploy the cluster from this computer.
Use a bastion host (inside the same vSphere network) as an alternative method to deploy the cluster.

The provisioner requires the following software installed:
- ansible

And internet connection to download remote repositories from the SIGHUP enterprise repositories.

[FURYCTL]

Init phase completed.

Project directory: demo/./cluster
Terraform logs: demo/./cluster/logs/terraform.logs

Everything ready to create the infrastructure; execute:

$ furyctl cluster apply

Once completed, a new directory cluster is available inside the current directory demo.

$ ls
cluster   cluster.yml

It contains the whole Terraform project and the configuration needed to properly manage the cluster infrastructure.

$ tree cluster/
cluster/
├── backend.tf
├── bin
│   └── terraform
├── configuration
├── furyagent
│   ├── furyagent
│   ├── furyagent.yml
│   └── pki
│       ├── etcd
│       │   ├── ca.crt
│       │   └── ca.key
│       └── master
│           ├── ca.crt
│           ├── ca.key
│           ├── front-proxy-ca.crt
│           ├── front-proxy-ca.key
│           ├── sa.key
│           └── sa.pub
├── logs
│   └── terraform.logs
├── main.tf
├── output
├── output.tf
├── provision
│   ├── all-in-one.yml
│   ├── ansible.cfg
│   └── roles
│       ├── boundary
│       │   └── target
│       │       ├── defaults
│       │       │   └── main.yml
│       │       ├── tasks
│       │       │   ├── furyagent.yml
│       │       │   └── main.yml
│       │       └── templates
│       │           ├── furyagent.tpl.yml
│       │           ├── furyagent_ssh.tpl
│       │           └── update.tpl.sh
│       └── vsphere
│           ├── etcd
│           │   ├── README.md
│           │   ├── defaults
│           │   │   └── main.yml
│           │   ├── handlers
│           │   │   └── main.yml
│           │   ├── tasks
│           │   │   ├── install.yml
│           │   │   ├── main.yml
│           │   │   └── tls.yml
│           │   └── templates
│           │       ├── etcd.env.j2
│           │       ├── etcd.service.j2
│           │       ├── etcdctl.sh.j2
│           │       └── kubeadm-etcd.yml.j2
│           ├── haproxy
│           │   ├── README.md
│           │   ├── defaults
│           │   │   └── main.yml
│           │   ├── handlers
│           │   │   └── main.yml
│           │   ├── tasks
│           │   │   └── main.yml
│           │   └── templates
│           │       ├── check_apiserver.sh.j2
│           │       ├── haproxy.conf.j2
│           │       └── keepalived.conf.j2
│           ├── kube-control-plane
│           │   ├── README.md
│           │   ├── defaults
│           │   │   └── main.yml
│           │   ├── files
│           │   │   ├── audit.yml
│           │   │   └── kube.sh
│           │   ├── tasks
│           │   │   └── main.yml
│           │   └── templates
│           │       └── kubeadm.yml.j2
│           ├── kube-node-common
│           │   ├── defaults
│           │   │   └── main.yml
│           │   ├── handlers
│           │   │   └── main.yml
│           │   ├── tasks
│           │   │   ├── docker.yml
│           │   │   ├── kubelet.yml
│           │   │   ├── main.yml
│           │   │   ├── repo-Debian.yml
│           │   │   └── repo-RedHat.yml
│           │   ├── templates
│           │   │   ├── daemon.json.j2
│           │   │   └── proxy.conf.j2
│           │   └── vars
│           │       ├── Debian.yml
│           │       ├── RedHat.yml
│           │       └── main.yml
│           └── kube-worker
│               ├── defaults
│               │   └── main.yml
│               ├── tasks
│               │   └── main.yml
│               └── templates
│                   └── kubeadm.yml.j2
├── secrets
└── variables.tf

42 directories, 62 files

It didn't create any infrastructure component yet; continue with the example to deploy it.

Deploy the cluster project

NOTE: It can take up to 20 minutes.

$ furyctl cluster apply
ERRO[0000] Directory already exists
WARN[0000] error while initializing project subdirectories: Directory already exists
⣽ Initializing the terraform executor INFO[0000] terraform is up to date
INFO[0000] [INFO] running Terraform command: demo/cluster/bin/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
INFO[0002] Updating VSphere project
INFO[0002] [INFO] running Terraform command: demo/cluster/bin/terraform fmt -no-color -write=true -list=false -diff=false demo/./cluster/vsphere.tfvars
⣷ Applying terraform project INFO[0002] [INFO] running Terraform command: demo/cluster/bin/terraform apply -no-color -auto-approve -input=false -var-file=demo/./cluster/vsphere.tfvars -lock=true -parallelism=10 -refresh=true
⡿ Applying terraform project INFO[0300] [INFO] running Terraform command: demo/cluster/bin/terraform output -no-color -json
⣽ Applying terraform project INFO[0301] Run Ansible playbook in : demo/cluster/provision
⣟ Applying terraform project INFO[0824] VSphere Updated
INFO[0824] Gathering output file as json
INFO[0824] [INFO] running Terraform command: demo/cluster/bin/terraform output -no-color -json
INFO[0826] [INFO] running Terraform command: demo/cluster/bin/terraform output -no-color -json
[vSphere] Fury

All the cluster components are up to date.
vSphere Kubernetes cluster ready.

vSphere Cluster Endpoint: 10.4.10.0:6443
SSH Operator Name: sighup

Use the ssh sighup username to access the vSphere instances with the configured SSH key.
Boundary is enabled in this setup so you can use SIGHUP Boundary setup to access this cluster with the boundary-ops user

Discover the instances by running

$ kubectl get nodes

Then access by running:

$ ssh sighup@node-name-reported-by-kubectl-get-nodes


[FURYCTL]
Apply phase completed. The Kubernetes Cluster is up to date.

Project directory: demo/./cluster
Terraform logs: demo/./cluster/logs/terraform.logs
Output file: demo/./cluster/output/output.json
Kubernetes configuration file: demo/./cluster/secrets/kubeconfig

Use it by running:
$ export KUBECONFIG=demo/./cluster/secrets/kubeconfig
$ kubectl get nodes

Everything is up to date.
Ready to apply or destroy the infrastructure; execute:

$ furyctl cluster apply
or
$ furyctl cluster destroy

Once completed, everything is ready to start using the Kubernetes cluster along with the other cluster components. The output message contains enough information to start using the new infrastructure:

$ export KUBECONFIG=./cluster/secrets/kubeconfig
$ kubectl get nodes
NAME                             STATUS     ROLES                  AGE   VERSION
demo-demo-infra-1.localdomain    NotReady   <none>                 52s   v1.20.5
demo-demo-master-1.localdomain   NotReady   control-plane,master   72s   v1.20.5

Kubernetes vSphere Cloud Provider Interface (CPI) and Container Storage Interface (CSI)

This provisioner deploys a Kubernetes cluster in your vSphere environment. To properly interact with the vSphere environment you need to configure and deploy the vSphere Cloud Provider Interface.

This provisioner does not ship this cloud provider because you must configure it with your own setup details. Please ensure you read and understand the vSphere Cloud Provider Interface by reading its documentation site.

Interesting docs are available at the following link: Deploying a Kubernetes Cluster on vSphere with CSI and CPI.
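
Once you have deployed the CPI and CSI following their documentation, a common check is to verify that every node has been initialized by the cloud provider, i.e. that it has a vSphere ProviderID set (a verification sketch, assuming the CPI is deployed as described in its upstream docs):

$ kubectl describe nodes | grep "ProviderID"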

Modify the configuration

As an example modification of the stack, increase the number of infra nodes by changing infraNode.count from 1 to 2. Modify the cluster.yml file, setting the desired number of nodes.

kind: Cluster
metadata:
  name: furyctl
provisioner: vsphere
spec:
  version: "1.20.5"
  environmentName: "demo"
  config:
    datacenterName: "MYLAB"
    datastore: "datastore2"
    esxiHosts:
    - "esx2.your-server.de"
  networkConfig:
    name: "E2E"
    nameservers:
    - 1.1.1.1
    - 8.8.8.8
    domain: localdomain
    ipOffset: 2550
  boundary: true
  lbNode:
    count: 1
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  masterNode:
    count: 1
    cpu: 2
    memSize: 8192
    diskSize: 100
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  infraNode:
    count: 2
    cpu: 2
    memSize: 8192
    diskSize: 100
    template: "TEMPLATES/oraclelinux7.9-template-v20210413"
  nodePools: []
  clusterPODCIDR: 172.21.0.0/16
  clusterSVCCIDR: 172.23.0.0/16
  clusterCIDR: 10.4.0.0/16
  sshPublicKeys:
    - /Users/user/.ssh/id_rsa.pub

Then run the furyctl cluster apply command again.

$ furyctl cluster apply

The new node can take up to five minutes to appear in the cluster.
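
You can watch the new node joining the cluster with kubectl (a usage sketch):

$ kubectl get nodes --watch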

Destroy the cluster project

If you need to destroy the cluster project, first ensure there is nothing blocking the destruction of this infrastructure.
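
For example, you may want to double-check that there are no persistent volumes still holding data you need before tearing the cluster down (a sketch):

$ kubectl get pv,pvc --all-namespaces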

Then run:

$ furyctl cluster destroy
  Are you sure you want to destroy the cluster?
  Write 'yes' to continue
yes
ERRO[0002] Directory already exists
WARN[0002] error while initializing project subdirectories: Directory already exists
INFO[0002] terraform is up to date
INFO[0002] [INFO] running Terraform command: cluster/bin/terraform init -no-color -force-copy -input=false -lock-timeout=0s -backend=true -get=true -get-plugins=true -lock=true -upgrade=false -verify-plugins=true
INFO[0004] Destroying VSphere project
INFO[0004] [INFO] running Terraform command: cluster/bin/terraform fmt -no-color -write=true -list=false -diff=false ./cluster/vsphere.tfvars
INFO[0004] [INFO] running Terraform command: cluster/bin/terraform destroy -no-color -auto-approve -input=false -lock-timeout=0s -var-file=./cluster/vsphere.tfvars -lock=true -parallelism=10 -refresh=true
⣻ Destroying terraform project INFO[0121] VSphere destroyed
TODO
[FURYCTL]
Destroy phase completed.

Project directory: ./cluster
Terraform logs: ./cluster/logs/terraform.logs