JupyterHub is a multi-user server for Jupyter notebooks that enables multiple users to develop, research, and create. In this post, I am going to cover deploying JupyterHub to Amazon EKS with per-user persistent storage backed by Amazon EBS and TLS termination using AWS Certificate Manager (ACM).

Before we dive in, make sure you have eksctl, kubectl, and Helm installed on your local machine. We will be using these tools to deploy the Kubernetes cluster and JupyterHub. I installed them using Homebrew on macOS.

brew install helm kubernetes-cli eksctl
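
If the tools are already installed, you can confirm they are available on your PATH before continuing:

# verify the tools are installed
eksctl version
kubectl version --client
helm version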

Note: This guide is based on the Zero to JupyterHub with Kubernetes guide.

Now that we have the necessary tools, it is time to deploy the cluster. We will be using eksctl, the official CLI for Amazon EKS, to deploy a managed Kubernetes cluster on AWS. The configuration below defines a Kubernetes cluster and a set of managed node groups, and configures the Amazon EBS CSI Driver with an IAM Role for Service Accounts so the driver's containers run with least privilege. Feel free to modify this file to meet your needs.

We create a node group in each availability zone so that we always have capacity in every zone. This matters when using Amazon EBS, because volumes are tied to a specific availability zone. The cluster is pre-configured for Cluster Autoscaler, but we will not cover its deployment in this post; to deploy Cluster Autoscaler after this guide, see the Amazon EKS documentation.

# file: cluster.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: jupyterhub
  region: us-east-2

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: cluster-autoscaler
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "autoscaling:DescribeAutoScalingGroups"
              - "autoscaling:DescribeAutoScalingInstances"
              - "autoscaling:DescribeLaunchConfigurations"
              - "autoscaling:DescribeTags"
              - "autoscaling:SetDesiredCapacity"
              - "autoscaling:TerminateInstanceInAutoScalingGroup"
              - "ec2:DescribeLaunchTemplateVersions"
            Resource: '*'
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: aws-ebs-csi-driver
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "ec2:AttachVolume"
              - "ec2:CreateSnapshot"
              - "ec2:CreateTags"
              - "ec2:CreateVolume"
              - "ec2:DeleteSnapshot"
              - "ec2:DeleteTags"
              - "ec2:DeleteVolume"
              - "ec2:DescribeInstances"
              - "ec2:DescribeSnapshots"
              - "ec2:DescribeTags"
              - "ec2:DescribeVolumes"
              - "ec2:DetachVolume"
            Resource: '*'

managedNodeGroups:
  - name: ng-us-east-2a
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2a
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2b
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2b
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2c
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2c
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"

Next, deploy the cluster using this configuration.

eksctl create cluster -f ./cluster.yml

This will take 15-25 minutes to deploy. While it is deploying, we can configure the values.yml file that we will use to configure JupyterHub. To get started, we need to generate a secret token that JupyterHub uses to secure communication between the hub and the proxy. Run the following command to generate it.

# generate a secret token
openssl rand -hex 32

Next, we need to configure the rest of our values.yml file. Start by replacing the value of secretToken with the output of the previous command. To start, HTTPS is disabled; don't worry, we will configure it later. Then we specify a dummy authentication provider; if you have an authentication provider such as OAuth, OIDC, or LDAP, feel free to replace this configuration to meet your needs (a sketch of a GitHub OAuth configuration follows the file below). Finally, we configure JupyterHub to use a Kubernetes Storage Class to provision disks for our users; in this case, the Amazon EBS CSI Driver will provide an EBS volume for each user.

# file: values.yml
proxy:
  secretToken: <replace with value from previous command>
  https:
    enabled: false

auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin

singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
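
If you want to swap the dummy authenticator for a real provider, the chart accepts other auth types. A minimal sketch for GitHub OAuth is below; the client ID, client secret, and callback domain are placeholders you would obtain when registering an OAuth application with GitHub, and the exact keys may vary by chart version, so check the Zero to JupyterHub documentation for your release.

# example: GitHub OAuth in place of the dummy authenticator
auth:
  type: github
  github:
    clientId: "<your github oauth client id>"
    clientSecret: "<your github oauth client secret>"
    callbackUrl: "https://<your domain>/hub/oauth_callback"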

Once the cluster creation has completed, we need to deploy the Amazon EBS CSI Driver. First, confirm the worker nodes have joined the cluster and are in the Ready state:
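
# confirm the worker nodes are ready
kubectl get nodes

Then, run the following command to deploy the driver. We are using the out-of-the-box configuration; feel free to create additional Storage Classes with different EBS settings to meet your needs (an example sketch follows the deploy command).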

# deploy the ebs csi driver for persistent storage
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
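
You can check that the driver pods are running in the kube-system namespace. And if the default gp2 class does not fit your needs, a Storage Class backed by the CSI driver might look like the sketch below; the encryption setting is an illustrative choice, not a requirement.

# confirm the driver pods are running
kubectl get pods -n kube-system | grep ebs-csi

# file: storageclass.yml (optional example)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-gp2-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer

If you create a class like this, reference its name in the storageClass field of values.yml instead of gp2.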

We are now ready to deploy JupyterHub. First, add the JupyterHub Helm chart repository and update your local chart cache. Then deploy the chart using the values.yml file we configured earlier.

# setup helm
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update

# deploy jupyterhub
helm install jupyterhub jupyterhub/jupyterhub \
  --values values.yml
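
The release takes a few minutes to settle; you can watch the hub and proxy pods start in the meantime:

# watch the jupyterhub pods come up (Ctrl-C to exit)
kubectl get pods --watch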

Once the hub and proxy pods are running, run the following command to get the URL of the load balancer.

kubectl get svc proxy-public

Your output should look similar to this:

NAME           TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)        AGE
proxy-public   LoadBalancer   10.100.61.130   a86c837dfe75545c8b3e311621278e82-357827081.us-east-2.elb.amazonaws.com   80:31950/TCP   15m

Navigate to the HTTP version of the load balancer URL from the previous command's output; you should be able to log in using the dummy credentials we configured above.

JupyterHub Dashboard

However, we can make this more secure. Next, let's use AWS Certificate Manager (ACM) to configure a TLS certificate on the load balancer. To do this, update the values.yml file as follows:

# file: values.yml
proxy:
  secretToken: <replace with token>
  https:
    enabled: true
    type: offload
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<arn of certificate>"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"

auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin

singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
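
If you do not already have a certificate in ACM, you can request one with the AWS CLI and substitute its ARN for the placeholder above; the domain name here stands in for your own. With DNS validation, ACM will give you a validation record to add to your zone before the certificate is issued.

# request a certificate from ACM (replace the domain with your own)
aws acm request-certificate \
  --domain-name jupyterhub.example.com \
  --validation-method DNS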

Once you have updated your configuration, run the following command to apply the updates:

helm upgrade jupyterhub jupyterhub/jupyterhub --values values.yml
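
After the upgrade settles, the proxy-public service should expose port 443 in addition to port 80. You can confirm with the same command as before:

# confirm the service now exposes https
kubectl get svc proxy-public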

Now we need to create a DNS record that points the hostname matching our ACM certificate at the load balancer. In my case, this is jupyterhub.arhea.io. Optionally, you can configure External DNS with Kubernetes to manage this record automatically; a manual Route 53 sketch follows below.
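
If your zone is hosted in Amazon Route 53, you can create the record from the CLI. This is a sketch; the hostname, hosted zone ID, and file name are placeholders for your own values.

# file: dns-record.json (example change batch)
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "jupyterhub.example.com",
      "Type": "CNAME",
      "TTL": 300,
      "ResourceRecords": [{ "Value": "<load balancer DNS name>" }]
    }
  }]
}

# apply the change to your hosted zone
aws route53 change-resource-record-sets \
  --hosted-zone-id <hosted zone id> \
  --change-batch file://dns-record.json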

That's it! You now have a highly scalable deployment of JupyterHub on Amazon EKS. To improve this deployment, I recommend looking at External DNS to automatically register the load balancer with your DNS provider, Cluster Autoscaler to scale your cluster based on usage, and the Amazon EFS CSI Driver to attach shared storage to all user environments.