JupyterHub is a multi-user notebook server that enables multiple users to develop, research, and create. In this post, I am going to cover deploying JupyterHub to Amazon EKS with per-user persistent storage backed by Amazon EBS and TLS termination using AWS Certificate Manager (ACM).
Before we dive in, make sure you have eksctl, kubectl, and Helm installed on your local machine. We will use these tools to deploy the Kubernetes cluster and JupyterHub. I installed them using Homebrew on macOS.
```shell
brew install helm kubernetes-cli eksctl
```
Note: This guide is based on the Zero to JupyterHub with Kubernetes guide.
Now that we have the necessary tools, it is time to deploy the cluster. We will use eksctl, the official CLI for Amazon EKS, to deploy a managed Kubernetes cluster on AWS. The configuration below defines a Kubernetes cluster and a set of managed node groups, and configures the Amazon EBS CSI Driver with an IAM Role for Service Accounts so that containers run with least privilege. Feel free to modify this file to meet your needs.
We create a node group in each availability zone so that we always have capacity in every zone. This is important when using Amazon EBS, because volumes are specific to an availability zone. We have pre-configured the cluster for the Cluster Autoscaler, but will not cover its deployment in this post; to deploy the Cluster Autoscaler after completing this guide, see the Amazon EKS documentation.
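Later, once the cluster is up, one quick way to confirm that nodes landed in each availability zone is to list them with the standard topology label (assuming a recent Kubernetes version where the `topology.kubernetes.io/zone` label is applied to nodes):

```shell
# list nodes along with the availability zone each one runs in
kubectl get nodes -L topology.kubernetes.io/zone
```

With the configuration above you should see one node per zone: us-east-2a, us-east-2b, and us-east-2c.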
```yaml
# file: cluster.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jupyterhub
  region: us-east-2
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: cluster-autoscaler
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "autoscaling:DescribeAutoScalingGroups"
              - "autoscaling:DescribeAutoScalingInstances"
              - "autoscaling:DescribeLaunchConfigurations"
              - "autoscaling:DescribeTags"
              - "autoscaling:SetDesiredCapacity"
              - "autoscaling:TerminateInstanceInAutoScalingGroup"
              - "ec2:DescribeLaunchTemplateVersions"
            Resource: '*'
    - metadata:
        name: ebs-csi-controller-sa
        namespace: kube-system
        labels:
          aws-usage: "cluster-ops"
          app.kubernetes.io/name: aws-ebs-csi-driver
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "ec2:AttachVolume"
              - "ec2:CreateSnapshot"
              - "ec2:CreateTags"
              - "ec2:CreateVolume"
              - "ec2:DeleteSnapshot"
              - "ec2:DeleteTags"
              - "ec2:DeleteVolume"
              - "ec2:DescribeInstances"
              - "ec2:DescribeSnapshots"
              - "ec2:DescribeTags"
              - "ec2:DescribeVolumes"
              - "ec2:DetachVolume"
            Resource: '*'
managedNodeGroups:
  - name: ng-us-east-2a
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2a
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2b
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2b
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
  - name: ng-us-east-2c
    instanceType: m5.large
    volumeSize: 30
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones:
      - us-east-2c
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/jupyterhub: "owned"
```
Next, deploy the cluster using this configuration.
```shell
eksctl create cluster -f ./cluster.yml
```
This will take 15-25 minutes to deploy. While the cluster is deploying, we can configure the values.yml file that we will use to configure JupyterHub. To get started, we need a secret token that is used by JupyterHub. Run the following command to generate it.
```shell
# generate a secret token
openssl rand -hex 32
```
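If you are scripting the setup, you can capture the token in a variable and sanity-check it before pasting it into values.yml; 32 random bytes hex-encode to a 64-character string:

```shell
# capture the token so it can be substituted into values.yml
SECRET_TOKEN=$(openssl rand -hex 32)

# 32 bytes -> 64 hex characters
echo "${#SECRET_TOKEN}"   # prints 64
```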
Next, we need to configure the rest of our values.yml file. Start by replacing the value of secretToken with the output of the previous command. To get started, we have disabled HTTPS; don't worry, we will configure it later. Then we specify a dummy authentication provider; if you have an authentication provider such as OAuth, OIDC, or LDAP, feel free to replace this configuration to meet your needs. Finally, we configure JupyterHub to use a Kubernetes Storage Class to provision disks for our users. In this case, we will use the Amazon EBS CSI Driver to provide an EBS volume for each user.
```yaml
# file: values.yml
proxy:
  secretToken: <replace with value from previous command>
  https:
    enabled: false
auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin
singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
```
Once the cluster creation has completed, we need to deploy the Amazon EBS CSI Driver. Run the following command to deploy and configure it. We are using the out-of-the-box configuration; feel free to create additional Storage Classes with different EBS settings to meet your needs.
```shell
# deploy the ebs csi driver for persistent storage
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
```
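Before moving on, it is worth confirming the driver pods came up. The exact pod names vary by driver version, but they typically start with `ebs-csi`:

```shell
# the controller and per-node pods should all reach Running
kubectl get pods -n kube-system | grep ebs-csi
```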
We are now ready to deploy JupyterHub. First, we need to add the JupyterHub Helm chart repository to our local machine. Then we will deploy the chart using the values.yml file we configured earlier.
```shell
# setup helm
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update

# deploy jupyterhub
helm install jupyterhub jupyterhub/jupyterhub \
  --values values.yml
```
This will take a few minutes to settle. Once it has, run the following command to get the URL of the load balancer.
```shell
kubectl get svc proxy-public
```
Your output should look similar to this:
```
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)        AGE
proxy-public   LoadBalancer   10.100.61.130   a86c837dfe75545c8b3e311621278e82-357827081.us-east-2.elb.amazonaws.com    80:31950/TCP   15m
```
Navigate to the HTTP version of the load balancer URL output by the previous command; you should be able to log in using the dummy credentials we configured above.
However, we can make this more secure. Next, let's use AWS Certificate Manager (ACM) to configure an SSL certificate on the load balancer. To do this, we need to update our values.yml file as follows:
```yaml
proxy:
  secretToken: <replace with token>
  https:
    enabled: true
    type: offload
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<arn of certificate>"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
auth:
  type: dummy
  dummy:
    password: 'supersecretpassword!'
  whitelist:
    users:
      - admin
singleuser:
  storage:
    capacity: 4Gi
    dynamic:
      storageClass: gp2
```
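If you still need a certificate, or don't have the ARN handy, you can request and list certificates with the AWS CLI. The domain below is an example, not the one from this post, and DNS validation requires you to create the validation records ACM returns before the certificate is issued:

```shell
# request a certificate for your JupyterHub hostname (example domain)
aws acm request-certificate \
  --domain-name jupyterhub.example.com \
  --validation-method DNS \
  --region us-east-2

# once issued, list certificates to find the ARN for values.yml
aws acm list-certificates --region us-east-2
```

Note that the certificate must live in the same region as the load balancer, us-east-2 in this guide.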
Once you have updated your configuration, run the following command to apply the updates:
```shell
helm upgrade jupyterhub jupyterhub/jupyterhub --values values.yml
```
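After the upgrade, the proxy-public service should expose port 443 in addition to 80. You can check this, and inspect the TLS handshake at the ELB, with the commands below (the hostname is an example; use the one on your certificate):

```shell
# PORT(S) should now include 443
kubectl get svc proxy-public

# inspect the certificate and response headers (example hostname)
curl -svI https://jupyterhub.example.com 2>&1 | head -n 30
```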
Now we need to create a DNS record that points the hostname on our ACM certificate at the load balancer. In my case, this is jupyterhub.arhea.io. Optionally, you can configure External DNS with Kubernetes to handle this automatically.
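As a sketch, assuming your zone is hosted in Route 53, the record can be created with the AWS CLI; the hosted zone ID, hostname, and load balancer value below are placeholders to replace with your own:

```shell
# create/update a CNAME pointing the certificate hostname at the ELB
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000000000 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "jupyterhub.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "<load balancer hostname>" }]
      }
    }]
  }'
```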
That’s it! You now have a highly scalable deployment of JupyterHub on Amazon EKS. To improve this deployment, I recommend looking at External DNS to automatically register the load balancer with your DNS provider, Cluster AutoScaler to automatically scale your cluster based on usage, and the Amazon EFS CSI Driver to attach shared storage to all user environments.