Multi-tenancy in Elastic Cloud on Kubernetes deployments: Example architectures

Explore architectural strategies for multi-tenant ECK deployments, including soft vs. hard multi-tenancy, Kubernetes isolation, and Elastic operator considerations.

Tackling complex infrastructure and architecture challenges is one of the things we have the most fun doing here at Elastic. Designing scalable and efficient multi-tenant architectures can indeed be a hard challenge, especially in heterogeneous enterprise environments… but which engineer doesn't love solving problems?

In a previous post, we showed how to get started with ECK on GKE using infrastructure-as-code practices, managing our infrastructure in a controlled, repeatable way with Git and GitOps. In this post, we'll explore various architectural approaches adopted by our customers for achieving multi-tenancy with Elastic Cloud on Kubernetes (ECK), outlining their respective strengths and weaknesses to help you find the best fit for your needs.

Disclaimer

This blog post will cover the topic of multi-tenant ECK deployments. In other words, we will focus on the allocation of resources within Kubernetes. For multi-tenancy within Elasticsearch, please refer to the official Elastic documentation.

Multi-tenancy: a definition

Multi-tenancy in Kubernetes refers to the ability to run multiple users, teams, or organizations (tenants) on a shared Kubernetes cluster while guaranteeing isolation, security, and resource fairness. Multi-tenancy enables efficient resource utilization, cost savings, and simplified operations by avoiding the need for a separate cluster for each tenant, which could otherwise lead to so-called cluster sprawl.

We can also "split" multi-tenancy into two sub-categories: hard and soft multi-tenancy. Hard multi-tenancy can be seen as "physical" separation (e.g. host-level), whereas soft multi-tenancy is more of a "logical" separation (e.g. Kubernetes namespaces).

Why does multi-tenancy matter for ECK?

While multi-tenancy within Elasticsearch and Kibana is frequently discussed and well documented, the architectural task of deciding how to distribute Elastic Stack deployments on Kubernetes is oftentimes highly dependent on the user's specific environment and requirements. Defining a multi-tenancy strategy requires understanding and balancing numerous implications regarding cost optimization, efficient resource allocation, the chance of dealing with noisy neighbors, the availability of the necessary hardware, and many other small yet important details.

Reference architecture #0: all-in-one

As someone testing out ECK for the first time, this will likely be the architecture you land on for a quick test run. For now, this deployment model is absolutely fine and provides a highly instructive setup to learn more about the platform and its features. Just run helm install or kubectl apply as explained in the docs and start learning about ECK. There will be plenty of time to make it more resilient and performant later.
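If you want something concrete to apply while exploring, a minimal Elasticsearch manifest along the lines of the ECK quickstart is enough to get going (the version and sizing below are purely illustrative):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: default
spec:
  version: 8.17.0 # any Stack version supported by your operator
  nodeSets:
    - name: default
      count: 1 # a single node is plenty for a first test run
      config:
        node.store.allow_mmap: false # avoids having to tune vm.max_map_count on the host while testing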

In this simplified architecture, we have a single Kubernetes cluster running:

  • Two monitoring Elasticsearch clusters: one for production deployments of the Elastic Stack, and one for non-production deployments of the Stack
  • A given number of production and non-production Elasticsearch clusters

Of course, in a real-world scenario, we would most likely also have Kibana deployments and much more. For now, we are focusing on Elasticsearch as the core of our platform, and because it is a stateful application, it requires a tad more care than Kibana or Logstash.
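As a side note, ECK's stack monitoring support is what lets each workload cluster ship its metrics and logs to one of the dedicated monitoring clusters. A minimal sketch follows, with purely illustrative cluster names and namespaces (and assuming a license tier that includes this feature):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: workload-prod-1 # hypothetical production workload cluster
  namespace: team-a
spec:
  version: 8.17.0
  monitoring:
    metrics:
      elasticsearchRefs:
        - name: monitoring-prod # hypothetical monitoring cluster in the same Kubernetes cluster
          namespace: monitoring
    logs:
      elasticsearchRefs:
        - name: monitoring-prod
          namespace: monitoring
  nodeSets:
    - name: default
      count: 3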

While this architecture works perfectly fine for very simple, non-mission-critical deployments, its shortcomings will appear quickly when trying to apply good engineering practices to it.

In particular, the pros are clear:

  • Easy management: one Kubernetes cluster, one elastic-operator
  • Cost optimization: there is only one Kubernetes cluster, densely packed with pods

However, the cons will soon outweigh the pros and make you deeply resent your decision:

  • There is no environment in which to test Kubernetes and elastic-operator upgrades, which means each upgrade is going to be fire-and-pray.
  • Depending on the implementation, this architecture could become a noisy-neighbors party. For instance, a misconfigured development cluster could saturate the underlying host's resources or bandwidth, thus degrading the performance of the pods deployed on the same host.

Reference architecture #1: production and non-production

Given the shortcomings of the previous architecture, let's move on to something that allows us to separate production workloads from the rest. This sounds like a reasonable next step to improve our platform design while resisting the temptation to over-engineer.

In this setup, we will have:

  • One "data plane" Kubernetes cluster for production workloads
  • One "data plane" Kubernetes cluster for non-production workloads

We can think about different ways to distribute the monitoring clusters.

The two main options are:

  • Both the production and non-production monitoring clusters live in a single, separate Kubernetes cluster
  • Each monitoring cluster lives in its own Kubernetes cluster

Regardless of which option is chosen, it is fundamental to separate the monitoring Elasticsearch clusters from the clusters they monitor. We do not want to end up in a situation where a single incident takes out both our "workload" Elasticsearch clusters and their respective monitoring clusters.

This architecture seems like a good compromise, as it allows us to fully separate production and non-production workloads. This means that we can test Kubernetes, Elasticsearch, and elastic-operator upgrades in a safe environment before promoting them to production. This level of isolation also guarantees that a misconfigured lower environment (e.g. a development cluster) will not affect production clusters.

In the next sections, we will present two different iterations of this architecture - one relying on soft multi-tenancy, and one relying on hard multi-tenancy. An interesting by-product of such architectures comes from the structure of Elastic's licensing: one Enterprise license can be shared across multiple operators. And since the elastic-operator itself requires few resources (e.g., default limits of 1Gi of memory and 1 vCPU in the Helm chart), the overhead in both resources and licenses is minimal.

Reference architecture #1 with team-based soft multi-tenancy

In this architecture, the high-level setup remains exactly the same. What changes is the allocation of the pods in the data plane clusters. Let's focus on it:

In this architecture, we use Kubernetes namespaces to achieve "soft" multi-tenancy. This means that each node in the Kubernetes cluster can host pods (i.e. Elasticsearch nodes) belonging to different Elasticsearch clusters and different teams. Access to each namespace, however, is restricted via RBAC, so that each team can only see and manage its own workloads.
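As a minimal sketch of this idea (all names below are hypothetical), a namespaced Role and RoleBinding could grant a team's group access to the Elastic resources in its own namespace and nothing else:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-elastic-admin
  namespace: team-a # permissions only apply within this namespace
rules:
  - apiGroups: ["elasticsearch.k8s.elastic.co", "kibana.k8s.elastic.co"]
    resources: ["elasticsearches", "kibanas"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-elastic-admin
  namespace: team-a
subjects:
  - kind: Group
    name: team-a # as provided by your identity provider / OIDC integration
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-elastic-admin
  apiGroup: rbac.authorization.k8s.io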

The architecture presented here is most likely the sweet spot between complexity and flexibility for most multi-tenancy use cases within ECK deployments. This particular design helps in scenarios where tenants require a degree of independence in their Elastic Stack configurations, while still leveraging the operational efficiencies provided by a shared Kubernetes infrastructure.

Reference architecture #1 with team-based hard multi-tenancy

This architecture is a bit more convoluted than the previous one, but it might be interesting in case there is a need to "physically" separate the various Elasticsearch clusters. Similarly to before, the high-level setup remains exactly the same as Reference Architecture #1. What changes is the allocation of the pods in the data plane clusters.

In this architecture, we use Kubernetes namespaces to achieve "soft" multi-tenancy and pair them with Kubernetes taints, tolerations, and nodeAffinity to enforce "hard" multi-tenancy, so that a node in the Kubernetes cluster only hosts pods for Elasticsearch clusters belonging to the same team. This scenario enforces a stricter separation of concerns, but it comes at a cost: such a deployment is unlikely to reach the same level of resource and cost efficiency, since it typically requires more Kubernetes nodes than strictly necessary, with some of them ending up under-utilized.

As mentioned, this architecture is not always needed, nor always recommended. Such a strong level of isolation could, however, make sense in scenarios such as:

  • PCI-DSS (Payment Card Industry Data Security Standard), where it may help with requirements for restricting access to cardholder data by business "need to know"
  • HIPAA (Health Insurance Portability and Accountability Act), which mandates strict data segregation, especially for ePHI (electronic protected health information)
  • Government or Defense Environments where compartmentalization between projects or groups might be mandatory (e.g. FedRAMP)

An example Elasticsearch CR follows. For demo purposes, we assume that the nodes carry the taints and labels referenced in the manifest (a sketch of such a node follows the example).

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-a
  namespace: team-a
spec:
  [...]
  podTemplate:
    metadata:
      labels:
        elasticsearch.k8s.elastic.co/cluster-name: cluster-a
    spec:
      tolerations: # I "accept" the nodes defined for my team
        - key: "taints.demo.elastic.co/team"
          operator: "Equal"
          value: "team-a"
          effect: "NoSchedule"
      affinity: # I want the nodes defined for my project
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "labels.demo.elastic.co/team"
                    operator: "In"
                    values:
                      - "team-a"
        podAntiAffinity: # Try to not put me on the same host where other pods for the same ES cluster are running
          preferredDuringSchedulingIgnoredDuringExecution: # or requiredDuring...
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: cluster-a
                topologyKey: kubernetes.io/hostname
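
For completeness, this is roughly what a matching worker node would look like with the demo taint and label applied (in practice, you would usually set these through your node pool configuration or kubectl rather than by editing Node objects by hand):

apiVersion: v1
kind: Node
metadata:
  name: worker-team-a-1 # hypothetical node name
  labels:
    labels.demo.elastic.co/team: team-a # matched by the nodeAffinity above
spec:
  taints:
    - key: taints.demo.elastic.co/team # tolerated only by team-a workloads
      value: team-a
      effect: NoSchedule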

It would even be possible to take this architecture one step further and constrain each node in the Kubernetes cluster to host pods for a single Elasticsearch cluster only, guaranteeing an even higher level of resource isolation - but also a lower level of resource efficiency.
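A sketch of that stricter variant follows, reusing the same pattern but keyed per Elasticsearch cluster instead of per team (the per-cluster taint and label keys are hypothetical):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-a
  namespace: team-a
spec:
  [...]
  podTemplate:
    spec:
      tolerations: # accept only the nodes reserved for this specific Elasticsearch cluster
        - key: "taints.demo.elastic.co/cluster"
          operator: "Equal"
          value: "cluster-a"
          effect: "NoSchedule"
      affinity:
        nodeAffinity: # and require them
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "labels.demo.elastic.co/cluster"
                    operator: "In"
                    values:
                      - "cluster-a"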

Reference architecture #2: one Kubernetes cluster per Elasticsearch deployment

Many Elastic Stack admins opt for a 1:1 mapping between Elasticsearch clusters and Kubernetes clusters, meaning that each Kubernetes cluster is fully dedicated to a single Elasticsearch cluster. This allows for even stronger hard multi-tenancy and does not require configurations such as taints and tolerations, but it requires the capability to run a fleet of Kubernetes clusters, which is an undertaking in its own right. As mentioned earlier in this article, you really don't want to deal with cluster sprawl; hence, your automation and fleet management need to be mature and rock-solid.

Such an architecture is entirely valid for use cases in which full isolation is needed - again, think of environments with particularly strict resource segregation requirements (e.g. FedRAMP), where even hard multi-tenancy within a shared cluster might not be considered viable for the most sensitive workloads. This architecture, however, is not covered in depth in this article, as such a design does not pose many multi-tenancy challenges - it is one of the strongest multi-tenancy mechanisms available.

In the sample diagram, you will see several Kubernetes clusters, each running the elastic-operator and exactly one Elasticsearch cluster. How to manage the monitoring clusters is left open: for example, you could design an approach based on a shared monitoring cluster, complemented by a few dedicated monitoring clusters for highly sensitive workloads.

Bonus section: elastic-operator multi-tenancy

So far, we have mostly covered multi-tenancy with a focus on the Elasticsearch clusters themselves. However, it can be argued that the Elastic operator's multi-tenancy should also be kept in mind. For instance, some elastic-operator upgrades can trigger rolling restarts of the Elasticsearch clusters managed by that operator, depending on the versions involved.

To gain finer-grained control over the operator upgrade process, it is possible to design architectures in which each operator is dedicated to a single tenant, such as a development team or business area within the company.

This type of segregation also allows each operator installation to be restricted to its tenant's namespaces, enabling a more granular rollout of newer versions. The result is further control over the rolling restarts of Elasticsearch clusters, in case the target elastic-operator version happens to require them (e.g. a new major version).
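As a rough sketch, such a per-tenant operator can be installed with the eck-operator Helm chart in its restricted mode. The value names below reflect that chart's options as we understand them and should be double-checked against the chart version you deploy:

# values-team-a.yaml - a namespace-restricted operator dedicated to team-a
installCRDs: false # CRDs are cluster-scoped; install them once, separately from the per-tenant operators
managedNamespaces:
  - team-a # this operator instance only reconciles Elastic resources in team-a
createClusterScopedResources: false # cluster-wide resources are managed by a single global installation
webhook:
  enabled: false # run the validating webhook once globally (or not at all) to avoid conflicts

Each tenant then gets its own Helm release, which can be upgraded independently of the others.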

Multiple elastic-operator deployments enable you to apply promotion logic not just to entire environments, but to specific tenants within them, effectively creating a sort of canary deployment model.

Conclusions

Takeaway #1

There are numerous layers at which we can tackle ECK multi-tenancy:

  • Within the Elasticsearch cluster
  • Within the Kubernetes cluster, logically
  • Within the Kubernetes cluster, "physically"
  • At the cloud provider layer (security groups, NACLs, …)
  • At the operator and control-plane layer

Ideally, a combination of the above will allow us to achieve our desired goal.

Takeaway #2

As always, there’s no such thing as a free lunch:

  • More nodes => more "wasted" resources
    • DaemonSets consume a larger share of each node's resources
    • It becomes harder to do bin-packing
  • More Load Balancers => higher base cost
  • More Kubernetes clusters => higher base cost

Takeaway #3

As often said, practicality beats purity:

  • Some architectures that look perfect on paper can be hard to manage and prohibitively expensive, ultimately creating a worse experience than simpler, slightly less perfect solutions
  • Kubernetes was born to run tens of pods on the same node. Forcing a node to run just one or two actual workloads because we don't trust Kubernetes to handle them properly might not be a great idea… unless we actually don't trust ourselves to come up with a proper configuration.

Test Elastic's leading-edge, out-of-the-box capabilities. Dive into our sample notebooks, start a free cloud trial, or try Elastic on your local machine now.
