
Lagom kubernetes setup considerations


Problem

Running and testing services in the Lagom development environment, with its support for all the needed tooling, is a smooth and straightforward process. Everything is prepared and ready to use out of the box.
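
For example, with an sbt based build the whole system (your services plus the embedded Cassandra, Kafka and service gateway) starts with a single command:

sbt runAll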

Deploying and running it in production requires a complete environment setup and is not as straightforward at first (at least it was not for me :)).

Lagom (1.5.1 and newer) comes with out-of-the-box support for running on Kubernetes.

Kubernetes cluster setup depends on the chosen Kubernetes implementation and is out of the scope of this blog. Personally, I’m using Amazon EKS.

So the question is: once you have a Kubernetes cluster running, what else is required to deploy and run your Lagom microservice system?

Solution

Kubernetes basics

I will assume you have basic knowledge of Kubernetes; if not, I strongly recommend going through the official Kubernetes documentation, with a focus on the topics covered in the sections below.

Kubernetes cluster management access

A Kubernetes cluster is managed via the kubectl CLI tool, which needs to be preconfigured to access a specific Kubernetes cluster.

When you have multiple Kubernetes clusters running (test, production #1, production #2, …), switching between kubectl configurations is error prone and can end up with operations being performed on the wrong cluster. To avoid this I tend to use a bastion host based solution.

Depending on the Kubernetes cluster network location you could (see the sketch after this list):

  • dedicate a bastion host per Kubernetes cluster
  • dedicate a bastion host OS user per Kubernetes cluster
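
As a minimal sketch (the cluster name, file path and context are assumptions), each bastion host or bastion OS user gets its own kubeconfig pointing at exactly one cluster:

# in the bastion OS user's ~/.bashrc, dedicated to one cluster
export KUBECONFIG=$HOME/.kube/config-production-1

# sanity check which cluster this shell talks to before doing anything
kubectl config current-context
kubectl cluster-info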

Kubernetes namespace organization

Kubernetes uses namespaces to support multiple virtual clusters on one physical cluster. Namespaces can also be used for multi-tenant deployments, but I like to avoid that to keep the setup as simple as possible.

By default, Kubernetes comes with 3 preconfigured namespaces: default, kube-public, kube-system.

I use kube-system for deploying and running kubernetes resources unrelated to my Lagom system.

For the Lagom system you could use default, but I like to create a separate namespace to group my Lagom services in one logical unit.

Example namespace resource configuration:


apiVersion: v1
kind: Namespace
metadata:
  name: lagom
  labels:
    name: lagom
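
The namespace is then created like any other resource (assuming the file above is saved as namespace.yaml):

kubectl apply -f namespace.yaml
kubectl get namespaces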

Helm

Helm is the Kubernetes package manager. I see it as an APT-like tool for Kubernetes.

The main benefits I get from Helm are:

  • using different Helm repositories to get access to official and community created Kubernetes tools (you will see later which ones), simplifying their configuration and deployment
  • templating Lagom Kubernetes resources to simplify their configuration and deployment when you have a high number of services

Lagom kubernetes support

Deploying Lagom on Kubernetes requires manual creation of Kubernetes resources (deployment, service, ingress).

To optimize Kubernetes resource maintenance, I would recommend creating your own Helm chart that is reused (by supplying a service specific values.yaml) for all your Lagom services.
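
As an illustration only (the chart layout and value keys below are my assumptions, not something Lagom prescribes), a per-service values.yaml would capture what differs between services, and the shared chart templates would render resources like the ones shown below:

# account-values.yaml
serviceName: "account"
serviceVersion: "1.0.0"
initialNumberOfReplicas: 3
dockerRepositoryUrl: "{dockerRepositoryUrl}"

# Helm 2 style install of the shared chart with service specific values
helm install ./lagom-service-chart --name account --namespace lagom -f account-values.yaml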

Example of deployment.yaml:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: "account-v1-0-0"
  labels:
    app: "account"
    appVersion: "account-v1-0-0"
  namespace: lagom
spec:
  replicas: {initialNumberOfReplicas}
  selector:
    matchLabels:
      appVersion: "account-v1-0-0"
  template:
    metadata:
      labels:
        app: "account"
        appVersion: "account-v1-0-0"
    spec:
      restartPolicy: Always
      containers:
        - name: "account"
          image: "{dockerRepositoryUrl}/account_impl:1.0.0"
          imagePullPolicy: Always
          env:
            - name: "JAVA_OPTS"
              value: "
-Dconfig.resource=production.conf                       
-Dplay.http.secret.key={playSecret}                  
-Dlogger.resource=logback-prod.xml                 
-Dplay.server.pidfile.path=/dev/null            
-Dlagom.akka.discovery.service-name-mappings.elastic-search.lookup=
  _http._tcp.elasticsearch.lagom.svc.cluster.local               
-Dlagom.akka.discovery.service-name-mappings.cas_native.lookup=
  _cql._tcp.cassandra.lagom.svc.cluster.local        
-Dlagom.akka.discovery.service-name-mappings.kafka_native.lookup=
  _broker._tcp.kafka.lagom.svc.cluster.local
"
            - name: "REQUIRED_CONTACT_POINT_NR"
              value: "{initialNumberOfReplicas}"
            - name: "SERVICE_NAMESPACE"
              value: "lagom"
            - name: "SERVICE_NAME"
              value: "account"
            - name: "KUBERNETES_POD_IP"
              valueFrom:
                fieldRef:
                  fieldPath: "status.podIP"
          ports:
            - containerPort: 9000
              name: http
            - containerPort: 2552
              name: remoting
            - containerPort: 8558
              name: management
          readinessProbe:
            httpGet:
              path: "/ready"
              port: "management"
            periodSeconds: 10
            initialDelaySeconds: 20
            failureThreshold: 4
          livenessProbe:
            httpGet:
              path: "/alive"
              port: "management"
            periodSeconds: 10
            initialDelaySeconds: 60
            failureThreshold: 2
          resources:
            requests:
              cpu: 0.5
              memory: "512Mi"

Note: the JAVA_OPTS value has been wrapped across multiple lines for readability; if you copy/paste it, put it back on one line.

Example of service.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: "account"
  name: "account"
  namespace: "lagom"
spec:
  ports:
    - name: http
      port: 9000
      protocol: TCP
      targetPort: 9000
    - name: remoting
      port: 2552
      protocol: TCP
      targetPort: 2552
    - name: management
      port: 8558
      protocol: TCP
      targetPort: 8558
  selector:
    app: "account"

Example of ingress.yaml:

apiVersion: "extensions/v1beta1"
kind: Ingress
metadata:
  name: "account-internal-ingress"
  annotations:
    kubernetes.io/ingress.class: "nginx-internal"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    ingress.kubernetes.io/ssl-redirect: "false"
  namespace: "lagom"
spec:
  rules:
    - http:
        paths:
          - path: "/api/account"
            backend:
              serviceName: "account"
              servicePort: 9000

---
apiVersion: "extensions/v1beta1"
kind: Ingress
metadata:
  name: "account-external-ingress"
  annotations:
    kubernetes.io/ingress.class: "nginx-external"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    ingress.kubernetes.io/ssl-redirect: "false"
  namespace: "lagom"
spec:
  rules:
    - http:
        paths:
          - path: "/api/external/account"
            backend:
              serviceName: "account"
              servicePort: 9000

Example of production.conf (Java):

 
include "application.conf"

lagom.cluster.exit-jvm-when-system-terminated = on

akka {

  actor {
    provider = cluster
  }

  cluster {
    shutdown-after-unsuccessful-join-seed-nodes = 60s
  }
  
  discovery {
    method = akka-dns
    kubernetes-api {
      pod-namespace = ${SERVICE_NAMESPACE}
      pod-label-selector = "app=%s"
      pod-port-name = management
    }
  }
  
  management {
    cluster {
      bootstrap {
        contact-point-discovery {
          discovery-method = kubernetes-api
          service-name = ${SERVICE_NAME}
          required-contact-point-nr = ${REQUIRED_CONTACT_POINT_NR}
          protocol = "tcp"
          kubernetes-api {
            pod-namespace = ${SERVICE_NAMESPACE}
            pod-port-name = management
            pod-label-selector = "app=%s"
          }
        }
      }
    }
    http {
      port = 8558
      bind-hostname = ${KUBERNETES_POD_IP}
    }
  }
}

Check the official documentation for more details: Running Lagom in production

I also recommend reading the blog post CPU considerations for Java applications running in Docker and Kubernetes.
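
One related, hedged tip (depending on your JDK version): be explicit about the processor count the JVM assumes, since container CPU requests/limits can otherwise lead to surprisingly small default thread pools. On JDK 8u191+ or JDK 10+ you could, for example, add the following to JAVA_OPTS (the value 2 is arbitrary):

-XX:ActiveProcessorCount=2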

Lagom service call access control

Service API calls can be categorized, depending on the access control requirements, into:

  • internal calls
  • external calls

Internal calls are made by trusted callers (service to service, running in the same Kubernetes cluster) and therefore do not require any access control. Communication does not need to be encrypted and the caller does not need to authenticate (simple caller identification could be used if required).

In a Kubernetes cluster, service to service communication is done by directly accessing the service pod IPs and port. When one service wants to connect to another, the calling service uses the Lagom ServiceLocator to look up, via the Kubernetes API / Akka DNS, the called service’s pod IPs and port. Lagom services are located based on the service name specified in the API descriptor; the service name is deployed as a Kubernetes service resource.


named("account-service")

Kubernetes resource names are restricted (lowercase alphanumeric characters and ‘-’, starting with a letter and ending with an alphanumeric character), so service names are restricted too. It is important to follow these rules when defining the service name in the API descriptor!

External calls are made externally (from the Internet) by untrusted callers and should require communication encryption and caller authentication.

In Kubernetes, external access to a service is configured using an ingress resource.

One Lagom service can have one or more ingress resources deployed, configuring which ACLs are used.

In order for the ingress resource to work, the cluster must have an ingress controller deployed and running.

The most commonly used ingress controller implementation is the NGINX ingress controller. It is popular because it is supported and maintained by the Kubernetes project itself and can be deployed on almost all Kubernetes implementations.

The NGINX ingress controller manages the entire lifecycle of NGINX by subscribing to ingress resource events (add/remove) via the Kubernetes API, based on which the NGINX location configuration is updated automatically at runtime.

The NGINX ingress controller can be deployed using the NGINX ingress controller Helm chart. Be sure to specify the namespace (kube-system) and the ingress controller name:

--namespace kube-system --set controller.ingressClass=nginx-external
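
For illustration, a complete Helm 2 style install could look like this (the chart and release names are assumptions; adjust them to the chart you actually use):

helm install stable/nginx-ingress --name external-nginx-ingress-controller --namespace kube-system --set controller.ingressClass=nginx-external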

An ingress resource can be configured to target a specific ingress controller, if multiple controllers are running, by specifying the ingress controller name in an ingress resource annotation:

kubernetes.io/ingress.class: nginx-external

You can see the annotation usage in the ingress.yaml example in the Lagom kubernetes support section.

So the ingress resource is used to expose a service for external access. If a service requires both internal and external access, the two need to be differentiated.

This can be done by specifying different URL based contexts.

For example:


/api/accounts # with internal context

/api/external/accounts # with external context

For /api/external/accounts an ingress resource needs to be deployed to allow external access, while for /api/accounts no ingress resource is required because it is only accessed internally.

Kubernetes SSL/TLS encryption support

In Kubernetes, SSL/TLS is configured by using specific annotations on the ingress resource. Based on this configuration, the ingress controller configures and implements SSL/TLS termination.

This means we would need to apply SSL/TLS configuration to every externally accessible service’s ingress resource, which is not convenient to maintain. In most use cases the same SSL/TLS configuration (a single SSL/TLS termination point) is used for accessing all externally accessible services.

To resolve this, there are two solutions (that I’m aware of):

  1. Use and configure the NGINX controller with a default SSL certificate
  2. Deploy an additional ingress controller dedicated to SSL/TLS termination

Solution #1 is explained in the referenced documentation (see the sketch below).
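
A minimal sketch of solution #1 (the chart, release, secret and certificate file names are assumptions, and it assumes the chart exposes controller.extraArgs): create a TLS secret and point the NGINX controller at it as its default certificate when installing the chart.

kubectl create secret tls default-ssl-cert --key tls.key --cert tls.crt --namespace kube-system

helm install stable/nginx-ingress --name external-nginx-ingress-controller --namespace kube-system --set controller.ingressClass=nginx-external --set controller.extraArgs.default-ssl-certificate="kube-system/default-ssl-cert"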

Solution #2 is to deploy one extra ingress controller dedicated to SSL/TLS termination, together with a “singleton” ingress resource that configures the SSL/TLS termination and forwards all traffic to the already created NGINX controller. Singleton in this context means that only one such ingress resource is deployed.

With this solution we “extract” the SSL/TLS termination point from the already created ingress controller, avoiding per-service SSL/TLS configuration in each ingress resource.

For the SSL/TLS dedicated ingress controller we could use:

  • the NGINX Ingress controller
  • depending on the Kubernetes implementation used, a cloud provider specific ingress controller. I use the Amazon EKS ALB Ingress controller, which leverages the Amazon Application Load Balancer.

If your Kubernetes implementation only allows the NGINX Ingress controller, solution #2 generally does not make sense and I would suggest going with solution #1.

When using a cloud provider, a cloud provider specific ingress controller brings the advantage of securing external access outside of your Kubernetes environment, as opposed to the NGINX Ingress controller, which runs inside the Kubernetes cluster.

Example of deploying the Amazon EKS ALB Ingress controller using the aws-alb-ingress-controller Helm chart:

 
helm install incubator/aws-alb-ingress-controller --name=external-alb-ingress-controller --namespace kube-system --set autoDiscoverAwsRegion=true --set autoDiscoverAwsVpcID=true --set clusterName=myK8s 

AWS ALB singleton ingress resource:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "ssl-alb"
  namespace: kube-system
  labels:
    app: "sslAlb"
  annotations:
    kubernetes.io/ingress.class: "alb"
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/target-type: "instance"
    alb.ingress.kubernetes.io/security-groups: {mySecurityGroupIds}, ...
    alb.ingress.kubernetes.io/subnets: {myVPCSubnetIds}, ...
    alb.ingress.kubernetes.io/certificate-arn: {myAcmCertificateArn}
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80,"HTTPS": 443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/healthcheck-path: "/"
    alb.ingress.kubernetes.io/success-codes: "200,404"
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: ssl-redirect
              servicePort: use-annotation
          - path: /*
            backend:
              serviceName: "external-nginx-ingress-controller-controller"
              servicePort: 80


External access authentication

For authentication, different methods are applicable (HTTP basic auth, JWT, mutual SSL, …); see the sketch below for one option.
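
For example, HTTP basic auth can be enforced directly on an external ingress via NGINX ingress controller annotations (a minimal sketch; the secret name and realm are my assumptions, and JWT or mutual SSL setups look different):

# create the htpasswd based secret ('auth' is an htpasswd file), in the same namespace as the ingress
kubectl create secret generic basic-auth --from-file=auth --namespace lagom

# added to the external ingress metadata.annotations
nginx.ingress.kubernetes.io/auth-type: "basic"
nginx.ingress.kubernetes.io/auth-secret: "basic-auth"
nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"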

Lagom access to “external services” (Cassandra and Kafka)

Lagom services require access to Cassandra (or any other journal store) and, depending on the service use case, optionally access to Kafka. In Lagom these fall into the category of “external services”.

Cassandra and Kafka can be deployed, depending on your preferences:

  • in the Kubernetes cluster
  • on dedicated hosts
  • as SaaS

I personally use dedicated hosts for these reasons:

  • before running the Lagom system on Kubernetes, I was running it on Lightbend ConductR, where the recommended Cassandra and Kafka deployment was on dedicated hosts. When migrating the Lagom system to Kubernetes it was not possible to migrate Cassandra and Kafka because the required downtime would have been too long
  • when I started with Lagom there were not many SaaS options available

Lagom uses the Kubernetes DNS SRV mechanism for locating external service endpoints.

In Kubernetes, DNS SRV records are generated from a Kubernetes service resource. For external services, the Kubernetes service resource abstracts the external service access and, by that, its deployment type.

If the external services are deployed in the Kubernetes cluster, Cassandra and/or Kafka Kubernetes service resources will be deployed and DNS SRV records will be generated from them.

If the external services are deployed outside of the Kubernetes cluster (dedicated hosts or SaaS), a Kubernetes headless service can be used to configure them. A headless service, like a “regular” Kubernetes service resource, generates DNS SRV records.

The DNS SRV record name needs to be configured using the configuration parameter:

lagom.akka.discovery.service-name-mappings.{externalServiceName}.lookup

Example of a Cassandra headless service resource:

apiVersion: v1
kind: Service
metadata:
  name: cassandra
  namespace: lagom
spec:
  ports:
  - name: "cql"
    protocol: "TCP"
    port: 9042
    targetPort: 9042
    nodePort: 0

---
apiVersion: v1
kind: Endpoints
metadata:
 name: cassandra
 namespace: lagom
subsets:
 - addresses:
     - ip: 10.0.1.85
     - ip: 10.0.2.57
     - ip: 10.0.3.106
   ports:
     - name: "cql"
       port: 9042

DNS SRV record example:

_cql._tcp.cassandra.lagom.svc.cluster.local

Lagom external service DNS SRV name configuration (deployment.yaml JAVA_OPTS environment variable):

-Dlagom.akka.discovery.service-name-mappings.cas_native.lookup=_cql._tcp.cassandra.lagom.svc.cluster.local
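
To verify that the SRV record resolves inside the cluster, you can run a throwaway pod with DNS tooling (the image below is just one option that ships dig):

kubectl run -it --rm dns-test --image=tutum/dnsutils --restart=Never -- dig SRV _cql._tcp.cassandra.lagom.svc.cluster.local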

Example of a Kafka headless service resource:

apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: lagom
spec:
  ports:
  - name: "broker"
    protocol: "TCP"
    port: 9092
    targetPort: 9092
    nodePort: 0

---
apiVersion: v1
kind: Endpoints
metadata:
 name: kafka
 namespace: lagom
subsets:
 - addresses:
     - ip: 10.0.1.85
     - ip: 10.0.2.57
     - ip: 10.0.3.106
   ports:
     - name: "broker"
       port: 9092

DNS SRV record example:

_broker._tcp.kafka.lagom.svc.cluster.local

Lagom external service DNS SRV name configuration (deployment.yaml JAVA_OPTS environment variable):

 
-Dlagom.akka.discovery.service-name-mappings.kafka_native.lookup=_broker._tcp.kafka.lagom.svc.cluster.local 

Hope you found this useful. Please share your feedback in the form of a comment or like. Thanks!

Comments

  • lynxpluto says:

    Hi! Does this setting -Dlagom.akka.discovery.service-name-mappings.cas_native.lookup=_cql._tcp.cassandra.lagom.svc.cluster.local require some additional configuration in application.conf and/or Module ? It seems without it the default “lookup” skips this setting, i.e. uses cassandra.default.contact-points which is ‘127.0.0.1’ by default. It is described here https://www.lagomframework.com/documentation/1.5.x/java/ProductionOverview.html#Using-static-Cassandra-contact-points

    • klikix says:

      Hi,

      This setting is used in akka-service-locator Lagom module:
      https://github.com/lagom/lagom/blob/628805dc6da419d0866ee9d1581838d5cf8f3157/akka-service-locator/core/src/main/scala/com/lightbend/lagom/internal/client/ServiceNameMapper.scala#L46

      You need to have AkkaDiscoveryServiceLocatorModule Play module enabled in your conf:
      play.modules.enabled += "com.lightbend.lagom.javadsl.akka.discovery.AkkaDiscoveryServiceLocatorModule"

      Check comment from Renato for 1.5.1: https://discuss.lightbend.com/t/lagom-1-4-12-and-1-5-1-releases/4105

      Hope this helps.

      BR,
      Alan

      • lynxpluto says:

        Ok. I only found that it is said that AkkaDiscoveryServiceLocatorModule is added by default to your project and will be bound only in production mode 🙂 And this configuration play.modules.enabled += com.lightbend.lagom.javadsl.akka.discovery.AkkaDiscoveryServiceLocatorModule is already done in reference.conf. So if I understand it correctly nothing else should be done. But in my case the lookup of Cassandra/Kafka via lagom.akka.discovery.service-name-mappings..lookup worked only for Kafka, and Cassandra fell back to cassandra.default.contact-points. Of course it is possible that I’ve made a mistake somewhere else 🙁
        Anyway, thanks a lot! Nice article

        • klikix says:

          I do not think it should fallback to default contact point configuration if ServiceLocator is configured correctly. I would assume AkkaDiscoveryServiceLocator is not used.
          Can you check, in runtime what implementation of ServiceLocator interface is used (You can inject interface and print out implementation)?

          • lynxpluto says:

            My resulting application.conf contains

            akka.discovery.kubernetes-api.pod-domain = "k8s.test"
            play.modules.enabled += com.lightbend.lagom.javadsl.akka.discovery.AkkaDiscoveryServiceLocatorModule
            cassandra.default.contact-points = ["127.0.0.1"] # It is default
            lagom.akka.discovery.service-name-mappings.cas_native.lookup=_cql._tcp.reactive-sandbox-test-reactive-sandbox-cassandra.test # test is namespace of the k8s.

            Ping by SRV name is successfull
            # ping _cql._tcp.reactive-sandbox-test-reactive-sandbox-cassandra.test
            PING _cql._tcp.reactive-sandbox-test-reactive-sandbox-cassandra.test (10.233.108.161): 56 data bytes
            64 bytes from 10.233.108.161: seq=0 ttl=64 time=0.033 ms
            64 bytes from 10.233.108.161: seq=1 ttl=64 time=0.096 ms
            64 bytes from 10.233.108.161: seq=2 ttl=64 time=0.053 ms
            ^C

            Log output
            2019-06-07T17:02:04.843+0300 [error] akka.actor.OneForOneStrategy [sourceThread=delivery-service-akka.actor.default-dispatcher-15, akkaTimestamp=14:02:04.842UTC, akkaSource=akka://delivery-service/user/cassandraOffsetStorePrepare-singleton/singleton/cassandraOffsetStorePrepare, sourceActorSystem=delivery-service] – All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect))
            com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect))
            at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:268)
            at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:107)
            at com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1652)
            at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1571)
            at com.datastax.driver.core.Cluster.init(Cluster.java:208)
            at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:376)
            at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:355)
            at akka.persistence.cassandra.ConfigSessionProvider.$anonfun$connect$1(ConfigSessionProvider.scala:48)
            at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
            at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
            at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
            at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
            at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:92)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)

        • klikix says:

          If you check here: https://github.com/lagom/lagom/blob/628805dc6da419d0866ee9d1581838d5cf8f3157/persistence-cassandra/core/src/main/resources/reference.conf#L25
          default cassandra session provider is com.lightbend.lagom.internal.persistence.cassandra.ServiceLocatorSessionProvider.
          Only if you explicitly set it to: akka.persistence.cassandra.ConfigSessionProvider you will use static config.
          So by this there is no contact point fallback but one or another.
          Did you maybe set explicitly akka.persistence.cassandra.ConfigSessionProvider? If so remove it.



