
State of the document: Draft (It will be updated to reflect current deployment practices after discussion with DevOps)

Problem statement

There should be a platform-wide solution for log aggregation in order to quickly identify and find the root causes of issues.

There should be configurable alerting in order to get information about new issues as soon as possible.

Log entries for the same business transaction in different microservices should carry a unique correlation id so that all log entries for that business transaction can be found.

Assumption: FOLIO should be deployed in a cluster for production.

Current state

Okapi aggregates logs from all applications into a single file.

Proposed Solution

Request-Id inclusion

The Mapped Diagnostic Context (MDC) can be leveraged where possible. A mechanism should be implemented to supply the request-id to the MDC (e.g. MDC.put("requestId", "some-id");).

When a user request is routed to multiple different microservices and something goes wrong and the request fails, the request-id included with every log message makes it possible to track the request across all of them.

Common JSON format

For any log aggregation solution, a JSON log format is preferable to plain text in terms of performance and simplicity of parsing. All FOLIO modules should use the same JSON format for logs, e.g. the following:

Code Block
<configuration>
    <!-- Requires the logback-contrib modules logback-json-classic and
         logback-jackson (plus Jackson) on the classpath -->
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>log/application.log</file>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
                <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter"/>
                <appendLineSeparator>true</appendLineSeparator>
            </layout>
        </encoder>
    </appender>

    <root level="debug">
        <appender-ref ref="FILE"/>
    </root>
</configuration>
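
With this configuration, each log event is rendered as a single JSON object per line. A hypothetical sample of the resulting output, assuming the default JsonLayout field set (the values, logger name, and the requestId MDC entry are illustrative):

Code Block
{"timestamp":"2019-07-01 12:00:00.000","level":"INFO","thread":"main","mdc":{"requestId":"0e1f6a9b"},"logger":"org.folio.example.SomeService","message":"Request processed","context":"default"}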

Common library for logging

The common logging library should include the common logback.xml and provide the mechanism for supplying the request-id to the MDC.

EFK (Elasticsearch, Fluentd, Kibana)

Cons:

  • There are tools with richer functionality (e.g. Datadog)

Alerting

There are many plugins available for watching and alerting on an Elasticsearch index in Kibana, e.g. X-Pack, SentiNL, ElastAlert. Alerting can be easily implemented in Kibana (see: https://www.elastic.co/blog/creating-a-threshold-alert-in-elasticsearch-is-simpler-than-ever)

ElastAlert is a simple and popular open-source tool for alerting on anomalies, spikes, or other patterns of interest found in data stored in Elasticsearch. It works with all versions of Elasticsearch.
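
As an illustration, a minimal ElastAlert rule that fires when error-level entries spike could look like the sketch below; the rule name, index pattern, threshold, and recipient address are assumptions to be adapted to the actual deployment.

Code Block
# Hypothetical rule: alert when more than 50 ERROR entries arrive
# within 5 minutes; index pattern and email address are examples.
name: folio-error-spike
type: frequency
index: fluentd-*
num_events: 50
timeframe:
  minutes: 5
filter:
- term:
    level: "ERROR"
alert:
- "email"
email:
- "devops@example.org"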

Deployment options

K8s deployment

Using a node-level logging agent


A separate kube-logging namespace should be created into which the EFK stack components will be installed. This namespace also makes it possible to quickly clean up and remove the logging stack without any loss of function to the Kubernetes cluster. For cluster high availability, 3 Elasticsearch Pods should be deployed to avoid the “split-brain” issue (see A new era for cluster coordination in Elasticsearch and Voting configurations).
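
A minimal sketch of the namespace plus a condensed three-node Elasticsearch StatefulSet; all object names and the image tag are illustrative assumptions:

Code Block
apiVersion: v1
kind: Namespace
metadata:
  name: kube-logging
---
# Headless Service: gives each Elasticsearch Pod a stable DNS entry.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3                 # three master-eligible nodes avoid split-brain
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0  # example tag
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "es-cluster-0,es-cluster-1,es-cluster-2"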

K8s deployment: Kibana

To launch Kibana on Kubernetes, a Service called kibana should be created in the kube-logging namespace. The Deployment consists of one Pod replica. The latest Kibana Docker image is located at docker.elastic.co/kibana/. A range of 0.1 vCPU - 1 vCPU should be guaranteed to the Pod (i.e. a CPU request of 0.1 vCPU and a limit of 1 vCPU).
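
A sketch of the corresponding Service and Deployment, with the CPU range expressed as a request/limit pair (the image tag is an example):

Code Block
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  selector:
    app: kibana
  ports:
  - port: 5601
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-logging
spec:
  replicas: 1                 # one Pod replica
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.2.0  # example tag
        resources:
          requests:
            cpu: 100m         # 0.1 vCPU guaranteed
          limits:
            cpu: "1"          # capped at 1 vCPU
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch.kube-logging.svc.cluster.local:9200
        ports:
        - containerPort: 5601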

K8s deployment: Fluentd 

Fluentd should be deployed as a DaemonSet, which is a Kubernetes workload type that runs a copy of a given Pod on each node in the Kubernetes cluster (see: https://kubernetes.io/docs/concepts/cluster-administration/logging/#using-a-node-logging-agent).

FOLIO modules should use a single common slf4j configuration for writing JSON log files on the nodes. The Fluentd Pod will tail these log files, filter log events, transform the log data, and ship it off to Elasticsearch. The Fluentd DaemonSet spec provided by the Fluentd maintainers should be used, along with the docs they provide: Kubernetes Fluentd.

A Service Account called fluentd, which the Fluentd Pods will use to access the Kubernetes API, should be created in the kube-logging namespace with the label app: fluentd (see Configure Service Accounts for Pods in the official Kubernetes docs). A ClusterRole with get, list, and watch permissions on the pods and namespaces objects should be created.
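
A sketch of the ServiceAccount, the ClusterRole, and the binding between them:

Code Block
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
- apiGroups: [""]                        # core API group
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-logging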

A NoSchedule toleration should be defined to match the equivalent taint on Kubernetes master nodes. This ensures that the DaemonSet also gets rolled out to the Kubernetes masters (see: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). The toleration appears in the DaemonSet sketch after the environment-variable list below.

The image https://hub.docker.com/r/fluent/fluentd-kubernetes-daemonset/ provided by the Fluentd maintainers should be used. The Dockerfile and contents of this image are available in Fluentd’s fluentd-kubernetes-daemonset GitHub repo.

The following environment variables should be configured for Fluentd (see the DaemonSet sketch after the list):

  • FLUENT_ELASTICSEARCH_HOST: the Elasticsearch headless Service address defined earlier: elasticsearch.kube-logging.svc.cluster.local. This resolves to a list of IP addresses for the 3 Elasticsearch Pods; the actual Elasticsearch host will most likely be the first IP address returned in this list. To distribute logs across the cluster, the configuration of Fluentd’s Elasticsearch Output plugin needs to be modified (see: Elasticsearch Output Plugin).
  • FLUENT_ELASTICSEARCH_PORT: 9200.
  • FLUENT_ELASTICSEARCH_SCHEME: http.
  • FLUENTD_SYSTEMD_CONF: disable.
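
A condensed DaemonSet sketch tying together the service account, the master-node toleration, the maintainers’ image, and the environment variables above (the image tag and volume mounts follow the upstream spec and are assumptions):

Code Block
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master   # matches the master taint
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1  # example tag
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.kube-logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        - name: FLUENTD_SYSTEMD_CONF
          value: "disable"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers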




Implementation steps 

Step | Status
Common logging format for all backend modules and infrastructure components (Okapi, PubSub) | Agreed
The same format for all frontend modules |
Format: JSON (configuration mentioned above) | POC for performance (plain text vs JSON)
Including properties: log level |
Including (if possible) the request-id in all log entries |
Logging to files | Should be discussed with DevOps (John Malconian)
Making an optional artifact with EFK included |