
Strimzi: Apache Kafka on Kubernetes Setup

Complete setup guide for Strimzi, an open-source Kubernetes operator for running Apache Kafka clusters in a cloud-native way. Originally created by Red Hat, Strimzi is now a CNCF incubating project. Includes installation, cluster configuration, topic management, and production best practices for running Kafka workloads on Kubernetes.

Step 1

Understanding Strimzi and Kafka on Kubernetes

Strimzi provides a way to run Apache Kafka on Kubernetes using custom resources and operators. It manages Kafka brokers, ZooKeeper (or KRaft mode), Kafka Connect, MirrorMaker, and Kafka Bridge as native Kubernetes resources. The operator handles deployment, scaling, configuration changes, and upgrades declaratively. This removes much of the manual work of operating a distributed Kafka cluster.

Strimzi Components:
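- Cluster Operator: Deploys and manages Kafka clusters and related components (Kafka Connect, MirrorMaker 2, Kafka Bridge), and deploys the Entity Operator
- Entity Operator: Runs the Topic and User Operators for a Kafka cluster
  - Topic Operator: Synchronizes KafkaTopic resources with Kafka topics
  - User Operator: Manages KafkaUser resources, their credentials, and ACLs
- Kafka Bridge: HTTP interface for producing and consuming messages, for clients that cannot use the Kafka protocol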

Kafka Resources Managed:
- Kafka: Broker cluster configuration
- KafkaTopic: Topic definitions
- KafkaUser: User credentials and ACLs
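- KafkaConnect: Connect cluster for integrations
- KafkaConnector: Connector instances running in a KafkaConnect cluster
- KafkaMirrorMaker2: Cross-cluster replication
- KafkaBridge: HTTP REST API gateway
- KafkaNodePool: Groups of Kafka nodes with their own roles, replica counts, and storage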

Step 2
Prerequisites
You need a running Kubernetes cluster (1.23+) with kubectl configured. Minimum cluster resources: 4 CPU cores, 8GB RAM for a test cluster; production needs vary by workload. Persistent storage is required (StorageClass with dynamic provisioning recommended). You'll need cluster-admin or equivalent permissions to install CRDs and create namespaces.
```
# Verify Kubernetes cluster access and server version
kubectl version
kubectl cluster-info

# Check available resources (requires metrics-server)
kubectl top nodes

# Verify StorageClass exists
kubectl get storageclass

# Create dedicated namespace
kubectl create namespace kafka
kubectl config set-context --current --namespace=kafka

# Verify namespace
kubectl get namespace kafka
```
⚠ Heads up: Strimzi requires Kubernetes 1.23 or later. For production use, ensure your cluster has monitoring (Prometheus), logging, and backup strategies in place before deploying Kafka.
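To check the server version against the 1.23 minimum in a script rather than by eye, a small comparison helper is enough. This is a sketch: version_ge and normalize_minor are hypothetical helper names, version_ge relies on GNU coreutils sort -V, and the commented usage lines assume jq is installed.

```shell
# version_ge A B: succeeds (exit 0) when version A >= version B.
# Uses GNU `sort -V` (version sort): the smaller version sorts first,
# so A >= B exactly when B is the first line after sorting.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# normalize_minor MINOR: strip the trailing "+" some managed
# Kubernetes providers append to the minor version (e.g. "27+").
normalize_minor() {
  printf '%s\n' "$1" | tr -d '+'
}

# Usage (requires cluster access and jq):
#   major=$(kubectl version -o json | jq -r '.serverVersion.major')
#   minor=$(normalize_minor "$(kubectl version -o json | jq -r '.serverVersion.minor')")
#   version_ge "$major.$minor" 1.23 || echo "Kubernetes $major.$minor is older than 1.23"
```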

Step 3

Install Strimzi Operator via YAML Manifests

The fastest way to install Strimzi is applying the release manifests directly. This creates all necessary CRDs (Custom Resource Definitions), the Cluster Operator deployment, and RBAC resources. The operator watches for Kafka custom resources and manages their lifecycle automatically.

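# Install a pinned Strimzi release
VERSION=0.44.0
curl -fsSL -o strimzi-cluster-operator-$VERSION.yaml \
  https://github.com/strimzi/strimzi-kafka-operator/releases/download/$VERSION/strimzi-cluster-operator-$VERSION.yaml

# The release manifest binds the operator's ServiceAccount in the "myproject" namespace;
# point those bindings at the namespace you install into (kafka)
sed -i 's/namespace: myproject/namespace: kafka/' strimzi-cluster-operator-$VERSION.yaml
kubectl create -f strimzi-cluster-operator-$VERSION.yaml -n kafka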

# Verify CRDs are installed
kubectl get crd | grep strimzi
# Should show: kafkas, kafkatopics, kafkausers, kafkaconnects, etc.

# Wait for operator to be ready
kubectl wait --for=condition=available --timeout=300s deployment/strimzi-cluster-operator -n kafka

# Check operator logs
kubectl logs -l name=strimzi-cluster-operator -n kafka -f

# Verify operator is running
kubectl get deployment strimzi-cluster-operator -n kafka

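⚠ Heads up: The CRDs are cluster-scoped, so they are always installed cluster-wide. By default the operator watches only the namespace it runs in. To manage other namespaces, set the STRIMZI_NAMESPACE environment variable in the operator Deployment and create the operator's RoleBindings in each watched namespace. Do not delete the ClusterRole resources: the RoleBindings reference them, and the operator needs those permissions.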

Step 4

Alternative: Install via Helm

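Helm provides a more flexible installation method with configurable values. This is recommended for production deployments where you need to customize operator settings, resource limits, or watchNamespaces. Helm also simplifies upgrades and rollbacks of the operator Deployment. However, helm upgrade does not update CRDs, so apply the new release's CRDs separately when you upgrade.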

# Add Strimzi Helm repository
helm repo add strimzi https://strimzi.io/charts/
helm repo update

# Search available versions
helm search repo strimzi/strimzi-kafka-operator --versions

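# Install with default values (pin the chart version to the release you tested)
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --create-namespace \
  --version 0.44.0

# Alternatively, instead of the command above, install with custom values.
# watchNamespaces lists namespaces to watch in addition to the release namespace;
# they must already exist.
cat <<EOF > values.yaml
watchNamespaces:
  - production
resources:
  limits:
    memory: 512Mi
    cpu: 500m
  requests:
    memory: 256Mi
    cpu: 100m
logLevel: INFO
EOF

helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --version 0.44.0 \
  --values values.yaml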

# Verify installation
helm list -n kafka
kubectl get pods -n kafka
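If the list of watched namespaces comes from elsewhere, such as a CI variable, generating the watchNamespaces fragment avoids hand-editing YAML. This is a sketch with a hypothetical function name. The upgrade command in the usage comment is standard Helm, but check that it fits your release workflow.

```shell
# watch_namespaces_values NS...: print a Helm values fragment listing
# the namespaces the Strimzi operator should watch.
watch_namespaces_values() {
  echo "watchNamespaces:"
  for ns in "$@"; do
    printf '  - %s\n' "$ns"
  done
}

# Usage (requires cluster access):
#   watch_namespaces_values production staging > watch-values.yaml
#   helm upgrade strimzi-kafka-operator strimzi/strimzi-kafka-operator -n kafka \
#     --reuse-values -f watch-values.yaml
```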

Step 5

Deploy Your First Kafka Cluster (Ephemeral)

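An ephemeral cluster stores data in emptyDir volumes, so a broker's data is lost whenever its pod is deleted or rescheduled. This is fine for development and testing. This example uses ZooKeeper for metadata; KRaft mode is also supported (see Step 23). The Cluster Operator watches this custom resource and creates the pods (managed through StrimziPodSets), Services, ConfigMaps, and Secrets the cluster needs. Each Strimzi release supports a specific set of Kafka versions, so check the release notes of the operator version you installed before setting spec.kafka.version.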

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.9.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.9"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}

Step 6

Apply Kafka Cluster Configuration

Save the Kafka resource definition to a file and apply it. The Cluster Operator detects the new resource and provisions the entire cluster: StrimziPodSets that manage the Kafka broker and ZooKeeper pods, plus Services for access. Initial deployment typically takes 2-5 minutes.

# Save the YAML from previous step as kafka-ephemeral.yaml

# Apply the configuration
kubectl apply -f kafka-ephemeral.yaml -n kafka

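# Watch cluster creation progress
kubectl get kafka my-cluster -n kafka -w
# Wait until the READY column shows True, or block until the Ready condition is set:
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka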

# Check all created resources
kubectl get all -l app.kubernetes.io/instance=my-cluster -n kafka

# Verify Kafka pods are running
kubectl get pods -l strimzi.io/cluster=my-cluster -n kafka
# Should show: 3 kafka pods, 3 zookeeper pods, entity-operator

# Check Kafka service endpoints
kubectl get svc -l strimzi.io/cluster=my-cluster -n kafka

# View cluster status
kubectl describe kafka my-cluster -n kafka

Step 7

Production-Ready Persistent Cluster

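For production use, configure persistent storage with enough IOPS and throughput for your workload. Kafka brokers are throughput-heavy and ZooKeeper is sensitive to disk latency, so you may want a different storage class for each (the example uses fast-ssd for both). Set resource requests and limits based on your workload, enable metrics collection, and size the JVM heap. The metricsConfig sections reference a ConfigMap named kafka-metrics, which must exist in the namespace (see Step 15).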

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.9.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        configuration:
          bootstrap:
            loadBalancerIP: <your-load-balancer-ip>
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      log.retention.hours: 168
      log.segment.bytes: 1073741824
      compression.type: producer
    storage:
      type: persistent-claim
      size: 100Gi
      class: fast-ssd
      deleteClaim: false
    resources:
      requests:
        memory: 4Gi
        cpu: "2"
      limits:
        memory: 8Gi
        cpu: "4"
    jvmOptions:
      -Xms: 2048m
      -Xmx: 4096m
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: fast-ssd
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "500m"
      limits:
        memory: 2Gi
        cpu: "1"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  entityOperator:
    topicOperator:
      resources:
        requests:
          memory: 256Mi
          cpu: "200m"
        limits:
          memory: 512Mi
          cpu: "500m"
    userOperator:
      resources:
        requests:
          memory: 256Mi
          cpu: "200m"
        limits:
          memory: 512Mi
          cpu: "500m"
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"

⚠ Heads up: With deleteClaim: false, the PersistentVolumeClaims are kept when the cluster is deleted; with deleteClaim: true, they are deleted along with it. Keep it false in production to prevent accidental data loss, and test backup and restore procedures before going live.
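The example above sets -Xmx to half of the 8Gi memory limit. That leaves the rest for off-heap memory and the operating system page cache, which Kafka relies on for reads. If you change the limits, a small helper can keep that ratio consistent. This is a sketch: the 50% ratio is an assumption taken from the example, not a Strimzi rule, and heap_mib_for_limit is a hypothetical name.

```shell
# heap_mib_for_limit LIMIT: suggest a JVM heap size in MiB for a container
# memory limit written as a Kubernetes quantity with a Gi or Mi suffix.
# Assumption: heap = 50% of the limit, mirroring -Xmx 4096m with an 8Gi limit above.
heap_mib_for_limit() {
  limit=$1
  case $limit in
    *Gi) mib=$(( ${limit%Gi} * 1024 )) ;;
    *Mi) mib=${limit%Mi} ;;
    *) echo "unsupported quantity: $limit (use Gi or Mi)" >&2; return 1 ;;
  esac
  echo $(( mib / 2 ))
}

# Usage: print the -Xmx line for the jvmOptions section
printf '%s\n' "-Xmx: $(heap_mib_for_limit 8Gi)m"
```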

Step 8

Create Kafka Topics Declaratively

Strimzi manages topics as Kubernetes resources. Create a KafkaTopic custom resource and the Topic Operator synchronizes it with Kafka. This provides GitOps-friendly topic management with version control and declarative configuration.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 10
  replicas: 3
  config:
    retention.ms: 604800000  # 7 days
    segment.ms: 3600000      # 1 hour
    compression.type: lz4
    max.message.bytes: 1048576
    min.insync.replicas: 2
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: events
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 20
  replicas: 3
  config:
    retention.ms: 86400000   # 1 day
    cleanup.policy: delete
    compression.type: snappy
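The retention.ms and segment.ms values above are plain millisecond counts, which are easy to get wrong by a factor of 1000. A small conversion helper (hypothetical name to_ms) makes the arithmetic explicit:

```shell
# to_ms N UNIT: convert N days (d), hours (h) or minutes (m) to milliseconds
# for Kafka time-based topic configs such as retention.ms and segment.ms.
to_ms() {
  case $2 in
    d) echo $(( $1 * 24 * 60 * 60 * 1000 )) ;;
    h) echo $(( $1 * 60 * 60 * 1000 )) ;;
    m) echo $(( $1 * 60 * 1000 )) ;;
    *) echo "unit must be d, h or m" >&2; return 1 ;;
  esac
}

to_ms 7 d   # 604800000: the 7-day retention.ms of the orders topic
to_ms 1 h   # 3600000: its 1-hour segment.ms
to_ms 1 d   # 86400000: the 1-day retention.ms of the events topic
```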

Step 9

Apply Topic Configuration

Apply topic definitions using kubectl. The Topic Operator creates topics in Kafka and keeps them synchronized with the custom resource. Any changes to the KafkaTopic spec are reflected in Kafka automatically.

# Apply topic definitions
kubectl apply -f topics.yaml -n kafka

# List KafkaTopic resources
kubectl get kafkatopics -n kafka

# Describe a specific topic
kubectl describe kafkatopic orders -n kafka

# Verify topics exist in Kafka (exec into broker pod)
kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --list

# Get detailed topic info
kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --topic orders

# Update topic (edit the KafkaTopic resource)
kubectl edit kafkatopic orders -n kafka
# Change config or increase partitions (partition counts cannot be decreased); the Topic Operator applies the changes on save

# Delete topic
kubectl delete kafkatopic orders -n kafka

⚠ Heads up: Topic deletion requires delete.topic.enable=true in Kafka config (enabled by default in Strimzi). Deleting a KafkaTopic resource deletes the actual topic and all its data.
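A KafkaTopic's metadata.name must be a valid Kubernetes resource name (lowercase letters, digits, "-" and "."). Kafka itself also accepts uppercase letters and "_" in topic names. When the Kafka topic name is not a valid Kubernetes name, Strimzi lets you give the real name in spec.topicName. The sketch below checks both rules. The function names are hypothetical, and the Kubernetes check is a simplified RFC 1123 subdomain test.

```shell
# is_kafka_topic_name NAME: Kafka allows [a-zA-Z0-9._-], at most 249 chars,
# and rejects "." and "..".
is_kafka_topic_name() {
  [ -n "$1" ] && [ "${#1}" -le 249 ] && [ "$1" != "." ] && [ "$1" != ".." ] &&
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]+$'
}

# is_k8s_resource_name NAME: simplified check for metadata.name
# (lowercase alphanumerics, "-" and ".", alphanumeric at both ends, <= 253 chars).
is_k8s_resource_name() {
  [ "${#1}" -le 253 ] &&
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$'
}

# needs_topic_name NAME: succeeds when NAME is a legal Kafka topic name but not
# a legal Kubernetes name, so the KafkaTopic must set spec.topicName.
needs_topic_name() {
  is_kafka_topic_name "$1" && ! is_k8s_resource_name "$1"
}
```

For example, a Kafka topic named Orders_V2 would need a KafkaTopic with a lowercase metadata.name such as orders-v2 and spec.topicName: Orders_V2.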

Step 10

Create Kafka Users with Authentication

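KafkaUser resources define users with TLS or SCRAM-SHA-512 authentication. Strimzi generates the certificates or passwords and stores them in Kubernetes Secrets named after each user. Users can have ACLs (Access Control Lists) for fine-grained authorization on topics and consumer groups. Credentials and ACLs only take effect if the target Kafka cluster enables them. TLS users need a listener with authentication type tls, SCRAM users need a listener with authentication type scram-sha-512, and ACLs need authorization type simple under spec.kafka. The ephemeral my-cluster from Step 5 has none of these, so add them before relying on these users.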

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: producer-app
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
          patternType: literal
        operations:
          - Write
          - Describe
      - resource:
          type: group
          name: producer-group
          patternType: literal
        operations:
          - Read
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: consumer-app
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
          patternType: literal
        operations:
          - Read
          - Describe
      - resource:
          type: group
          name: consumer-group
          patternType: prefix
        operations:
          - Read
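When many applications need similar read-only access, templating the KafkaUser manifest avoids copy-paste mistakes in the ACL list. The function below is a hypothetical helper (consumer_user_manifest) that prints a manifest shaped like consumer-app above. Pipe its output to kubectl apply -f -.

```shell
# consumer_user_manifest USER TOPIC GROUP_PREFIX [CLUSTER]
# Prints a KafkaUser with SCRAM-SHA-512 authentication, Read/Describe on TOPIC,
# and Read on consumer groups whose names start with GROUP_PREFIX.
consumer_user_manifest() {
  user=$1 topic=$2 group_prefix=$3 cluster=${4:-my-cluster}
  cat <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: ${user}
  namespace: kafka
  labels:
    strimzi.io/cluster: ${cluster}
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: ${topic}
          patternType: literal
        operations:
          - Read
          - Describe
      - resource:
          type: group
          name: ${group_prefix}
          patternType: prefix
        operations:
          - Read
EOF
}

# Usage (requires cluster access):
#   consumer_user_manifest analytics-app orders analytics- | kubectl apply -f -
```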

Step 11

Apply User Configuration and Access Credentials

Apply KafkaUser resources and retrieve generated credentials from Secrets. TLS users get a certificate/key pair; SCRAM-SHA users get a password. Mount these secrets in your application pods to authenticate with Kafka.

# Apply user definitions
kubectl apply -f kafka-users.yaml -n kafka

# List KafkaUser resources
kubectl get kafkausers -n kafka

# Check generated secrets
kubectl get secrets -n kafka | grep producer-app
kubectl get secrets -n kafka | grep consumer-app

# View TLS certificate for producer-app
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d

# View TLS key
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d

# View CA certificate (for trust)
kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d

# View SCRAM-SHA password for consumer-app
kubectl get secret consumer-app -n kafka -o jsonpath='{.data.password}' | base64 -d

# Extract credentials to files for application use
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > user.crt
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > user.key
kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
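For the SCRAM-SHA-512 user, a Kafka client needs the password from the consumer-app secret and, on a TLS listener, the cluster CA certificate. The helper below prints a client properties file. It assumes the listener uses TLS (SASL_SSL) and relies on PEM truststore support in Kafka clients 2.7 and later. The function name is hypothetical. Strimzi also stores a ready-made sasl.jaas.config entry in the same secret, which you can use instead.

```shell
# scram_client_properties USER PASSWORD CA_FILE
# Prints Kafka client settings for SCRAM-SHA-512 over TLS (SASL_SSL).
# Assumes PASSWORD contains no double quotes (Strimzi-generated passwords are alphanumeric).
scram_client_properties() {
  cat <<EOF
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="$1" password="$2";
ssl.truststore.type=PEM
ssl.truststore.location=$3
EOF
}

# Usage (requires cluster access):
#   password=$(kubectl get secret consumer-app -n kafka -o jsonpath='{.data.password}' | base64 -d)
#   scram_client_properties consumer-app "$password" ca.crt > consumer.properties
```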

Step 12

Test Producer and Consumer

Verify your Kafka cluster is working by producing and consuming messages. Use the kafka-console tools included in the Kafka container image to test connectivity and authentication.

# Create a test topic
kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: test-topic
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
EOF

# Start a producer (plain listener)
kubectl run kafka-producer -ti --image=quay.io/strimzi/kafka:0.44.0-kafka-3.9.0 \
  --rm=true --restart=Never -n kafka -- bin/kafka-console-producer.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --topic test-topic

# Type some messages, Ctrl+C when done

# Start a consumer in another terminal
kubectl run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.44.0-kafka-3.9.0 \
  --rm=true --restart=Never -n kafka -- bin/kafka-console-consumer.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --topic test-topic \
  --from-beginning

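# Test the TLS listener. The my-cluster TLS listener does not authenticate clients,
# so only a truststore with the cluster CA is required. The CA is available as ca.p12,
# with its password in ca.password, in the my-cluster-cluster-ca-cert secret; mount
# that secret into the client pod. The truststore path below is a placeholder.
kubectl run kafka-producer-tls -ti --image=quay.io/strimzi/kafka:0.44.0-kafka-3.9.0 \
  --rm=true --restart=Never -n kafka -- bin/kafka-console-producer.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9093 \
  --topic test-topic \
  --producer-property security.protocol=SSL \
  --producer-property ssl.truststore.type=PKCS12 \
  --producer-property ssl.truststore.location=/tmp/truststore.p12 \
  --producer-property ssl.truststore.password=<password>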

Step 13

Deploy Kafka Connect for Integrations

Kafka Connect provides scalable, reliable streaming integration between Kafka and external systems. Strimzi manages Connect clusters as custom resources. You can use pre-built connectors (databases, S3, Elasticsearch) or build custom ones.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  namespace: kafka
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 3.9.0
  replicas: 3
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    config.storage.replication.factor: 3
    offset.storage.replication.factor: 3
    status.storage.replication.factor: 3
  build:
    output:
      type: docker
      image: <your-registry>/kafka-connect:latest
    plugins:
      - name: debezium-postgres-connector
        artifacts:
          - type: tgz
            url: https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.7.0.Final/debezium-connector-postgres-2.7.0.Final-plugin.tar.gz
      - name: camel-http-connector
        artifacts:
          - type: tgz
            url: https://repo1.maven.org/maven2/org/apache/camel/kafkaconnector/camel-http-kafka-connector/4.4.3/camel-http-kafka-connector-4.4.3-package.tar.gz
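The plugin artifact URLs above follow Maven Central's group/artifact/version layout, so pinning a different Debezium release only means changing the version string consistently in two places. A sketch of a helper that builds the URL, assuming the -plugin.tar.gz artifact naming shown above; the function name is hypothetical and the version in the usage line is only an example.

```shell
# debezium_plugin_url CONNECTOR VERSION
# e.g. CONNECTOR=postgres, VERSION=2.7.0.Final
debezium_plugin_url() {
  artifact="debezium-connector-$1"
  printf 'https://repo1.maven.org/maven2/io/debezium/%s/%s/%s-%s-plugin.tar.gz\n' \
    "$artifact" "$2" "$artifact" "$2"
}

# Check that the artifact exists before adding it to spec.build.plugins (needs network):
#   curl -fsIL "$(debezium_plugin_url postgres 2.7.0.Final)" >/dev/null && echo found
```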

Step 14

Apply Connect Cluster and Deploy Connectors

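Strimzi can build custom Connect images with your connectors using the build specification. The result is pushed to the image in spec.build.output, which may need a pushSecret for a private registry. For production, pre-build images and reference them with spec.image. Then deploy connector instances using KafkaConnector resources. Avoid plain-text credentials such as database.password in connector configs; a Kafka configuration provider can read them from a Kubernetes Secret instead.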

# Apply Connect cluster
kubectl apply -f kafka-connect.yaml -n kafka

# Wait for Connect cluster to be ready
kubectl wait --for=condition=ready kafkaconnect/my-connect-cluster --timeout=600s -n kafka

# Check Connect pods
kubectl get pods -l strimzi.io/cluster=my-connect-cluster -n kafka

# Create a connector instance
kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: postgres-source
  namespace: kafka
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  tasksMax: 1   # the Debezium PostgreSQL connector runs a single task
  config:
    database.hostname: postgres.database.svc.cluster.local
    database.port: 5432
    database.user: kafka_user
    database.password: <your-password>
    database.dbname: production
    topic.prefix: prod-db   # Debezium 2.x replaced database.server.name with topic.prefix
    table.include.list: public.orders,public.customers
    plugin.name: pgoutput
EOF

# List connectors
kubectl get kafkaconnectors -n kafka

# Check connector status
kubectl describe kafkaconnector postgres-source -n kafka

# View Connect logs
kubectl logs -l strimzi.io/cluster=my-connect-cluster -n kafka -f

Step 15

Enable Monitoring with Prometheus and Grafana

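Strimzi can expose JMX metrics from Kafka, ZooKeeper, and Kafka Connect through the Prometheus JMX Exporter. Configure metricsConfig in your Kafka resource, as in Step 7, to enable metrics collection. Then deploy Prometheus and Grafana to visualize cluster health, throughput, and latency. Consumer lag metrics come from the Kafka Exporter, enabled with kafkaExporter. Installing the Prometheus Operator does not by itself start a Prometheus server: you also need a Prometheus resource whose selectors pick up the monitor created below.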

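# Create the Kafka metrics ConfigMap. The example file also defines a Kafka resource
# named my-cluster, so download it, keep only the ConfigMap named kafka-metrics,
# and apply that
curl -fsSLO https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/0.44.0/examples/metrics/kafka-metrics.yaml
# (edit kafka-metrics.yaml to remove the Kafka and any KafkaNodePool documents)
kubectl apply -f kafka-metrics.yaml -n kafka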

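# Install Prometheus Operator (if not already installed). Use create rather than apply:
# the bundle's CRDs are too large for the annotation that client-side apply adds
kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml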

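# Create a PodMonitor for Strimzi pods (metrics are served on the pods'
# tcp-prometheus port, which Strimzi does not expose on its Services)
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: kafka
spec:
  selector:
    matchExpressions:
      - key: strimzi.io/kind
        operator: In
        values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus
      interval: 30s
EOF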

# Deploy Grafana
kubectl create deployment grafana --image=grafana/grafana:latest -n kafka
kubectl expose deployment grafana --type=LoadBalancer --port=3000 -n kafka

# Get Grafana URL
kubectl get svc grafana -n kafka

# Import Strimzi dashboards from
# https://github.com/strimzi/strimzi-kafka-operator/tree/main/examples/metrics/grafana-dashboards

# View Prometheus metrics directly from a broker pod (metrics port 9404)
kubectl port-forward pod/my-cluster-kafka-0 9404:9404 -n kafka
# Visit http://localhost:9404/metrics
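Once port forwarding is running, the /metrics endpoint returns Prometheus text format, one "name{labels} value" sample per line. For a quick check without Grafana, a small awk filter can pull out a single metric. This is a sketch: the function name is hypothetical, it assumes samples carry no trailing timestamp, and the metric name in the usage line assumes the naming produced by Strimzi's example JMX exporter rules.

```shell
# metric_value NAME: read Prometheus text format on stdin and print the value
# of every sample whose metric name is exactly NAME (with or without labels).
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)           # drop the label set, if present
      if (metric == name) print $NF      # value is the last field
    }'
}

# Usage (with the port-forward above running):
#   curl -s http://localhost:9404/metrics | metric_value kafka_server_replicamanager_underreplicatedpartitions
```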

Step 16

Configure External Access

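Expose Kafka outside Kubernetes using loadbalancer, nodeport, or ingress listeners. Each broker gets its own external address, because clients connect directly to the broker that leads each partition. Choose the listener type based on your cloud provider and networking setup. The snippets below are alternative versions of the spec.kafka.listeners section of the Kafka resource (add them alongside the internal listeners), not standalone manifests. The ingress type requires TLS and an Ingress controller with TLS passthrough enabled, such as ingress-nginx started with --enable-ssl-passthrough.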

# LoadBalancer listener (AWS, GCP, Azure)
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    authentication:
      type: tls
    configuration:
      brokerCertChainAndKey:
        secretName: kafka-tls-cert
        certificate: tls.crt
        key: tls.key
---
# NodePort listener (on-prem, local)
listeners:
  - name: external
    port: 9094
    type: nodeport
    tls: true
    configuration:
      preferredNodePortAddressType: ExternalIP
      brokers:
        - broker: 0
          advertisedHost: <node-1-external-ip>
          nodePort: 32100
        - broker: 1
          advertisedHost: <node-2-external-ip>
          nodePort: 32101
        - broker: 2
          advertisedHost: <node-3-external-ip>
          nodePort: 32102
---
# Ingress listener (with NGINX or similar)
listeners:
  - name: external
    port: 9094
    type: ingress
    tls: true
    configuration:
      bootstrap:
        host: kafka-bootstrap.example.com
      brokers:
        - broker: 0
          host: kafka-0.example.com
        - broker: 1
          host: kafka-1.example.com
        - broker: 2
          host: kafka-2.example.com
      class: nginx
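The nodePort values in the NodePort example must fall inside the cluster's Service node-port range. That range is 30000-32767 unless the API server was started with a different --service-node-port-range. A quick pre-flight check, sketched with a hypothetical function name:

```shell
# check_node_ports PORT...: fail if any port is not a number, lies outside the
# default Kubernetes NodePort range (30000-32767), or is listed twice.
check_node_ports() {
  seen=" "
  for port in "$@"; do
    case $port in
      ''|*[!0-9]*) echo "not a number: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 30000 ] || [ "$port" -gt 32767 ]; then
      echo "out of range: $port" >&2; return 1
    fi
    case $seen in
      *" $port "*) echo "duplicate: $port" >&2; return 1 ;;
    esac
    seen="$seen$port "
  done
}

check_node_ports 32100 32101 32102 && echo "node ports OK"
```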

Step 17

Connect External Clients

Retrieve bootstrap addresses and certificates for external connections. Configure your Kafka clients with the appropriate security protocol and credentials. Test connectivity before deploying applications.

# Get external bootstrap address
kubectl get kafka my-cluster -n kafka -o jsonpath='{.status.listeners[?(@.name=="external")].bootstrapServers}'

# Get CA certificate for TLS
kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# Get client certificate (if using mutual TLS)
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > client.crt
kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > client.key

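# Test connection with kafka-console-producer (local machine).
# The bootstrap address printed above already includes the port; use it as-is.
# truststore.jks and keystore.jks are placeholders: build them from ca.crt and
# client.crt/client.key, or use ca.p12/user.p12 from the Secrets with ssl.*.type=PKCS12
kafka-console-producer.sh \
  --bootstrap-server <external-bootstrap-address> \
  --topic test-topic \
  --producer-property security.protocol=SSL \
  --producer-property ssl.truststore.location=truststore.jks \
  --producer-property ssl.truststore.password=<password> \
  --producer-property ssl.keystore.location=keystore.jks \
  --producer-property ssl.keystore.password=<password>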

# Java client configuration example
# properties.put("bootstrap.servers", "<external-address>:9094");
# properties.put("security.protocol", "SSL");
# properties.put("ssl.truststore.location", "/path/to/truststore.jks");
# properties.put("ssl.truststore.password", "password");
# properties.put("ssl.keystore.location", "/path/to/keystore.jks");
# properties.put("ssl.keystore.password", "password");
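Strimzi already stores PKCS #12 versions of the certificates next to the PEM files: ca.p12 and ca.password in the cluster CA secret, and user.p12 and user.password in each TLS user's secret. Converting to JKS is therefore optional. The sketch below prints client settings that point at those PKCS #12 files. The helper name and file paths are illustrative.

```shell
# mtls_client_properties TRUSTSTORE TRUSTSTORE_PASSWORD KEYSTORE KEYSTORE_PASSWORD
# Prints Kafka client settings for mutual TLS using PKCS #12 stores.
mtls_client_properties() {
  cat <<EOF
security.protocol=SSL
ssl.truststore.type=PKCS12
ssl.truststore.location=$1
ssl.truststore.password=$2
ssl.keystore.type=PKCS12
ssl.keystore.location=$3
ssl.keystore.password=$4
EOF
}

# Usage (requires cluster access):
#   get() { kubectl get secret "$1" -n kafka -o jsonpath="{.data.$2}" | base64 -d; }
#   get my-cluster-cluster-ca-cert 'ca\.p12' > ca.p12
#   get producer-app 'user\.p12' > user.p12
#   mtls_client_properties ca.p12 "$(get my-cluster-cluster-ca-cert 'ca\.password')" \
#     user.p12 "$(get producer-app 'user\.password')" > client-ssl.properties
#   kafka-console-producer.sh --bootstrap-server <external-bootstrap-address> \
#     --topic test-topic --producer.config client-ssl.properties
```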

Step 18

Upgrade Kafka Version

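Strimzi performs Kafka upgrades as rolling updates, restarting brokers one at a time so the cluster stays available as long as topics are replicated (for example, replication factor 3 with min.insync.replicas 2). Update the Kafka version in your Kafka resource and apply it. For ZooKeeper-based clusters, change spec.kafka.version first and keep inter.broker.protocol.version at the old value. Bump the protocol version only after that first roll completes; the bump triggers a second roll. Always check the Strimzi documentation for version compatibility and the supported upgrade path.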

# Check current Kafka version
kubectl get kafka my-cluster -n kafka -o jsonpath='{.spec.kafka.version}'

# Edit Kafka resource to update version
kubectl edit kafka my-cluster -n kafka
# Change spec.kafka.version from 3.8.0 to 3.9.0
# Change spec.kafka.config.inter.broker.protocol.version if needed

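# Or patch directly. Phase 1: change only the Kafka version and keep
# inter.broker.protocol.version at the old value, so a downgrade is still possible.
# (log.message.format.version is deprecated since Kafka 3.0 and can be omitted.)
kubectl patch kafka my-cluster -n kafka --type=merge -p '{
  "spec": {
    "kafka": {
      "version": "3.9.0",
      "config": {
        "inter.broker.protocol.version": "3.8"
      }
    }
  }
}'

# Monitor upgrade progress
kubectl get pods -l strimzi.io/cluster=my-cluster -n kafka -w

# Check Kafka logs during upgrade
kubectl logs my-cluster-kafka-0 -n kafka -f

# Verify upgrade completed
kubectl get kafka my-cluster -n kafka -o yaml | grep version:

# Phase 2, once all brokers run 3.9.0: bump the protocol version.
# This triggers a second rolling restart.
kubectl patch kafka my-cluster -n kafka --type=merge \
  -p '{"spec":{"kafka":{"config":{"inter.broker.protocol.version":"3.9"}}}}'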

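⚠ Heads up: Always upgrade the Strimzi operator first, then the Kafka version. Test upgrades in a non-production environment. Each Strimzi release supports only a limited range of Kafka versions, so a large jump may have to be done in several rounds of operator upgrade followed by Kafka upgrade.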
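inter.broker.protocol.version takes only the major and minor parts of the Kafka version, for example 3.9 for 3.9.0. When scripting the protocol-version bump described above, deriving it from spec.kafka.version avoids typos. A sketch with a hypothetical helper name:

```shell
# protocol_version KAFKA_VERSION: print "major.minor" for inter.broker.protocol.version
protocol_version() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage (requires cluster access): run only after the version roll has finished
#   v=$(kubectl get kafka my-cluster -n kafka -o jsonpath='{.spec.kafka.version}')
#   kubectl patch kafka my-cluster -n kafka --type=merge \
#     -p "{\"spec\":{\"kafka\":{\"config\":{\"inter.broker.protocol.version\":\"$(protocol_version "$v")\"}}}}"
```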

Step 19

Backup and Disaster Recovery

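Implement regular backups of Kafka topic data and cluster metadata. Use MirrorMaker 2 for active-passive or active-active replication to a disaster recovery cluster. Back up PersistentVolumes and ZooKeeper state. Keep in mind that volume snapshots of running brokers are taken at slightly different times, so a cluster restored from them may not be consistent across brokers; replication to a second cluster is often the primary recovery mechanism. Test restore procedures regularly.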

# Deploy MirrorMaker2 for cross-cluster replication
kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: disaster-recovery-mirror
  namespace: kafka
spec:
  version: 3.9.0
  replicas: 1
  connectCluster: "target"
  clusters:
    - alias: "source"
      bootstrapServers: my-cluster-kafka-bootstrap:9092
    - alias: "target"
      bootstrapServers: dr-cluster-kafka-bootstrap:9092
  mirrors:
    - sourceCluster: "source"
      targetCluster: "target"
      sourceConnector:
        config:
          replication.factor: 3
          offset-syncs.topic.replication.factor: 3
          sync.topic.acls.enabled: "false"
      heartbeatConnector:
        config:
          heartbeats.topic.replication.factor: 3
      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 3
      topicsPattern: ".*"
      groupsPattern: ".*"
EOF

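# Back up PersistentVolumes using Velero (velero install creates the velero namespace)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:<plugin-version> \
  --bucket kafka-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=<region> \
  --snapshot-location-config region=<region>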

# Create backup schedule
velero schedule create kafka-daily --schedule="0 2 * * *" --include-namespaces kafka

# Manual backup
velero backup create kafka-backup-$(date +%Y%m%d) --include-namespaces kafka

# List backups
velero backup get

# Restore from backup
velero restore create --from-backup kafka-backup-20260529
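With MirrorMaker 2's default replication policy, mirrored topics on the target cluster are prefixed with the source cluster alias and a "." separator (configurable with replication.policy.separator). Consumers that fail over to the DR cluster therefore need the prefixed name, for example source.orders with the aliases used above. Kafka 3.0 and later also ship IdentityReplicationPolicy, which keeps the original names. A sketch of the default naming rule, with a hypothetical function name:

```shell
# remote_topic SOURCE_ALIAS TOPIC [SEPARATOR]: name of TOPIC on the target cluster
# under MirrorMaker 2's default replication policy (separator defaults to ".").
remote_topic() {
  printf '%s%s%s\n' "$1" "${3:-.}" "$2"
}

remote_topic source orders   # source.orders
```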

Step 20

Security Hardening

Enable TLS for all listeners, use mutual TLS or SCRAM-SHA-512 authentication, implement network policies, enable authorization with ACLs, and use Pod Security Standards. Regular security audits and updates are essential for production clusters.

# Restrict which clients can reach the broker listeners (fragment of spec.kafka.listeners).
# Strimzi generates NetworkPolicies for the Kafka pods that, by default, allow any
# source to reach each listener. NetworkPolicies are additive, so a separate policy
# cannot narrow that access; networkPolicyPeers on each listener does. Also add peers
# for in-namespace clients (for example Kafka Connect or MirrorMaker 2 pods), or they
# will be blocked.
listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    networkPolicyPeers:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: applications
      - podSelector:
          matchLabels:
            strimzi.io/kind: KafkaConnect
  - name: tls
    port: 9093
    type: internal
    tls: true
    networkPolicyPeers:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: applications
---
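# Pod security context and anti-affinity for Kafka (fragment of spec.kafka.template)
template:
  pod:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001   # Strimzi container images run as UID 1001
      fsGroup: 1000
      seccompProfile:
        type: RuntimeDefault
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                # Spread only the broker pods; matching strimzi.io/cluster would also
                # include ZooKeeper and operator pods and require one node per pod
                - key: strimzi.io/name
                  operator: In
                  values:
                    - my-cluster-kafka
            topologyKey: kubernetes.io/hostname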

Step 21

Troubleshooting Common Issues

Common problems include pod restarts due to resource constraints, topic creation failures, authentication errors, and network connectivity issues. Always check operator logs first, then individual component logs. Describe resources to see events and status conditions.

# Check operator logs
kubectl logs -l name=strimzi-cluster-operator -n kafka --tail=100

# Check Kafka broker logs
kubectl logs my-cluster-kafka-0 -n kafka --tail=100

# Check ZooKeeper logs
kubectl logs my-cluster-zookeeper-0 -n kafka --tail=100

# Check entity operator logs (Topic/User operator)
kubectl logs -l strimzi.io/name=my-cluster-entity-operator -n kafka -c topic-operator --tail=100
kubectl logs -l strimzi.io/name=my-cluster-entity-operator -n kafka -c user-operator --tail=100

# Describe Kafka resource for status
kubectl describe kafka my-cluster -n kafka

# Check resource events
kubectl get events -n kafka --sort-by='.lastTimestamp'

# Verify storage is provisioned
kubectl get pvc -n kafka

# Check pod resource usage
kubectl top pods -n kafka

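# Test network connectivity between pods (uses bash's /dev/tcp, since the Kafka image may not include nc)
kubectl exec -it my-cluster-kafka-0 -n kafka -- bash -c 'echo > /dev/tcp/my-cluster-zookeeper-client/2181 && echo "port 2181 open"'

# Verify DNS resolution (getent is part of glibc; nslookup may not be installed in the image)
kubectl exec -it my-cluster-kafka-0 -n kafka -- getent hosts my-cluster-kafka-bootstrap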

# Check Kafka topic status
kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe --topic <topic-name>

# View under-replicated partitions
kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions
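--under-replicated-partitions answers the question for the current moment. When you have saved --describe output (for example, captured during an incident), an awk filter can list the partitions whose ISR is smaller than their replica set. This sketch assumes the tab-separated "Field: value" layout that kafka-topics.sh --describe prints for each partition; the function name is hypothetical.

```shell
# shrunk_isr: read `kafka-topics.sh --describe` output on stdin and print
# "topic partition replicas isr" for partitions whose ISR is smaller than the replica set.
shrunk_isr() {
  awk -F'\t' '
    {
      topic = ""; part = ""; reps = ""; isr = ""
      for (i = 1; i <= NF; i++) {
        f = $i
        sub(/^ +/, "", f)
        if (f ~ /^Topic: /)          topic = substr(f, 8)
        else if (f ~ /^Partition: /) part  = substr(f, 12)
        else if (f ~ /^Replicas: /)  reps  = substr(f, 11)
        else if (f ~ /^Isr: /)       isr   = substr(f, 6)
      }
      if (reps == "") next                   # topic summary line, not a partition
      if (split(isr, a, ",") < split(reps, b, ",")) print topic, part, reps, isr
    }'
}

# Usage:
#   kubectl exec my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
#     --bootstrap-server localhost:9092 --describe > describe.txt
#   shrunk_isr < describe.txt
```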

Step 22

Performance Tuning

Optimize Kafka performance by tuning JVM settings, adjusting broker configurations, sizing persistent storage appropriately, and configuring producer/consumer clients correctly. Monitor key metrics like throughput, latency, and disk I/O to identify bottlenecks.

# Performance-tuned Kafka configuration
kafka:
  jvmOptions:
    -Xms: 8192m
    -Xmx: 8192m
    -XX:
      # Strimzi passes -XX options as strings, so quote the values
      UseG1GC: "true"
      MaxGCPauseMillis: "20"
      InitiatingHeapOccupancyPercent: "35"
      G1HeapRegionSize: "16m"
  config:
    # Network tuning
    num.network.threads: 8
    num.io.threads: 16
    socket.send.buffer.bytes: 1048576
    socket.receive.buffer.bytes: 1048576
    socket.request.max.bytes: 104857600
    
    # Log tuning
    num.partitions: 16
    log.segment.bytes: 1073741824
    log.retention.check.interval.ms: 300000
    # log.flush.* left at defaults: Kafka relies on replication for durability,
    # and forcing frequent fsyncs usually lowers throughput
    
    # Replication tuning
    replica.fetch.max.bytes: 1048576
    replica.lag.time.max.ms: 30000
    
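    # Compression (broker-side lz4 recompresses batches that producers sent with
    # another codec; "producer" keeps whatever codec the producer chose)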
    compression.type: lz4
    
  resources:
    requests:
      memory: 16Gi
      cpu: "4"
    limits:
      memory: 16Gi
      cpu: "8"
  
  # Use high-performance storage class
  storage:
    type: persistent-claim
    size: 500Gi
    class: high-iops-ssd
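A common back-of-the-envelope estimate for the storage size is ingress rate × retention time × replication factor, spread across the brokers, plus headroom. The helper below encodes that estimate under a hypothetical name. It ignores compression and index overhead, so treat the result as a starting point rather than a sizing rule.

```shell
# disk_per_broker_gib MIB_PER_SEC RETENTION_HOURS REPLICATION_FACTOR BROKERS [HEADROOM_PCT]
# Rough per-broker disk need in GiB (integer arithmetic, rounded down).
disk_per_broker_gib() {
  rate=$1 hours=$2 rf=$3 brokers=$4 headroom=${5:-0}
  mib=$(( rate * hours * 3600 * rf / brokers ))
  echo $(( mib * (100 + headroom) / 100 / 1024 ))
}

disk_per_broker_gib 10 168 3 3     # 5906: 10 MiB/s kept for 7 days, RF 3, 3 brokers
disk_per_broker_gib 1 24 3 3 20    # 101: 1 MiB/s for 1 day with 20% headroom
```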

Step 23

Deploying in KRaft Mode (ZooKeeper-less)

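Kafka 3.3+ supports KRaft mode (Kafka Raft metadata mode), which removes the ZooKeeper dependency. Strimzi 0.32+ supports KRaft deployments. In Strimzi, a KRaft cluster is defined with KafkaNodePool resources, which assign the controller and broker roles, replica counts, and storage to groups of nodes. The Kafka resource is annotated to enable node pools and KRaft. This simplifies the architecture and improves metadata scalability. For new deployments, consider starting with KRaft mode.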

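apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role
  namespace: kafka
  labels:
    strimzi.io/cluster: kraft-cluster
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: persistent-claim
    size: 100Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kraft-cluster
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.9.0
    metadataVersion: 3.9-IV0
    # Replicas and storage come from the KafkaNodePool; no zookeeper section is needed
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
  entityOperator:
    topicOperator: {}
    userOperator: {}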

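⚠ Heads up: Migrating an existing ZooKeeper-based cluster to KRaft is a separate, multi-phase procedure that cannot be rolled back once the migration is finalized, so plan it carefully and test it thoroughly in a non-production environment. Some Strimzi features may be limited in KRaft mode depending on the version; check the documentation for current limitations.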
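KRaft controllers, like a ZooKeeper ensemble, use a majority quorum. With N voters, the cluster keeps working while a majority is available, so it tolerates floor((N-1)/2) failed controllers. That is why odd controller counts such as 3 or 5 are typical: a fourth controller adds no extra failure tolerance over three. A tiny helper makes this concrete (hypothetical name):

```shell
# quorum_tolerance VOTERS: how many controller (or ZooKeeper) nodes can fail
# while a majority quorum remains available.
quorum_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5; do
  printf '%s voters -> tolerates %s failure(s)\n' "$n" "$(quorum_tolerance "$n")"
done
```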