Strimzi: Apache Kafka on Kubernetes Setup

Complete setup guide for Strimzi, an open-source Kubernetes operator for running Apache Kafka clusters in a cloud-native way. Originally created by Red Hat, it is now a CNCF incubating project. Covers installation, cluster configuration, topic management, and production best practices for running Kafka workloads on Kubernetes.

  1. Step 1

    Understanding Strimzi and Kafka on Kubernetes

    Strimzi runs Apache Kafka on Kubernetes using custom resources and operators. It manages Kafka brokers, ZooKeeper (or KRaft controllers), Kafka Connect, MirrorMaker, and Kafka Bridge as native Kubernetes resources. The operator handles deployment, scaling, configuration changes, and upgrades declaratively, which removes much of the operational work of running distributed Kafka clusters by hand.

    Strimzi Components:
    - Cluster Operator: Deploys and manages Kafka clusters, Kafka Connect, MirrorMaker, Kafka Bridge, and the Entity Operator
    - Entity Operator: Manages topics and users within clusters
      - Topic Operator: Synchronizes KafkaTopics with Kafka
      - User Operator: Manages KafkaUsers and ACLs
    - Bridge: HTTP-based Kafka client for browser/IoT devices
    
    Kafka Resources Managed:
    - Kafka: Broker cluster configuration
    - KafkaTopic: Topic definitions
    - KafkaUser: User credentials and ACLs
    - KafkaConnect: Connect cluster for integrations
    - KafkaMirrorMaker2: Cross-cluster replication
    - KafkaBridge: HTTP REST API gateway
  2. Step 2

    Prerequisites

    You need a running Kubernetes cluster (1.23+) with kubectl configured. Minimum cluster resources: 4 CPU cores, 8GB RAM for a test cluster; production needs vary by workload. Persistent storage is required (StorageClass with dynamic provisioning recommended). You'll need cluster-admin or equivalent permissions to install CRDs and create namespaces.

    # Verify Kubernetes cluster access
    kubectl version   # prints both client and server versions; --client alone never contacts the cluster
    kubectl cluster-info
    
    # Check available resources (requires metrics-server in the cluster)
    kubectl top nodes
    
    # Verify StorageClass exists
    kubectl get storageclass
    
    # Create dedicated namespace
    kubectl create namespace kafka
    kubectl config set-context --current --namespace=kafka
    
    # Verify namespace
    kubectl get namespace kafka
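    ⚠ Heads up: This guide assumes Kubernetes 1.23 or later; the minimum differs between Strimzi releases, so check the supported-versions list for the release you install. For production use, ensure your cluster has monitoring (Prometheus), logging, and backup strategies in place before deploying Kafka.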
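
    The version requirement can also be checked in a script. A minimal sketch, assuming a hypothetical helper named k8s_version_ok and the 1.23 minimum stated above (pass a different minor version as the second argument if your Strimzi release needs a newer one):

```shell
# Hypothetical helper (not part of kubectl or Strimzi): succeeds when a
# version string such as "v1.27.3" is at least 1.<min_minor>.
k8s_version_ok() {
  local v="${1#v}"              # strip a leading "v"
  local min_minor="${2:-23}"    # default taken from the prerequisite above
  local major="${v%%.*}"        # text before the first dot
  local rest="${v#*.}"          # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot
  minor="${minor%%[!0-9]*}"     # drop non-numeric suffixes such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "${minor:-0}" -ge "$min_minor" ]; }
}

k8s_version_ok v1.27.3 && echo "supported"
k8s_version_ok v1.22.9 || echo "too old"
```

    Against a live cluster, the input could come from the serverVersion.gitVersion field of `kubectl version -o json`.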
  3. Step 3

    Install Strimzi Operator via YAML Manifests

    The fastest way to install Strimzi is applying the release manifests directly. This creates all necessary CRDs (Custom Resource Definitions), the Cluster Operator deployment, and RBAC resources. The operator watches for Kafka custom resources and manages their lifecycle automatically.

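    # Install the Strimzi operator (pinned release; check the releases page for the current version)
    # 0.45.0 is used here because it ships Kafka 3.9.0 and still supports ZooKeeper-based clusters
    VERSION=0.45.0
    # The release manifest hardcodes the namespace "myproject" in its RoleBindings; rewrite it to "kafka"
    curl -sL https://github.com/strimzi/strimzi-kafka-operator/releases/download/$VERSION/strimzi-cluster-operator-$VERSION.yaml \
      | sed 's/namespace: myproject/namespace: kafka/' \
      | kubectl create -n kafka -f -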
    
    # Verify CRDs are installed
    kubectl get crd | grep strimzi
    # Should show: kafkas, kafkatopics, kafkausers, kafkaconnects, etc.
    
    # Wait for operator to be ready
    kubectl wait --for=condition=available --timeout=300s deployment/strimzi-cluster-operator -n kafka
    
    # Check operator logs
    kubectl logs -l name=strimzi-cluster-operator -n kafka -f
    
    # Verify operator is running
    kubectl get deployment strimzi-cluster-operator -n kafka
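    ⚠ Heads up: The operator installation creates CRDs, which are always cluster-scoped. To limit which namespaces the operator watches, set the STRIMZI_NAMESPACE environment variable on the operator Deployment (watchNamespaces in the Helm chart) and adjust the RoleBindings accordingly, rather than removing ClusterRole resources.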
  4. Step 4

    Alternative: Install via Helm

    Helm provides a more flexible installation method with configurable values. This is recommended for production deployments where you need to customize operator settings, resource limits, or watchNamespaces. Helm also simplifies upgrades and rollbacks.

    # Add Strimzi Helm repository
    helm repo add strimzi https://strimzi.io/charts/
    helm repo update
    
    # Search available versions
    helm search repo strimzi/strimzi-kafka-operator --versions
    
    # Install with default values
    helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
      --namespace kafka \
      --create-namespace
    
    # Install with custom values
    cat <<EOF > values.yaml
    watchNamespaces:
      - kafka
      - production
    resources:
      limits:
        memory: 512Mi
        cpu: 500m
      requests:
        memory: 256Mi
        cpu: 100m
    logLevel: INFO
    EOF
    
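    # A second "helm install" with the same release name fails if the default install above
    # was already run; "upgrade --install" works in both cases
    helm upgrade --install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
      --namespace kafka \
      --create-namespace \
      --values values.yaml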
    
    # Verify installation
    helm list -n kafka
    kubectl get pods -n kafka
  5. Step 5

    Deploy Your First Kafka Cluster (Ephemeral)

    An ephemeral cluster stores data in emptyDir volumes, so data is lost whenever a pod is deleted or rescheduled. This suits development and testing only. This example deploys Kafka with ZooKeeper (KRaft mode is also supported). The Cluster Operator watches this custom resource and creates the necessary pod sets (StrimziPodSets in current releases, which replaced StatefulSets), Services, and ConfigMaps.

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
      namespace: kafka
    spec:
      kafka:
        version: 3.9.0
        replicas: 3
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
          default.replication.factor: 3
          min.insync.replicas: 2
          inter.broker.protocol.version: "3.9"
        storage:
          type: ephemeral
      zookeeper:
        replicas: 3
        storage:
          type: ephemeral
      entityOperator:
        topicOperator: {}
        userOperator: {}
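
    The config above pairs a replication factor of 3 with min.insync.replicas of 2. For producers using acks=all, the number of replicas that can be offline while writes still succeed is the difference between the two. A minimal sketch of that arithmetic, using a hypothetical helper name:

```shell
# Hypothetical helper: how many replicas of a partition can be offline
# while acks=all producers can still write.
tolerated_broker_failures() {
  local rf="$1" min_isr="$2"
  if [ "$min_isr" -gt "$rf" ]; then
    echo "min.insync.replicas ($min_isr) exceeds replication factor ($rf)" >&2
    return 1
  fi
  echo $(( rf - min_isr ))
}

tolerated_broker_failures 3 2   # prints 1
```

    Setting min.insync.replicas equal to the replication factor yields 0, meaning any single broker outage blocks acks=all writes.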
  6. Step 6

    Apply Kafka Cluster Configuration

    Save the Kafka resource definition to a file and apply it. The Cluster Operator detects the new resource and provisions the entire cluster. This creates pod sets (StrimziPodSets in current releases) for Kafka brokers and ZooKeeper nodes, plus Services for access. Initial deployment typically takes 2-5 minutes.

    # Save the YAML from previous step as kafka-ephemeral.yaml
    
    # Apply the configuration
    kubectl apply -f kafka-ephemeral.yaml -n kafka
    
    # Watch cluster creation progress
    kubectl get kafka my-cluster -n kafka -w
    # Wait until STATUS shows Ready
    
    # Check all created resources
    kubectl get all -l app.kubernetes.io/instance=my-cluster -n kafka
    
    # Verify Kafka pods are running
    kubectl get pods -l strimzi.io/cluster=my-cluster -n kafka
    # Should show: 3 kafka pods, 3 zookeeper pods, entity-operator
    
    # Check Kafka service endpoints
    kubectl get svc -l strimzi.io/cluster=my-cluster -n kafka
    
    # View cluster status
    kubectl describe kafka my-cluster -n kafka
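
    Clients inside Kubernetes reach the cluster through the bootstrap Service, which later steps address as my-cluster-kafka-bootstrap. A small sketch, using a hypothetical helper, that builds the namespace-qualified address for clients running in other namespaces (the .svc suffix is standard Kubernetes Service DNS):

```shell
# Hypothetical helper: builds the in-cluster bootstrap address for a
# Strimzi-managed Kafka cluster.
bootstrap_address() {
  local cluster="$1" namespace="$2" port="${3:-9092}"   # port defaults to the plain listener
  echo "${cluster}-kafka-bootstrap.${namespace}.svc:${port}"
}

bootstrap_address my-cluster kafka        # prints my-cluster-kafka-bootstrap.kafka.svc:9092
bootstrap_address my-cluster kafka 9093   # prints my-cluster-kafka-bootstrap.kafka.svc:9093
```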
  7. Step 7

    Production-Ready Persistent Cluster

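    For production use, configure persistent storage with appropriate IOPS and throughput. Kafka benefits from high-throughput, high-IOPS volumes and ZooKeeper from low-latency ones; the example uses a single fast-ssd class for both, which you can split if your platform offers distinct classes. Add resource requests/limits based on your workload, enable metrics collection, and tune the JVM heap. The metricsConfig sections reference a kafka-metrics ConfigMap that must exist before the cluster is applied.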

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: production-cluster
      namespace: kafka
    spec:
      kafka:
        version: 3.9.0
        replicas: 3
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
            authentication:
              type: tls
          - name: external
            port: 9094
            type: loadbalancer
            tls: true
            configuration:
              bootstrap:
                loadBalancerIP: <your-load-balancer-ip>
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
          default.replication.factor: 3
          min.insync.replicas: 2
          log.retention.hours: 168
          log.segment.bytes: 1073741824
          compression.type: producer
        storage:
          type: persistent-claim
          size: 100Gi
          class: fast-ssd
          deleteClaim: false
        resources:
          requests:
            memory: 4Gi
            cpu: "2"
          limits:
            memory: 8Gi
            cpu: "4"
        jvmOptions:
          -Xms: 2048m
          -Xmx: 4096m
        metricsConfig:
          type: jmxPrometheusExporter
          valueFrom:
            configMapKeyRef:
              name: kafka-metrics
              key: kafka-metrics-config.yml
      zookeeper:
        replicas: 3
        storage:
          type: persistent-claim
          size: 10Gi
          class: fast-ssd
          deleteClaim: false
        resources:
          requests:
            memory: 1Gi
            cpu: "500m"
          limits:
            memory: 2Gi
            cpu: "1"
        metricsConfig:
          type: jmxPrometheusExporter
          valueFrom:
            configMapKeyRef:
              name: kafka-metrics
              key: zookeeper-metrics-config.yml
      entityOperator:
        topicOperator:
          resources:
            requests:
              memory: 256Mi
              cpu: "200m"
            limits:
              memory: 512Mi
              cpu: "500m"
        userOperator:
          resources:
            requests:
              memory: 256Mi
              cpu: "200m"
            limits:
              memory: 512Mi
              cpu: "500m"
      kafkaExporter:
        topicRegex: ".*"
        groupRegex: ".*"
    ⚠ Heads up: Persistent volumes are not deleted when the cluster is removed unless deleteClaim is true. Set to false for production to prevent accidental data loss. Always test backup/restore procedures before going live.
  8. Step 8

    Create Kafka Topics Declaratively

    Strimzi manages topics as Kubernetes resources. Create a KafkaTopic custom resource and the Topic Operator synchronizes it with Kafka. This provides GitOps-friendly topic management with version control and declarative configuration.

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: orders
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 10
      replicas: 3
      config:
        retention.ms: 604800000  # 7 days
        segment.ms: 3600000      # 1 hour
        compression.type: lz4
        max.message.bytes: 1048576
        min.insync.replicas: 2
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: events
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 20
      replicas: 3
      config:
        retention.ms: 86400000   # 1 day
        cleanup.policy: delete
        compression.type: snappy
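
    The retention.ms and segment.ms values above are plain millisecond counts. A minimal sketch of the conversion, using hypothetical helper names, in case you want to derive them rather than hardcode them:

```shell
# Hypothetical helpers: convert days or hours to the millisecond values
# used by retention.ms and segment.ms.
days_to_ms()  { echo $(( $1 * 24 * 60 * 60 * 1000 )); }
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }

days_to_ms 7    # prints 604800000 (retention.ms for the orders topic)
days_to_ms 1    # prints 86400000  (retention.ms for the events topic)
hours_to_ms 1   # prints 3600000   (segment.ms for the orders topic)
```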
  9. Step 9

    Apply Topic Configuration

    Apply topic definitions using kubectl. The Topic Operator creates topics in Kafka and keeps them synchronized with the custom resource. Any changes to the KafkaTopic spec are reflected in Kafka automatically.

    # Apply topic definitions
    kubectl apply -f topics.yaml -n kafka
    
    # List KafkaTopic resources
    kubectl get kafkatopics -n kafka
    
    # Describe a specific topic
    kubectl describe kafkatopic orders -n kafka
    
    # Verify topics exist in Kafka (exec into broker pod)
    kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --list
    
    # Get detailed topic info
    kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --describe \
      --topic orders
    
    # Update topic (edit the KafkaTopic resource)
    kubectl edit kafkatopic orders -n kafka
    # Change config or increase partitions, then save; the Topic Operator applies the change
    # (Kafka does not allow decreasing a topic's partition count)
    
    # Delete topic
    kubectl delete kafkatopic orders -n kafka
    ⚠ Heads up: Topic deletion requires delete.topic.enable=true in Kafka config (enabled by default in Strimzi). Deleting a KafkaTopic resource deletes the actual topic and all its data.
  10. Step 10

    Create Kafka Users with Authentication

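    KafkaUser resources define users with TLS or SCRAM-SHA-512 authentication. Strimzi automatically generates certificates or passwords and stores them in Kubernetes Secrets. Users can have ACLs (Access Control Lists) for fine-grained authorization on topics and consumer groups. For these settings to take effect, the Kafka resource needs a listener with the matching authentication type and spec.kafka.authorization set to type: simple; the ephemeral my-cluster example above configures neither.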

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaUser
    metadata:
      name: producer-app
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      authentication:
        type: tls
      authorization:
        type: simple
        acls:
          - resource:
              type: topic
              name: orders
              patternType: literal
            operations:
              - Write
              - Describe
          - resource:
              type: group
              name: producer-group
              patternType: literal
            operations:
              - Read
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaUser
    metadata:
      name: consumer-app
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      authentication:
        type: scram-sha-512
      authorization:
        type: simple
        acls:
          - resource:
              type: topic
              name: orders
              patternType: literal
            operations:
              - Read
              - Describe
          - resource:
              type: group
              name: consumer-group
              patternType: prefix
            operations:
              - Read
  11. Step 11

    Apply User Configuration and Access Credentials

    Apply KafkaUser resources and retrieve generated credentials from Secrets. TLS users get a certificate/key pair; SCRAM-SHA users get a password. Mount these secrets in your application pods to authenticate with Kafka.

    # Apply user definitions
    kubectl apply -f kafka-users.yaml -n kafka
    
    # List KafkaUser resources
    kubectl get kafkausers -n kafka
    
    # Check generated secrets
    kubectl get secrets -n kafka | grep producer-app
    kubectl get secrets -n kafka | grep consumer-app
    
    # View TLS certificate for producer-app
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d
    
    # View TLS key
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d
    
    # View CA certificate (for trust)
    kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d
    
    # View SCRAM-SHA password for consumer-app
    kubectl get secret consumer-app -n kafka -o jsonpath='{.data.password}' | base64 -d
    
    # Extract credentials to files for application use
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > user.crt
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > user.key
    kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
  12. Step 12

    Test Producer and Consumer

    Verify your Kafka cluster is working by producing and consuming messages. Use the kafka-console tools included in the Kafka container image to test connectivity and authentication.

    # Create a test topic
    kubectl apply -f - <<EOF
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: test-topic
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 3
      replicas: 3
    EOF
    
    # Start a producer (plain listener)
    kubectl run kafka-producer -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 \
      --rm=true --restart=Never -n kafka -- bin/kafka-console-producer.sh \
      --bootstrap-server my-cluster-kafka-bootstrap:9092 \
      --topic test-topic
    
    # Type some messages, Ctrl+C when done
    
    # Start a consumer in another terminal
    kubectl run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 \
      --rm=true --restart=Never -n kafka -- bin/kafka-console-consumer.sh \
      --bootstrap-server my-cluster-kafka-bootstrap:9092 \
      --topic test-topic \
      --from-beginning
    
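    # Test with TLS listener (the truststore must exist inside the pod, for example mounted
    # from the my-cluster-cluster-ca-cert Secret, which also holds ca.p12 and ca.password)
    kubectl run kafka-producer-tls -ti --image=quay.io/strimzi/kafka:0.45.0-kafka-3.9.0 \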
      --rm=true --restart=Never -n kafka -- bin/kafka-console-producer.sh \
      --bootstrap-server my-cluster-kafka-bootstrap:9093 \
      --topic test-topic \
      --producer-property security.protocol=SSL \
      --producer-property ssl.truststore.location=/tmp/truststore.p12 \
      --producer-property ssl.truststore.password=<password>
  13. Step 13

    Deploy Kafka Connect for Integrations

    Kafka Connect provides scalable, reliable streaming integration between Kafka and external systems. Strimzi manages Connect clusters as custom resources. You can use pre-built connectors (databases, S3, Elasticsearch) or build custom ones.

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnect
    metadata:
      name: my-connect-cluster
      namespace: kafka
      annotations:
        strimzi.io/use-connector-resources: "true"
    spec:
      version: 3.9.0
      replicas: 3
      bootstrapServers: my-cluster-kafka-bootstrap:9092
      config:
        group.id: connect-cluster
        offset.storage.topic: connect-cluster-offsets
        config.storage.topic: connect-cluster-configs
        status.storage.topic: connect-cluster-status
        config.storage.replication.factor: 3
        offset.storage.replication.factor: 3
        status.storage.replication.factor: 3
      build:
        output:
          type: docker
          image: <your-registry>/kafka-connect:latest
        plugins:
          - name: debezium-postgres-connector
            artifacts:
              - type: tgz
                url: https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.8.1.Final/debezium-connector-postgres-2.8.1.Final-plugin.tar.gz
          - name: camel-http-connector
            artifacts:
              - type: tgz
                url: https://repo1.maven.org/maven2/org/apache/camel/kafkaconnector/camel-http-kafka-connector/4.4.3/camel-http-kafka-connector-4.4.3-package.tar.gz
  14. Step 14

    Apply Connect Cluster and Deploy Connectors

    Strimzi can build custom Connect images with your connectors using the build specification. For production, pre-build images and reference them. Then deploy connector instances using KafkaConnector resources.

    # Apply Connect cluster
    kubectl apply -f kafka-connect.yaml -n kafka
    
    # Wait for Connect cluster to be ready
    kubectl wait --for=condition=ready kafkaconnect/my-connect-cluster --timeout=600s -n kafka
    
    # Check Connect pods
    kubectl get pods -l strimzi.io/cluster=my-connect-cluster -n kafka
    
    # Create a connector instance
    kubectl apply -f - <<EOF
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaConnector
    metadata:
      name: postgres-source
      namespace: kafka
      labels:
        strimzi.io/cluster: my-connect-cluster
    spec:
      class: io.debezium.connector.postgresql.PostgresConnector
      tasksMax: 2
      config:
        database.hostname: postgres.database.svc.cluster.local
        database.port: 5432
        database.user: kafka_user
        database.password: <your-password>
        database.dbname: production
        topic.prefix: prod-db   # Debezium 2.x replaced database.server.name with topic.prefix
        table.include.list: public.orders,public.customers
        plugin.name: pgoutput
    EOF
    
    # List connectors
    kubectl get kafkaconnectors -n kafka
    
    # Check connector status
    kubectl describe kafkaconnector postgres-source -n kafka
    
    # View Connect logs
    kubectl logs -l strimzi.io/cluster=my-connect-cluster -n kafka -f
  15. Step 15

    Enable Monitoring with Prometheus and Grafana

    Strimzi exposes JMX metrics from Kafka, ZooKeeper, and Connect via Prometheus exporters. Configure the metricsConfig in your Kafka resource to enable metrics collection. Deploy Prometheus and Grafana to visualize cluster health, throughput, latency, and consumer lag.

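    # Create the kafka-metrics ConfigMap
    # Note: this example file also contains a Kafka resource named my-cluster. Download it and
    # keep only the "kind: ConfigMap" document before applying, or it will overwrite your cluster definition
    curl -sLO https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/0.45.0/examples/metrics/kafka-metrics.yaml
    kubectl apply -f kafka-metrics.yaml -n kafka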
    
    # Install Prometheus Operator (if not already installed)
    # Use "create": the bundled CRDs are too large for the annotation that "kubectl apply" adds
    kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
    
    # Create a PodMonitor for Kafka (Strimzi Services do not expose the metrics port, so scrape the pods)
    kubectl apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: kafka-metrics
      namespace: kafka
    spec:
      selector:
        matchLabels:
          strimzi.io/kind: Kafka
      podMetricsEndpoints:
      - port: tcp-prometheus
        path: /metrics
        interval: 30s
    EOF
    
    # Deploy Grafana
    kubectl create deployment grafana --image=grafana/grafana:latest -n kafka
    kubectl expose deployment grafana --type=LoadBalancer --port=3000 -n kafka
    
    # Get Grafana URL
    kubectl get svc grafana -n kafka
    
    # Import Strimzi dashboards from
    # https://github.com/strimzi/strimzi-kafka-operator/tree/main/examples/metrics/grafana-dashboards
    
    # View Prometheus metrics directly (forward to a broker pod; the Services do not expose port 9404)
    kubectl port-forward pod/my-cluster-kafka-0 9404:9404 -n kafka
    # Visit http://localhost:9404/metrics
  16. Step 16

    Configure External Access

    Expose Kafka outside Kubernetes using LoadBalancer, NodePort, or Ingress listeners. Each broker gets a unique external address for client connections. Choose the listener type based on your cloud provider and networking setup.

    # LoadBalancer listener (AWS, GCP, Azure)
    listeners:
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        authentication:
          type: tls
        configuration:
          brokerCertChainAndKey:
            secretName: kafka-tls-cert
            certificate: tls.crt
            key: tls.key
    ---
    # NodePort listener (on-prem, local)
    listeners:
      - name: external
        port: 9094
        type: nodeport
        tls: true
        configuration:
          preferredNodePortAddressType: ExternalIP
          brokers:
            - broker: 0
              advertisedHost: <node-1-external-ip>
              nodePort: 32100
            - broker: 1
              advertisedHost: <node-2-external-ip>
              nodePort: 32101
            - broker: 2
              advertisedHost: <node-3-external-ip>
              nodePort: 32102
    ---
    # Ingress listener (with NGINX or similar)
    listeners:
      - name: external
        port: 9094
        type: ingress
        tls: true
        configuration:
          bootstrap:
            host: kafka-bootstrap.example.com
          brokers:
            - broker: 0
              host: kafka-0.example.com
            - broker: 1
              host: kafka-1.example.com
            - broker: 2
              host: kafka-2.example.com
          class: nginx
  17. Step 17

    Connect External Clients

    Retrieve bootstrap addresses and certificates for external connections. Configure your Kafka clients with the appropriate security protocol and credentials. Test connectivity before deploying applications.

    # Get external bootstrap address
    kubectl get kafka my-cluster -n kafka -o jsonpath='{.status.listeners[?(@.name=="external")].bootstrapServers}'
    
    # Get CA certificate for TLS
    kubectl get secret my-cluster-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
    
    # Get client certificate (if using mutual TLS)
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > client.crt
    kubectl get secret producer-app -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > client.key
    
    # Test connection with kafka-console-producer (local machine)
    kafka-console-producer.sh \
      --bootstrap-server <external-bootstrap-address>:9094 \
      --topic test-topic \
      --producer-property security.protocol=SSL \
      --producer-property ssl.truststore.location=truststore.jks \
      --producer-property ssl.truststore.password=<password> \
      --producer-property ssl.keystore.location=keystore.jks \
      --producer-property ssl.keystore.password=<password>
    
    # Java client configuration example
    # properties.put("bootstrap.servers", "<external-address>:9094");
    # properties.put("security.protocol", "SSL");
    # properties.put("ssl.truststore.location", "/path/to/truststore.jks");
    # properties.put("ssl.truststore.password", "password");
    # properties.put("ssl.keystore.location", "/path/to/keystore.jks");
    # properties.put("ssl.keystore.password", "password");
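
    The commands above reference truststore.jks and keystore.jks, which this guide does not create. One alternative is to point the client at the extracted PEM files directly. A minimal sketch, assuming Kafka clients 2.7 or later (which accept PEM stores), a private key in PKCS#8 format, and a hypothetical helper name:

```shell
# Hypothetical helper: writes a Kafka client properties file that uses PEM
# files instead of JKS stores. The third argument is optional and is only
# needed for mutual TLS; it must contain the private key followed by the
# certificate (for example: cat client.key client.crt > client.pem).
write_pem_client_properties() {
  local out="$1" ca="$2" keypair="${3:-}"
  {
    echo "security.protocol=SSL"
    echo "ssl.truststore.type=PEM"
    echo "ssl.truststore.location=$ca"
    if [ -n "$keypair" ]; then
      echo "ssl.keystore.type=PEM"
      echo "ssl.keystore.location=$keypair"
    fi
  } > "$out"
}

write_pem_client_properties client.properties ca.crt client.pem
```

    The resulting file can then be passed to the console tools with --producer.config client.properties in place of the individual --producer-property flags.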
  18. Step 18

    Upgrade Kafka Version

    Strimzi supports rolling upgrades with zero downtime. Update the Kafka version in your Kafka resource and apply. The operator upgrades brokers one at a time, ensuring the cluster remains available. Always check the Strimzi documentation for version compatibility and upgrade path.

    # Check current Kafka version
    kubectl get kafka my-cluster -n kafka -o jsonpath='{.spec.kafka.version}'
    
    # Edit Kafka resource to update version
    kubectl edit kafka my-cluster -n kafka
    # Change spec.kafka.version from 3.8.0 to 3.9.0
    # Change spec.kafka.config.inter.broker.protocol.version if needed
    
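    # Or patch directly: bump the Kafka version first and keep the protocol version
    # at the old release until every broker runs the new binaries
    # (log.message.format.version is ignored by Kafka 3.x and is omitted here)
    kubectl patch kafka my-cluster -n kafka --type=merge -p '{
      "spec": {
        "kafka": {
          "version": "3.9.0",
          "config": {
            "inter.broker.protocol.version": "3.8"
          }
        }
      }
    }'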
    
    # Monitor upgrade progress
    kubectl get pods -l strimzi.io/cluster=my-cluster -n kafka -w
    
    # Check Kafka logs during upgrade
    kubectl logs my-cluster-kafka-0 -n kafka -f
    
    # Verify upgrade completed
    kubectl get kafka my-cluster -n kafka -o yaml | grep version:
    
    # After upgrade, update protocol versions if needed
    # This may require a second rolling restart
    ⚠ Heads up: Always upgrade Strimzi operator first, then Kafka version. Test upgrades in a non-production environment. Some version jumps require incremental upgrades (e.g., 3.6 → 3.7 → 3.8).
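
    A two-phase upgrade keeps inter.broker.protocol.version at the old release while the brokers roll to the new binaries, and raises it only afterwards, which preserves the option to roll back. A minimal sketch that builds the first-phase patch, using hypothetical helper names:

```shell
# Hypothetical helpers for the first phase of a Kafka version upgrade.

# "3.8.0" -> "3.8": the protocol version is the Kafka version without the patch digit.
protocol_version() {
  echo "${1%.*}"
}

# Prints a JSON merge patch that sets the new Kafka version while pinning
# inter.broker.protocol.version to the release the cluster is coming from.
kafka_upgrade_patch() {
  local new_version="$1" old_version="$2"
  printf '{"spec":{"kafka":{"version":"%s","config":{"inter.broker.protocol.version":"%s"}}}}\n' \
    "$new_version" "$(protocol_version "$old_version")"
}

kafka_upgrade_patch 3.9.0 3.8.0
```

    The output can be passed to kubectl patch with --type=merge -p "$(kafka_upgrade_patch 3.9.0 3.8.0)"; once all brokers run the new version, a second patch raises the protocol version.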
  19. Step 19

    Backup and Disaster Recovery

    Implement regular backups of Kafka topic data and cluster metadata. Use MirrorMaker2 for active-passive or active-active replication to a disaster recovery cluster. Back up PersistentVolumes and ZooKeeper state. Test restore procedures regularly.

    # Deploy MirrorMaker2 for cross-cluster replication
    kubectl apply -f - <<EOF
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaMirrorMaker2
    metadata:
      name: disaster-recovery-mirror
      namespace: kafka
    spec:
      version: 3.9.0
      replicas: 1
      connectCluster: "target"
      clusters:
        - alias: "source"
          bootstrapServers: my-cluster-kafka-bootstrap:9092
        - alias: "target"
          bootstrapServers: dr-cluster-kafka-bootstrap:9092
      mirrors:
        - sourceCluster: "source"
          targetCluster: "target"
          sourceConnector:
            config:
              replication.factor: 3
              offset-syncs.topic.replication.factor: 3
              sync.topic.acls.enabled: "false"
          heartbeatConnector:
            config:
              heartbeats.topic.replication.factor: 3
          checkpointConnector:
            config:
              checkpoints.topic.replication.factor: 3
          topicsPattern: ".*"
          groupsPattern: ".*"
    EOF
    
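    # Backup PersistentVolumes using Velero
    # velero install creates the velero namespace itself; the AWS provider also needs its plugin image
    velero install --provider aws \
      --plugins velero/velero-plugin-for-aws:<plugin-version> \
      --bucket kafka-backups \
      --backup-location-config region=<aws-region> \
      --secret-file ./credentials-velero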
    
    # Create backup schedule
    velero schedule create kafka-daily --schedule="0 2 * * *" --include-namespaces kafka
    
    # Manual backup
    velero backup create kafka-backup-$(date +%Y%m%d) --include-namespaces kafka
    
    # List backups
    velero backup get
    
    # Restore from backup
    velero restore create --from-backup kafka-backup-<YYYYMMDD>
  20. Step 20

    Security Hardening

    Enable TLS for all listeners, use mutual TLS or SCRAM-SHA-512 authentication, implement network policies, enable authorization with ACLs, and use Pod Security Standards. Regular security audits and updates are essential for production clusters.

    # Restrict listener access with Strimzi-managed network policies
    # (fragment of spec.kafka in the Kafka resource). Strimzi generates NetworkPolicies
    # for the cluster and keeps inter-broker, ZooKeeper, and operator traffic open; a
    # hand-written policy with narrow Egress rules can block DNS and replication.
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        networkPolicyPeers:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: applications
      - name: tls
        port: 9093
        type: internal
        tls: true
        networkPolicyPeers:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: applications
    ---
    # Pod Security Context for Kafka
    template:
      pod:
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          fsGroup: 1000
          seccompProfile:
            type: RuntimeDefault
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    # Match only broker pods; strimzi.io/cluster would also match ZooKeeper and operator pods
                    - key: strimzi.io/name
                      operator: In
                      values:
                        - my-cluster-kafka
                topologyKey: kubernetes.io/hostname
  21. Step 21

    Troubleshooting Common Issues

    Common problems include pod restarts due to resource constraints, topic creation failures, authentication errors, and network connectivity issues. Always check operator logs first, then individual component logs. Describe resources to see events and status conditions.

    # Check operator logs
    kubectl logs -l name=strimzi-cluster-operator -n kafka --tail=100
    
    # Check Kafka broker logs
    kubectl logs my-cluster-kafka-0 -n kafka --tail=100
    
    # Check ZooKeeper logs
    kubectl logs my-cluster-zookeeper-0 -n kafka --tail=100
    
    # Check entity operator logs (Topic/User operator)
    kubectl logs -l strimzi.io/name=my-cluster-entity-operator -n kafka -c topic-operator --tail=100
    kubectl logs -l strimzi.io/name=my-cluster-entity-operator -n kafka -c user-operator --tail=100
    
    # Describe Kafka resource for status
    kubectl describe kafka my-cluster -n kafka
    
    # Check resource events
    kubectl get events -n kafka --sort-by='.lastTimestamp'
    
    # Verify storage is provisioned
    kubectl get pvc -n kafka
    
    # Check pod resource usage
    kubectl top pods -n kafka
    
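    # Test network connectivity between pods
    # (nc may not be installed in the broker image; bash's /dev/tcp is used as a fallback here)
    kubectl exec -it my-cluster-kafka-0 -n kafka -- bash -c '</dev/tcp/my-cluster-zookeeper-client/2181 && echo open'
    
    # Verify DNS resolution (getent is used because nslookup may not be installed in the image)
    kubectl exec -it my-cluster-kafka-0 -n kafka -- getent hosts my-cluster-kafka-bootstrap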
    
    # Check Kafka topic status
    kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --describe --topic <topic-name>
    
    # View under-replicated partitions
    kubectl exec -it my-cluster-kafka-0 -n kafka -- bin/kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --describe --under-replicated-partitions
  22. Step 22

    Performance Tuning

    Optimize Kafka performance by tuning JVM settings, adjusting broker configurations, sizing persistent storage appropriately, and configuring producer/consumer clients correctly. Monitor key metrics like throughput, latency, and disk I/O to identify bottlenecks.

    # Performance-tuned Kafka configuration
    kafka:
      jvmOptions:
        -Xms: 8192m
        -Xmx: 8192m
        # -XX values must be strings in the Strimzi schema
        "-XX":
          "UseG1GC": "true"
          "MaxGCPauseMillis": "20"
          "InitiatingHeapOccupancyPercent": "35"
          "G1HeapRegionSize": "16m"
      config:
        # Network tuning
        num.network.threads: 8
        num.io.threads: 16
        socket.send.buffer.bytes: 1048576
        socket.receive.buffer.bytes: 1048576
        socket.request.max.bytes: 104857600
        
        # Log tuning
        num.partitions: 16
        log.segment.bytes: 1073741824
        log.retention.check.interval.ms: 300000
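        # Forced flushes usually lower throughput; Kafka relies on replication and the
        # OS page cache by default, so keep this setting only if you need it
        log.flush.interval.messages: 10000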
        
        # Replication tuning
        replica.fetch.max.bytes: 1048576
        replica.lag.time.max.ms: 30000
        
        # Compression
        compression.type: lz4
        
      resources:
        requests:
          memory: 16Gi
          cpu: "4"
        limits:
          memory: 16Gi
          cpu: "8"
      
      # Use high-performance storage class
      storage:
        type: persistent-claim
        size: 500Gi
        class: high-iops-ssd
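
    The examples in this guide give the JVM heap half of the container memory limit (4096m of 8Gi, 8192m of 16Gi), leaving the rest for the OS page cache that Kafka relies on. A small sketch of that arithmetic, using a hypothetical helper name; the 50% ratio is an observation from these examples, not a Strimzi rule:

```shell
# Hypothetical helper: heap size in MiB for a container memory limit given in GiB,
# using the 50% ratio seen in the examples above.
heap_mb_for_limit_gi() {
  echo $(( $1 * 1024 / 2 ))
}

heap_mb_for_limit_gi 16   # prints 8192
heap_mb_for_limit_gi 8    # prints 4096
```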
  23. Step 23

    Migrating to KRaft Mode (ZooKeeper-less)

    Kafka 3.3+ supports KRaft mode (Kafka Raft metadata mode), which removes the ZooKeeper dependency. Strimzi 0.32+ supports KRaft deployments. This simplifies architecture and improves metadata scalability. For new deployments, consider starting with KRaft mode.

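    # KRaft clusters are defined with a KafkaNodePool plus two annotations on the Kafka resource;
    # replicas and storage move from spec.kafka into the node pool
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: dual-role
      namespace: kafka
      labels:
        strimzi.io/cluster: kraft-cluster
    spec:
      replicas: 3
      roles:
        - controller
        - broker
      storage:
        type: persistent-claim
        size: 100Gi
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: kraft-cluster
      namespace: kafka
      annotations:
        strimzi.io/node-pools: enabled
        strimzi.io/kraft: enabled
    spec:
      kafka:
        version: 3.9.0
        # KRaft-specific: no zookeeper section needed
        metadataVersion: 3.9-IV0
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
          default.replication.factor: 3
          min.insync.replicas: 2
      entityOperator:
        topicOperator: {}
        userOperator: {}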
    ⚠ Heads up: Migration from ZooKeeper to KRaft is one-way and requires careful planning. Test thoroughly in non-production environments. Not all Strimzi features are available in KRaft mode yet - check documentation for current limitations.
