Level: Intermediate
Tags: elasticsearch, search, analytics, lucene, nosql, distributed, full-text-search, elk-stack, observability, data-store

Elasticsearch - Distributed Search and Analytics Engine

Install and configure Elasticsearch, the distributed RESTful search and analytics engine built on Apache Lucene. This guide covers installation, indexing, search queries, aggregations, cluster management, and production best practices.

Step 1
Overview
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. Originally developed by Shay Banon in 2010, it has become the leading solution for full-text search, log analytics, and real-time data analysis. With 70,000+ GitHub stars, Elasticsearch powers search features for companies like Netflix, Uber, GitHub, and Wikipedia.

Key capabilities:
- Distributed architecture: Horizontal scaling across multiple nodes with automatic sharding and replication
- Full-text search: Advanced text analysis with dozens of built-in language analyzers and custom tokenizers
- Near real-time indexing: Newly indexed documents become searchable after the next refresh (every 1 second by default)
- RESTful API: Simple JSON-based HTTP interface for all operations
- Aggregations: Powerful analytics framework for metrics, bucketing, and pipeline aggregations
- Schema-free JSON: Dynamic mapping with automatic field type detection
- Multi-tenancy: Index-level isolation for multiple datasets in one cluster
Why Elasticsearch:
- Battle-tested: Powers some of the world's largest search deployments (billions of documents)
- ELK Stack: Native integration with Logstash and Kibana for complete observability
- Rich ecosystem: Official clients for Java, Python, JavaScript, Go, Ruby, .NET, PHP, and Rust, plus community clients for many other languages
- Scalable: Scales from single-node development to multi-datacenter production clusters
- Versatile: Search engines, log analytics, metrics, security analytics, business intelligence
```
Official site: https://www.elastic.co/elasticsearch
GitHub: https://github.com/elastic/elasticsearch (70K+ stars)
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Download: https://www.elastic.co/downloads/elasticsearch
```
Step 2
Technology Stack
Elasticsearch is built on a sophisticated stack of Java technologies optimized for distributed systems and high-performance search.

Core platform:
- Java 21+ (bundled with distribution)
- Apache Lucene (text search library and inverted index engine)
- Netty for async HTTP and transport layer
- Jackson for JSON serialization
- Log4j2 for structured logging
Distributed systems:
- Custom quorum-based cluster coordination (introduced in 7.0 to replace Zen Discovery; a Raft-like design rather than a plain Raft implementation)
- Segment-based storage with automatic merge policies
- Sequence numbers and primary terms for replication ordering and optimistic concurrency control
- Cross-cluster replication for disaster recovery
Data structures:
- Inverted index for text search (term → document IDs)
- Doc values for sorting and aggregations (column-oriented storage)
- BKD trees for numeric and geo-spatial indexing
- Finite state transducers (FST) for efficient term lookups
Query execution:
- Two-phase distributed search (query then fetch)
- Relevance scoring with BM25 (the default since 5.0; the classic TF-IDF similarity was removed in 7.0)
- Vector search with k-NN for semantic/machine learning use cases
- Query cache and request cache for performance
```
Architecture:
├── Core: Java 21, Apache Lucene
├── Network: Netty (HTTP + Transport)
├── Serialization: Jackson (JSON)
├── Coordination: Raft-like quorum consensus
└── Storage: Segment-based with LSM-tree patterns

Data structures:
├── Inverted index (text search)
├── Doc values (aggregations)
├── BKD trees (numerics, geo)
└── FST (term lookups)

Query:
├── Distributed search (query → fetch)
├── Scoring: BM25 (default since 5.0)
└── Vector: k-NN, ANN algorithms
```
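The following toy, single-field Python sketch makes the inverted index and BM25 entries above concrete. It assumes a simplified analyzer that lowercases text and splits on anything that is not a letter or digit. It uses the textbook BM25 formula with k1 = 1.2 and b = 0.75, which are Elasticsearch's defaults. Lucene's implementation differs in details (for example, it stores field lengths approximately), so absolute scores will not match a real cluster. The point is the shape of the data structure and the effect of length normalization.

```python
import math
import re
from collections import Counter, defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class InvertedIndex:
    def __init__(self, k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        self.doc_len: dict[int, int] = {}

    def add(self, doc_id: int, text: str) -> None:
        tokens = analyze(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def idf(self, term: str) -> float:
        n_docs = len(self.doc_len)
        df = len(self.postings.get(term, {}))
        return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

    def search(self, query: str) -> list[tuple[int, float]]:
        """Score every document containing at least one query term (OR semantics, like match)."""
        if not self.doc_len:
            return []
        avg_len = sum(self.doc_len.values()) / len(self.doc_len)
        scores: dict[int, float] = defaultdict(float)
        for term in analyze(query):
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                # Length normalization: longer-than-average fields are penalized
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = InvertedIndex()
index.add(1, "Wireless Headphones")
index.add(2, "USB-C Cable")
index.add(3, "Wireless charging pad for headphones and phones")
print(index.postings["wireless"])  # {1: 1, 3: 1}
print(index.search("wireless headphones"))
```

Both documents 1 and 3 contain both query terms. Document 1 is shorter than average, so length normalization gives it the higher score.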

Step 3

Quick Installation Options

Multiple installation methods are available, depending on your environment and use case.

Installation options:

Docker: Fastest for development and testing
Binary archive: Direct download for any platform
Package managers: APT and YUM for Linux servers, Homebrew for macOS development
Kubernetes: Elastic Cloud on Kubernetes (ECK) operator
Elastic Cloud: Fully managed SaaS offering

System requirements:

2+ GB RAM (4+ GB recommended for production)
64-bit OS (Linux, macOS, Windows)
Java bundled with distribution (no separate install needed)
Sufficient disk space for indices (varies by use case)

# Option 1: Docker (quick start; security is disabled, so use this for local development only)
docker run -d \
  --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.0

# Verify installation
curl http://localhost:9200
# Output: cluster info JSON with version, tagline

# Option 2: Binary archive (Linux x86_64 shown; macOS archives use the darwin-x86_64 or darwin-aarch64 suffix)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.13.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.13.0-linux-x86_64.tar.gz
cd elasticsearch-8.13.0/
./bin/elasticsearch

# Option 3: Homebrew (macOS)
# Run brew info elastic/tap/elasticsearch-full first; the tap may lag behind current 8.x releases
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
brew services start elastic/tap/elasticsearch-full

# Option 4: APT (Debian/Ubuntu)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Option 5: YUM (RHEL/CentOS)
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elasticsearch.repo > /dev/null << 'EOF'
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install elasticsearch
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

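# Verify. Archive, APT, and YUM installs of 8.x enable security and HTTPS by default:
# use the elastic password printed during package installation (or on the archive's first start)
# and the generated CA certificate (/etc/elasticsearch/certs/http_ca.crt for APT/YUM,
# config/certs/http_ca.crt for the archive; reading it may require sudo)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200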

Step 4

Basic Configuration

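Elasticsearch reads its settings from the config directory ($ES_HOME/config for archive installs, /etc/elasticsearch for DEB/RPM packages). The key files are elasticsearch.yml (main configuration, in YAML) and jvm.options (JVM settings, one flag per line).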

Essential settings:

cluster.name: Cluster identifier (nodes with same name join together)
node.name: Human-readable node identifier
path.data: Where indices are stored (critical for backups)
path.logs: Log file location
network.host: Network binding address
http.port: HTTP API port (default 9200)
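discovery.seed_hosts: Addresses of other nodes to contact when discovering the cluster
cluster.initial_master_nodes: Master-eligible node names used only to bootstrap a brand-new cluster (remove after the cluster first forms)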

# config/elasticsearch.yml - Basic single-node configuration

cluster.name: my-application
node.name: node-1

# Data and logs paths
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Network settings (0.0.0.0 listens on all interfaces; with security disabled, use localhost or a trusted network)
network.host: 0.0.0.0
http.port: 9200

# Single-node cluster (development)
discovery.type: single-node

# Security (disable for development, enable for production)
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

# --- Production multi-node configuration ---

# elasticsearch.yml on each node of a 3-node cluster
cluster.name: production-cluster
node.name: ${HOSTNAME}  # Set via environment variable

# Node roles (can combine multiple)
node.roles: [ master, data, ingest ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

# Cluster discovery
discovery.seed_hosts:
  - es-node1:9300
  - es-node2:9300
  - es-node3:9300

cluster.initial_master_nodes:
  - es-node1
  - es-node2
  - es-node3

# Security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

Step 5

JVM and Memory Configuration

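Elasticsearch is a Java application, so JVM tuning is critical for performance, and heap size is the most important setting. Since 7.11, Elasticsearch sizes the heap automatically based on node roles and available memory, and Elastic recommends that default for most deployments. If you set the heap manually, follow the rules below.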

Heap size rules:

Set -Xms and -Xmx to the same value (prevents heap resizing)
Never exceed 50% of physical RAM (leave space for OS file cache)
Never exceed ~31 GB (compressed object pointers threshold)
For a 64 GB server, set heap to 31 GB
For a 16 GB server, set heap to 8 GB
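The heap rules above reduce to a small calculation. The Python sketch below assumes you size the heap by hand and uses 31 GB as a conservative ceiling under the compressed-oops threshold. The function names are illustrative only.

```python
def recommended_heap_gb(physical_ram_gb: int, ceiling_gb: int = 31) -> int:
    """Half of physical RAM (leaving the rest for the OS file cache), capped at ceiling_gb."""
    if physical_ram_gb < 2:
        raise ValueError("this sketch assumes at least 2 GB of RAM")
    return min(physical_ram_gb // 2, ceiling_gb)


def jvm_heap_flags(physical_ram_gb: int) -> list[str]:
    """-Xms and -Xmx set to the same value so the heap is never resized at runtime."""
    heap = recommended_heap_gb(physical_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", jvm_heap_flags(ram))
```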

# config/jvm.options.d/heap.options - custom JVM settings (Elastic recommends a file under jvm.options.d/ rather than editing jvm.options)

# Set heap size (example: 8 GB for a 16 GB server)
-Xms8g
-Xmx8g

# Production recommended settings
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

# Heap dump on out-of-memory
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch

# GC logging (helpful for troubleshooting)
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# Environment variable approach (overrides jvm.options)
export ES_JAVA_OPTS="-Xms8g -Xmx8g"
./bin/elasticsearch

# Docker environment variable
docker run -d \
  -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.0

Step 6

First Steps: Creating an Index and Adding Documents

Elasticsearch stores data in indices (roughly comparable to database tables) containing documents (comparable to rows). Documents are JSON objects. Let's create an index and add documents.

Key concepts:

Index: Collection of documents with similar characteristics
Document: Basic unit of information (JSON)
Field: Key-value pair in a document
Mapping: Schema definition (field types and settings)
Shard: Index subdivision for horizontal scaling
Replica: Shard copy for high availability

# Create an index with explicit mapping
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "in_stock": { "type": "boolean" },
      "created_at": { "type": "date" }
    }
  }
}'

# Add a document (POST generates auto ID)
curl -X POST "localhost:9200/products/_doc" -H 'Content-Type: application/json' -d'
{
  "name": "Wireless Headphones",
  "description": "High-quality noise-cancelling headphones",
  "price": 299.99,
  "category": "electronics",
  "tags": ["audio", "wireless", "bluetooth"],
  "in_stock": true,
  "created_at": "2024-01-15T10:30:00Z"
}'

# Add a document with specific ID
curl -X PUT "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "USB-C Cable",
  "description": "Fast charging cable 2m",
  "price": 19.99,
  "category": "accessories",
  "tags": ["cable", "usb-c"],
  "in_stock": true,
  "created_at": "2024-01-16T14:20:00Z"
}'

# Bulk indexing (faster for multiple documents)
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @- << 'EOF'
{ "index": { "_index": "products" } }
{ "name": "Laptop Stand", "price": 49.99, "category": "accessories", "in_stock": true }
{ "index": { "_index": "products" } }
{ "name": "Mechanical Keyboard", "price": 149.99, "category": "electronics", "in_stock": false }
EOF

# Retrieve a document by ID
curl -X GET "localhost:9200/products/_doc/1"

# Get index mapping
curl -X GET "localhost:9200/products/_mapping"

# Get index stats
curl -X GET "localhost:9200/products/_stats"
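The _bulk body is newline-delimited JSON with a fixed layout. Each document gets an action line followed by its source line, and the whole body must end with a newline. The Python sketch below builds such a body from a list of documents. The function name build_bulk_body is hypothetical. Its output is what you would pipe to curl --data-binary. The official clients also ship bulk helpers that do this for you.

```python
import json
from typing import Optional


def build_bulk_body(index: str, docs: list[dict], id_field: Optional[str] = None) -> str:
    """Build an NDJSON _bulk body with one index action per document.

    If id_field is given, its value becomes the document _id; otherwise
    Elasticsearch generates an ID, as with POST /<index>/_doc.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field is not None:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", [
    {"name": "Laptop Stand", "price": 49.99, "category": "accessories", "in_stock": True},
    {"name": "Mechanical Keyboard", "price": 149.99, "category": "electronics", "in_stock": False},
])
print(body, end="")
```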

Step 7

Search Queries: From Simple to Complex

Elasticsearch provides a rich Query DSL (Domain Specific Language) for searching documents. Queries range from simple text matches to complex boolean logic.

Query types:

Match: Full-text search with analysis
Term: Exact match (no analysis)
Range: Numeric or date ranges
Bool: Combine queries with AND/OR/NOT logic
Wildcard: Pattern matching with * and ?
Fuzzy: Approximate matching (typo tolerance)
Nested: Query nested objects
Geo: Geographic queries

# Simple match query (full-text search)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "description": "wireless headphones"
    }
  }
}'

# Match with size and pagination
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "name": "cable"
    }
  }
}'

# Multi-match (search across multiple fields)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "wireless",
      "fields": ["name", "description"]
    }
  }
}'

# Term query (exact match, no analysis)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "category": "electronics"
    }
  }
}'

# Range query
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "price": {
        "gte": 50,
        "lte": 200
      }
    }
  }
}'

# Bool query (complex logic)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless" } }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lte": 500 } } }
      ],
      "must_not": [
        { "term": { "category": "refurbished" } }
      ],
      "should": [
        { "match": { "tags": "bluetooth" } }
      ],
      "minimum_should_match": 1
    }
  }
}'

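# Wildcard query (matched against individual indexed terms; leading wildcards such as *phone scan many terms and are slow)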
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "name": "*phone*"
    }
  }
}'

# Fuzzy query (typo tolerance)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "hedphones",
        "fuzziness": "AUTO"
      }
    }
  }
}'
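With "fuzziness": "AUTO", the allowed edit distance depends on the length of the query term. Terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters allow 1 edit, and longer terms allow 2 edits (the AUTO:3,6 default). With the fuzzy query's default transpositions setting, swapping two adjacent characters counts as a single edit. The Python sketch below reproduces that check using an optimal string alignment distance. It is only for reasoning about why "hedphones" matches "headphones". Lucene actually matches against indexed terms using automata, not pairwise comparisons.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Allowed edits for fuzziness AUTO:low,high (plain AUTO means AUTO:3,6)."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions, and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    return osa_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


print(fuzzy_matches("hedphones", "headphones"))  # True: 1 insertion, 2 edits allowed
print(fuzzy_matches("cabel", "cable"))           # True: 1 transposition, 1 edit allowed
print(fuzzy_matches("usb", "bus"))               # False: 2 edits needed, 1 allowed
```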

Step 8

Aggregations: Analytics and Metrics

Aggregations provide analytics over your data. Think of them as SQL GROUP BY on steroids. There are three types: metric (calculate metrics), bucket (group documents), and pipeline (aggregate aggregation results).

Common aggregations:

Metrics: avg, sum, min, max, stats, cardinality, percentiles
Bucket: terms (group by field), date_histogram, range, filters
Pipeline: derivative, cumulative_sum, moving_fn (which replaces moving_avg, removed in 8.0)

# Terms aggregation (group by category)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      }
    }
  }
}'

# Average price per category
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}'

# Stats aggregation (min, max, avg, sum, count)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}'

# Date histogram (time-series data)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "products_over_time": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_revenue": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}'

# Percentiles aggregation
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [50, 75, 90, 95, 99]
      }
    }
  }
}'

# Range aggregation
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}'
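To sanity-check what the terms/avg and range aggregations above return, the same bucketing can be reproduced in plain Python over a handful of documents. This is only a mental model, not how Elasticsearch computes aggregations. In practice it reads doc values on each shard and merges the results, and terms counts can be approximate across shards. Each range bucket includes its from value and excludes its to value.

```python
from collections import defaultdict

products = [
    {"name": "Wireless Headphones", "price": 299.99, "category": "electronics"},
    {"name": "USB-C Cable", "price": 19.99, "category": "accessories"},
    {"name": "Laptop Stand", "price": 49.99, "category": "accessories"},
    {"name": "Mechanical Keyboard", "price": 149.99, "category": "electronics"},
]


def terms_with_avg(docs, field, metric_field):
    """Like a terms agg on `field` with an avg sub-agg on `metric_field`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    ]
    # Order by doc_count descending; ties are broken by key here for determinism
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


def range_buckets(docs, field, ranges):
    """Like a range agg: `from` is inclusive, `to` is exclusive, either bound may be omitted."""
    out = []
    for r in ranges:
        lo, hi = r.get("from", float("-inf")), r.get("to", float("inf"))
        count = sum(1 for d in docs if lo <= d[field] < hi)
        out.append({**r, "doc_count": count})
    return out


print(terms_with_avg(products, "category", "price"))
print(range_buckets(products, "price", [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]))
```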

Step 9

Index Management: Mappings, Aliases, and Templates

Effective index management is crucial for performance and maintainability. This includes defining mappings, using aliases for zero-downtime reindexing, and templates for consistent settings.

Best practices:

Define explicit mappings (don't rely on dynamic mapping for production)
Use index aliases for production indices
Create index templates for time-series data
Set appropriate shard counts (over-sharding hurts performance)
Use index lifecycle management (ILM) for data retention

# Update mapping (you can add new fields; changing an existing field's type requires reindexing)
curl -X PUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "manufacturer": {
      "type": "keyword"
    },
    "rating": {
      "type": "float"
    }
  }
}'

# Create index alias
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    {
      "add": {
        "index": "products",
        "alias": "products-latest"
      }
    }
  ]
}'

# Atomic alias switch (zero-downtime reindex; assumes indices products-v1 and products-v2 exist and no concrete index is named products)
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add": { "index": "products-v2", "alias": "products" } }
  ]
}'

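# Create index template. "priority": 200 makes it outrank the built-in "logs" template
# (pattern logs-*-*, priority 100), which would otherwise apply to names like logs-2024-01-15
# and turn them into data streams. If Elastic Agent writes logs-*-* data streams to this
# cluster, use a non-overlapping pattern instead.
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "priority": 200,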
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "5s"
    },
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text" },
        "service": { "type": "keyword" }
      }
    }
  }
}'

# Now any index matching logs-* gets these settings
curl -X PUT "localhost:9200/logs-2024-01-15"

# Reindex data from old index to new
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "products-old"
  },
  "dest": {
    "index": "products-new"
  }
}'

# Delete index
curl -X DELETE "localhost:9200/products-old"

# Get all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Get cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
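When several composable index templates match a new index name, Elasticsearch applies only the one with the highest priority. 8.x ships built-in templates at priority 100, for example logs with the pattern logs-*-*. The simplified Python model below uses fnmatch wildcards as a stand-in for Elasticsearch's own pattern matching. It shows why a logs-* template left at the default priority is ignored for an index such as logs-2024-01-15.

```python
from fnmatch import fnmatchcase


def resolve_template(index_name: str, templates: list[dict]):
    """Return the name of the highest-priority template whose patterns match, or None."""
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    if not matching:
        return None
    # An omitted "priority" counts as 0. Elasticsearch rejects overlapping templates
    # with equal priority at creation time, so ties do not arise in practice.
    best = max(matching, key=lambda t: t.get("priority", 0))
    return best["name"]


builtin = {"name": "logs", "index_patterns": ["logs-*-*"], "priority": 100}
custom_default = {"name": "logs_template", "index_patterns": ["logs-*"]}
custom_high = {"name": "logs_template", "index_patterns": ["logs-*"], "priority": 200}

print(resolve_template("logs-2024-01-15", [builtin, custom_default]))  # logs
print(resolve_template("logs-2024-01-15", [builtin, custom_high]))     # logs_template
print(resolve_template("metrics-app", [builtin, custom_high]))         # None
```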

Step 10

Production Cluster Setup

Production deployments require a multi-node cluster for high availability and scalability. A typical setup includes dedicated master, data, and ingest nodes.

Node roles:

Master: Cluster state management (lightweight, 3+ nodes for quorum)
Data: Store indices and execute queries (most resources)
Ingest: Pre-process documents (optional)
Coordinating: Route requests (no data, no master)
ML: Machine learning (optional)

Minimum production cluster: 3 master-eligible nodes + 2+ data nodes
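The "3+ nodes for quorum" rule comes from majority voting. An election or cluster-state update needs more than half of the voting master-eligible nodes. The short Python sketch below covers only the arithmetic. It ignores Elasticsearch's automatic voting-configuration adjustments, such as leaving one node out when there is an even number of master-eligible nodes.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the voting master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 4, 5):
    print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
```

Two master-eligible nodes tolerate no more failures than one, which is why three is the practical minimum. A fourth node adds no extra tolerance over three.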

# Master node config (es-master-1)
cluster.name: production
node.name: master-1
node.roles: [ master ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300

cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
# plus the transport certificate settings (keystore/truststore paths) shown in Step 4

---

# Data node config (es-data-1)
cluster.name: production
node.name: data-1
node.roles: [ data, ingest ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
# plus the transport certificate settings (keystore/truststore paths) shown in Step 4

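# Hot/warm/cold architecture (data tiers): since 7.10, give tier nodes roles such as
# node.roles: [ data_hot, data_content, ingest ] instead of the generic data role
# (the older custom attribute, node.attr.data: hot, still works with allocation filtering)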

---

# Coordinating-only node (load balancer)
cluster.name: production
node.name: coordinator-1
node.roles: [ ]  # No roles = coordinating only

network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

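discovery.seed_hosts:
  - master-1:9300
  - master-2:9300
  - master-3:9300

# Must match the cluster's security settings to join (plus the transport certificate settings)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true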

Step 11

Security: Authentication and TLS

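Elasticsearch security features (formerly X-Pack Security) provide authentication, authorization, and encryption, and are essential for production deployments. Since 8.0, security is enabled by default, and a new node generates its TLS certificates automatically on first start. The manual steps below apply when you manage your own certificates.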

Security layers:

TLS: Encrypt HTTP and transport communication
Authentication: Built-in, LDAP, Active Directory, SAML, OpenID Connect
Authorization: Role-based access control (RBAC)
Audit logging: Track security events
Field/document level security: Fine-grained access control

# Generate certificates for inter-node communication
cd /usr/share/elasticsearch
bin/elasticsearch-certutil ca --pem
# Creates elastic-stack-ca.zip

unzip elastic-stack-ca.zip
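bin/elasticsearch-certutil cert \
  --ca-cert ca/ca.crt \
  --ca-key ca/ca.key \
  --pem \
  --name node1 \
  --dns node1.example.com \
  --ip 192.168.1.10 \
  --out node1.zip
unzip node1.zip
# Creates node1/node1.crt and node1/node1.key

# Copy certificates to config/certs/ (/etc/elasticsearch/certs for DEB/RPM installs)
mkdir -p config/certs
cp node1/node1.crt node1/node1.key ca/ca.crt config/certs/
# Keep the private key unreadable by other users; the elasticsearch user still needs read access
chmod 640 config/certs/*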

# Enable security in elasticsearch.yml
xpack.security.enabled: true

# TLS for transport (inter-node)
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.certificate: certs/node1.crt
xpack.security.transport.ssl.key: certs/node1.key
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]

# TLS for HTTP (client connections)
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.certificate: certs/node1.crt
xpack.security.http.ssl.key: certs/node1.key
xpack.security.http.ssl.certificate_authorities: [ "certs/ca.crt" ]

# Set the elastic user's password (elasticsearch-setup-passwords is deprecated in 8.x)
bin/elasticsearch-reset-password -u elastic
# Or choose the password interactively:
bin/elasticsearch-reset-password -u elastic -i

# Create custom user
curl -X POST "https://localhost:9200/_security/user/john" \
  -u elastic:password -k \
  -H 'Content-Type: application/json' -d'
{
  "password" : "s3cr3t",
  "roles" : [ "kibana_admin", "monitoring_user" ],
  "full_name" : "John Doe",
  "email" : "john@example.com"
}'

# Create custom role
curl -X POST "https://localhost:9200/_security/role/products_read" \
  -u elastic:password -k \
  -H 'Content-Type: application/json' -d'
{
  "indices": [
    {
      "names": [ "products*" ],
      "privileges": [ "read" ]
    }
  ]
}'

# Test authenticated request (-k skips certificate verification; prefer --cacert config/certs/ca.crt outside testing)
curl -u john:s3cr3t -k https://localhost:9200/_cluster/health
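The products_read role above grants read on any index whose name matches products*. The Python model below of that check is deliberately simplified. It uses fnmatch wildcards and treats privileges as plain strings. Elasticsearch's real model also includes privilege hierarchies (for example, all implies read), cluster privileges, and field- or document-level restrictions. The ROLES table and can_access function are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the "indices" section of the products_read role created above
ROLES = {
    "products_read": [{"names": ["products*"], "privileges": ["read"]}],
}


def can_access(user_roles: list[str], index: str, privilege: str) -> bool:
    """True if any of the user's roles grants `privilege` on a pattern matching `index`."""
    for role in user_roles:
        for grant in ROLES.get(role, []):
            if privilege in grant["privileges"] and any(
                fnmatchcase(index, pattern) for pattern in grant["names"]
            ):
                return True
    return False


print(can_access(["products_read"], "products-v2", "read"))      # True
print(can_access(["products_read"], "products", "write"))        # False
print(can_access(["products_read"], "logs-2024-01-15", "read"))  # False
```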

Step 12

Monitoring and Observability

Monitor Elasticsearch health and performance using built-in APIs and the Elastic Stack (formerly ELK Stack).

Key metrics to monitor:

Cluster health (green/yellow/red)
Node CPU, memory, disk usage
JVM heap usage and GC times
Query latency and throughput
Indexing rate and latency
Shard count and size
Rejected threads (thread pool saturation)

# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Status: green (all good), yellow (replicas missing), red (primary missing)

# Node stats (detailed metrics)
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Index stats
curl -X GET "localhost:9200/products/_stats?pretty"

# Thread pool stats (watch for rejections)
curl -X GET "localhost:9200/_cat/thread_pool?v&h=name,queue,active,rejected,completed"

# Pending tasks (should be near zero)
curl -X GET "localhost:9200/_cluster/pending_tasks"

# Hot threads (troubleshoot CPU spikes)
curl -X GET "localhost:9200/_nodes/hot_threads"

# Recovery status (ongoing shard movements)
curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"

# Allocation explanation (why shard isn't allocated)
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# Enable slow logs per index. These are index-level settings: set them through the
# _settings API (or an index template), not in elasticsearch.yml, which rejects them
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.fetch.debug": "500ms",
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s"
}'

# Metricbeat for comprehensive monitoring
# Download and configure Metricbeat, then enable elasticsearch module
metricbeat modules enable elasticsearch
metricbeat setup
metricbeat -e
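The health response is JSON, so it is easy to turn into alerts. The minimal Python sketch below takes the parsed body of GET /_cluster/health and returns human-readable warnings. The field names (status, unassigned_shards, number_of_pending_tasks, relocating_shards) come from that API. The pending-task threshold is an arbitrary assumption that you should tune for your cluster.

```python
def health_warnings(health: dict, max_pending_tasks: int = 10) -> list[str]:
    """Summarize problems in a parsed _cluster/health response."""
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("RED: at least one primary shard is unassigned")
    elif status == "yellow":
        warnings.append("YELLOW: all primaries assigned, but some replicas are not")
    if health.get("unassigned_shards", 0) > 0:
        warnings.append(f"{health['unassigned_shards']} unassigned shard(s); "
                        "check _cluster/allocation/explain")
    if health.get("number_of_pending_tasks", 0) > max_pending_tasks:
        warnings.append(f"{health['number_of_pending_tasks']} pending cluster tasks")
    if health.get("relocating_shards", 0) > 0:
        warnings.append(f"{health['relocating_shards']} shard(s) relocating")
    return warnings


sample = {"status": "yellow", "unassigned_shards": 1,
          "number_of_pending_tasks": 0, "relocating_shards": 0}
for line in health_warnings(sample):
    print(line)
```

On a single-node development cluster, yellow is expected for indices with one or more replicas, because a replica is never allocated to the same node as its primary.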

Step 13

Client Libraries and Language SDKs

Elasticsearch provides official clients for many programming languages. All clients support the full REST API with language-specific idioms.

Official clients:

Java (Java API Client; the older High-Level REST Client is deprecated since 7.15)
Python (elasticsearch-py)
JavaScript/Node.js (@elastic/elasticsearch)
Go (go-elasticsearch)
Ruby (elasticsearch-ruby)
PHP (elasticsearch-php)
.NET (Elastic.Clients.Elasticsearch; NEST and Elasticsearch.Net are the 7.x clients)
Rust (elasticsearch-rs)

# Python client example
from elasticsearch import Elasticsearch

# Create client
es = Elasticsearch(
    ["http://localhost:9200"],
    basic_auth=("elastic", "password")
)

# Check cluster health
health = es.cluster.health()
print(f"Cluster status: {health['status']}")

# Index a document
response = es.index(
    index="products",
    id=1,
    refresh="wait_for",  # wait until the document is visible to search
    document={
        "name": "Laptop",
        "price": 999.99,
        "category": "electronics"
    }
)
print(f"Indexed: {response['result']}")

# Search (8.x clients take query, aggs, size, etc. as keyword arguments; body= is deprecated)
response = es.search(
    index="products",
    query={
        "match": {
            "name": "laptop"
        }
    }
)

for hit in response['hits']['hits']:
    print(f"{hit['_source']['name']}: ${hit['_source']['price']}")

# Aggregation
response = es.search(
    index="products",
    size=0,
    aggs={
        "categories": {
            "terms": {
                "field": "category"
            }
        }
    }
)

for bucket in response['aggregations']['categories']['buckets']:
    print(f"{bucket['key']}: {bucket['doc_count']} products")

Step 14

JavaScript/Node.js Client Example

The official JavaScript client targets Node.js (browser use is not supported) and ships with TypeScript type definitions.

// npm install @elastic/elasticsearch

const { Client } = require('@elastic/elasticsearch');

// Create client
const client = new Client({
  node: 'http://localhost:9200',
  auth: {
    username: 'elastic',
    password: 'password'
  }
});

// Index a document
async function indexDocument() {
  const result = await client.index({
    index: 'products',
    id: '1',
    refresh: 'wait_for', // wait until the document is visible to search
    document: {
      name: 'Smartphone',
      price: 699.99,
      category: 'electronics',
      in_stock: true
    }
  });
  console.log('Indexed:', result.result);
}

// Search with bool query
async function search() {
  const result = await client.search({
    index: 'products',
    query: {
      bool: {
        must: [
          { match: { category: 'electronics' } }
        ],
        filter: [
          { range: { price: { lte: 1000 } } }
        ]
      }
    },
    sort: [
      { price: 'asc' }
    ],
    size: 10
  });

  result.hits.hits.forEach(hit => {
    console.log(`${hit._source.name}: $${hit._source.price}`);
  });
}

// Aggregation with sub-aggregation
async function aggregate() {
  const result = await client.search({
    index: 'products',
    size: 0,
    aggs: {
      by_category: {
        terms: { field: 'category' },
        aggs: {
          avg_price: {
            avg: { field: 'price' }
          }
        }
      }
    }
  });

  result.aggregations.by_category.buckets.forEach(bucket => {
    console.log(`${bucket.key}: ${bucket.doc_count} items, avg price $${bucket.avg_price.value.toFixed(2)}`);
  });
}

// Run examples
async function main() {
  await indexDocument();
  await search();
  await aggregate();
}

main().catch(console.error);

Step 15
Common Use Cases
Elasticsearch excels in several key domains:

1. Full-text search: Power website search, e-commerce product search, and documentation search. Think GitHub search, Stack Overflow, Netflix search.

2. Log and event analytics: Centralize logs from applications and infrastructure. Elasticsearch is the "E" in the ELK/Elastic Stack (Elasticsearch + Logstash + Kibana).

3. Metrics and APM: Store and analyze application performance metrics, infrastructure metrics, business metrics.

4. Security analytics: SIEM (Security Information and Event Management), threat detection, audit logs. Elastic Security provides pre-built detections.

5. Business analytics: Real-time dashboards, customer behavior analytics, sales analytics. Kibana provides visualization layer.

6. Geospatial: Location-based search, geographic analytics, ride-sharing, delivery optimization.

7. Machine learning: Anomaly detection, forecasting, outlier detection via X-Pack ML.

Step 16

Performance Tuning Best Practices

Optimize Elasticsearch for your specific workload:

Indexing performance:

Increase refresh_interval during bulk indexing (default 1s → 30s or -1)
Disable replicas during initial load, re-enable after
Use bulk API instead of individual index requests
Increase index.translog.flush_threshold_size for write-heavy loads

Query performance:

Use filters instead of queries when possible (filters are cached)
Avoid deep pagination (use search_after instead of from/size)
Use doc values for sorting and aggregations
Limit _source fields returned (_source_includes)
Use index aliases for zero-downtime reindexing

Shard sizing:

Target 20-40 GB per shard for search workloads
Target 40-50 GB per shard for logging workloads
Avoid over-sharding (1000s of tiny shards hurt performance)
Use shrink API to reduce shard count
Use rollover for time-series data

Memory:

50% heap, 50% OS file cache is the golden rule
Monitor JVM heap usage (target <75%)
Use G1GC for heaps >4GB
Consider disabling swapping (bootstrap.memory_lock: true)
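The shard-size targets above translate into a primary shard count: divide the expected index size by the target size and round up. The sketch below assumes a 30 GB target, which falls inside the 20-40 GB search range above, and always returns at least one shard. For time-series data, rollover usually caps shard size instead, so this calculation matters mostly for fixed-size indices.

```python
import math


def primary_shard_count(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shards needed to keep each shard at or below the target size."""
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def total_shards(primaries: int, replicas: int) -> int:
    """Each replica is a full copy of every primary."""
    return primaries * (1 + replicas)


p = primary_shard_count(200)  # 200 GB at up to 30 GB per shard
print(p, total_shards(p, replicas=1))  # 7 14
```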

# Disable refresh during bulk indexing
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}'

# Bulk index (do your indexing here)

# Re-enable refresh and replicas
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}'

# Force merge after bulk indexing (only for indices that will receive no further writes)
curl -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"

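# Use search_after for deep pagination (more efficient than from/size).
# Sorting on _id is disabled by default in 8.x, so open a point in time (PIT);
# searches that use a PIT get an implicit _shard_doc tiebreaker.
curl -X POST "localhost:9200/products/_pit?keep_alive=1m"
# Returns {"id": "..."}; substitute it for <pit_id> below
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "pit": { "id": "<pit_id>", "keep_alive": "1m" },
  "sort": [
    { "created_at": "asc" }
  ]
}'
# For the next page, pass the last hit's "sort" values as "search_after"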

# Disable swapping (add to elasticsearch.yml)
bootstrap.memory_lock: true

# Then run on Linux:
sudo systemctl edit elasticsearch
# Add:
[Service]
LimitMEMLOCK=infinity
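search_after works as a cursor. Results are sorted on a tuple of sort values that is unique per document, and each page asks for hits strictly after the last tuple seen, so no offset has to be skipped. The small in-memory Python sketch below shows that loop. search_page is a hypothetical stand-in for the search request, and the tiebreaker field plays the role of _shard_doc or another unique field.

```python
def search_page(docs, size, search_after=None):
    """Hypothetical stand-in for a sorted search with "size" and "search_after"."""
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["tiebreaker"]))
    if search_after is not None:
        ordered = [d for d in ordered
                   if (d["created_at"], d["tiebreaker"]) > tuple(search_after)]
    page = ordered[:size]
    # Each hit carries its sort values, like the "sort" array on real hits
    return [{"_source": d, "sort": [d["created_at"], d["tiebreaker"]]} for d in page]


def scan_all(docs, size=2):
    """Fetch every document page by page, feeding the last hit's sort values back in."""
    cursor, seen = None, []
    while True:
        hits = search_page(docs, size, cursor)
        if not hits:
            return seen
        seen.extend(hit["_source"]["name"] for hit in hits)
        cursor = hits[-1]["sort"]


docs = [
    {"name": "a", "created_at": "2024-01-15T10:30:00Z", "tiebreaker": 1},
    {"name": "b", "created_at": "2024-01-15T10:30:00Z", "tiebreaker": 2},  # same timestamp as "a"
    {"name": "c", "created_at": "2024-01-16T14:20:00Z", "tiebreaker": 3},
]
print(scan_all(docs))  # ['a', 'b', 'c']
```

Without the tiebreaker, "a" and "b" would share a cursor value, and one of them could be skipped or repeated at a page boundary.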

Step 17

Backup and Restore

Elasticsearch snapshots provide backup and disaster recovery. Snapshots are incremental and stored in a repository (filesystem, S3, GCS, Azure).

Best practices:

Automate snapshots (daily or hourly)
Store snapshots off-cluster (S3, GCS, Azure)
Test restores regularly
Use Snapshot Lifecycle Management (SLM) for automation

# Register snapshot repository (filesystem).
# First add the location to path.repo in elasticsearch.yml on every master and data node
# (it must be a shared filesystem mounted on all of them), then restart:
# path.repo: ["/mount/backups/elasticsearch"]
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch"
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# Snapshot specific indices
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_2" -H 'Content-Type: application/json' -d'
{
  "indices": "products,logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}'

# List snapshots
curl -X GET "localhost:9200/_snapshot/my_backup/_all?pretty"

# Restore snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "products",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "products",
  "rename_replacement": "restored-products"
}'

# S3 repository (AWS)
curl -X PUT "localhost:9200/_snapshot/s3_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1",
    "base_path": "elasticsearch/snapshots"
  }
}'

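# In 8.x the s3 repository type is built in (7.x needed the repository-s3 plugin).
# Store AWS credentials in the elasticsearch-keystore:
# bin/elasticsearch-keystore add s3.client.default.access_key
# bin/elasticsearch-keystore add s3.client.default.secret_key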

# Delete old snapshots
curl -X DELETE "localhost:9200/_snapshot/my_backup/snapshot_1"
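Snapshot Lifecycle Management retention is configured with expire_after, min_count, and max_count. The simplified Python model below shows how those three interact, so you can reason about a policy before applying it. It assumes every snapshot succeeded and ignores details such as snapshots still in progress. Treat it as an approximation of SLM's behavior, not its exact algorithm. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, expire_after=None, min_count=0, max_count=None):
    """Return names of snapshots that retention would delete, oldest first.

    snapshots: list of (name, start_time) tuples.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    delete = []
    for position, (name, started) in enumerate(newest_first):
        if position < min_count:
            continue  # always keep the newest min_count snapshots, even if expired
        too_many = max_count is not None and position >= max_count
        expired = expire_after is not None and now - started > expire_after
        if too_many or expired:
            delete.append(name)
    return list(reversed(delete))


now = datetime(2024, 3, 31)
snaps = [(f"nightly-{d:02d}", datetime(2024, 3, d)) for d in range(1, 31)]
print(snapshots_to_delete(snaps, now, expire_after=timedelta(days=7), min_count=5, max_count=50))
```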

Step 18

Kubernetes Deployment with ECK

Elastic Cloud on Kubernetes (ECK) is the official operator for deploying and managing Elasticsearch on Kubernetes. It automates deployment, upgrades, scaling, and monitoring.

# Install ECK operator
kubectl create -f https://download.elastic.co/downloads/eck/2.12.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.12.0/operator.yaml

# Verify operator is running
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

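# Deploy Elasticsearch cluster (the manifest targets the "elastic" namespace, so create it first)
kubectl create namespace elastic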
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production
  namespace: elastic
spec:
  version: 8.13.0
  nodeSets:
  - name: master
    count: 3
    config:
      node.roles: ["master"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: fast-ssd
  - name: data
    count: 3
    config:
      node.roles: ["data", "ingest"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 2
            limits:
              memory: 8Gi
              cpu: 4
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
EOF

# Get cluster password
PASSWORD=$(kubectl get secret production-es-elastic-user -n elastic -o go-template='{{.data.elastic | base64decode}}')
echo "Elasticsearch password: $PASSWORD"

# Access via port-forward
kubectl port-forward -n elastic service/production-es-http 9200
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"

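# Scale data nodes. A JSON patch changes only this count; a merge patch would replace
# the whole nodeSets list (nodeSets/1 is the "data" entry in the manifest above)
kubectl patch elasticsearch production -n elastic --type='json' \
  -p='[{"op": "replace", "path": "/spec/nodeSets/1/count", "value": 5}]'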

Step 19
Resources & Next Steps
Related tools:
- Kibana - Visualization and dashboards
- Logstash - Data pipeline and ingestion
- Beats - Lightweight data shippers
- APM - Application performance monitoring
- Fleet - Centralized management for Elastic Agents
Learning:
- Elastic Training - Official courses
- Elasticsearch: The Definitive Guide (O'Reilly; written for Elasticsearch 2.x, so parts are outdated)
- Elasticsearch in Action (Manning)
Next guides:
- ELK Stack: Complete log analytics pipeline
- Kibana: Building dashboards and visualizations
- Logstash: Data ingestion and transformation
- Elasticsearch performance tuning deep dive
```
GitHub: https://github.com/elastic/elasticsearch
Official site: https://www.elastic.co/elasticsearch
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/
Downloads: https://www.elastic.co/downloads/elasticsearch
Community: https://discuss.elastic.co/
Training: https://www.elastic.co/training
```