# OpenTSDB — Scalable Time Series Database on HBase

> OpenTSDB is a distributed, scalable time series database built on top of Apache HBase, designed for storing and querying billions of data points from infrastructure and application metrics.

## Quick Use
```bash
# Build from source
git clone https://github.com/OpenTSDB/opentsdb.git
cd opentsdb && ./build.sh
# Start the TSD (HBase must already be running and the OpenTSDB tables created)
mkdir -p /tmp/opentsdb
./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir=/tmp/opentsdb
```

## Introduction
OpenTSDB is a time series database that stores metric data in Apache HBase, which lets it scale horizontally to billions of data points. It was created at StumbleUpon and is commonly used for infrastructure monitoring at organizations that already run HBase or a wider Hadoop stack.

## What OpenTSDB Does
- Stores and retrieves time series data at scale using HBase as the storage backend
- Sustains high write throughput; a sufficiently large HBase cluster can ingest millions of data points per second
- Provides an HTTP API and built-in web UI for querying and visualizing metrics
- Implements downsampling, rate calculation, and aggregation at query time
- Tags each data point with arbitrary key-value pairs for flexible filtering

## Architecture Overview
OpenTSDB runs as a stateless daemon, the TSD, that accepts data points over HTTP or a Telnet-style line protocol, usually sent by a collector such as tcollector. Each data point consists of a metric name, a timestamp, a value, and one or more tags. The TSD encodes each write into a compact row key optimized for HBase range scans, mapping metric names, tag keys, and tag values to short UIDs to save storage. Multiple TSD instances can run in parallel behind a load balancer, sharing the same HBase cluster.
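
As a rough illustration of the data point structure described above, the sketch below formats one point for the Telnet-style `put` command and as a JSON body for the `/api/put` HTTP endpoint. The helper names `tsdb_put_line` and `tsdb_put_json`, the metric `sys.cpu.user`, and the tag values are made up for the example.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, metric, and tags are illustrative assumptions.

# Telnet-style line: put <metric> <timestamp> <value> <tagk=tagv> [...]
tsdb_put_line() {
  local metric=$1 ts=$2 value=$3
  shift 3
  # "$*" joins the remaining tagk=tagv arguments with single spaces
  printf 'put %s %s %s %s\n' "$metric" "$ts" "$value" "$*"
}

# JSON body for POST /api/put (one tag only, to keep the sketch short)
tsdb_put_json() {
  local metric=$1 ts=$2 value=$3 tagk=$4 tagv=$5
  printf '{"metric":"%s","timestamp":%s,"value":%s,"tags":{"%s":"%s"}}\n' \
    "$metric" "$ts" "$value" "$tagk" "$tagv"
}

tsdb_put_line sys.cpu.user 1356998400 42.5 host=web01 cpu=0
tsdb_put_json sys.cpu.user 1356998400 42.5 host web01
```

Assuming a TSD listening on `localhost:4242`, the first line could be piped to `nc localhost 4242`, and the JSON body could be sent with `curl -X POST -H 'Content-Type: application/json' -d "$body" http://localhost:4242/api/put`.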

## Self-Hosting & Configuration
- Requires a running Apache HBase cluster (standalone or distributed)
- Run the `src/create_table.sh` script once to create the OpenTSDB tables in HBase
- Configure opentsdb.conf with HBase ZooKeeper quorum and storage settings
- Deploy one or more TSD instances behind a load balancer for high availability
- Use tcollector or other agents to push metrics to the TSD over its Telnet-style or HTTP interface
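
The configuration step above might look roughly like the following sketch, which writes a minimal `opentsdb.conf`. The ZooKeeper host names and file paths are placeholders, and `TSDB_CONF_DIR` is a hypothetical variable used only by this script, not something OpenTSDB reads.

```bash
#!/usr/bin/env bash
# Sketch only: hosts and paths are placeholders to adapt to your cluster.
conf_dir=${TSDB_CONF_DIR:-$(mktemp -d)}   # hypothetical variable for this example

cat > "$conf_dir/opentsdb.conf" <<'EOF'
# Port the TSD listens on (HTTP and Telnet share it)
tsd.network.port = 4242
# Static web UI files and scratch directory for generated graphs
tsd.http.staticroot = build/staticroot
tsd.http.cachedir = /tmp/opentsdb
# Comma-separated ZooKeeper quorum used to locate HBase
tsd.storage.hbase.zk_quorum = zk1.example.com:2181,zk2.example.com:2181
# Assign UIDs to new metric names on first write
tsd.core.auto_create_metrics = true
EOF

echo "wrote $conf_dir/opentsdb.conf"
```

The tables are typically created beforehand with something like `env COMPRESSION=NONE HBASE_HOME=/path/to/hbase ./src/create_table.sh`, and the TSD can then be started with `./build/tsdb tsd --config="$conf_dir/opentsdb.conf"`.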

## Key Features
- Horizontal scalability via HBase with no single point of failure
- Tag-based data model allows flexible, ad-hoc queries across dimensions
- Query-time downsampling reduces the number of points returned for long time ranges; stored data keeps its original resolution
- HTTP JSON API for integration with Grafana and custom dashboards
- Supports rate calculations, interpolation, and mathematical expressions in queries
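
To make the query features above concrete, here is a small sketch that assembles a `/api/query` URL combining an aggregator, a rate conversion, a downsampler, and a tag filter. The helper name `tsdb_query_url`, the host, the metric, and the tag are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: builds a query URL of the form
#   /api/query?start=<start>&m=<aggregator>:rate:<downsample>:<metric>{<tags>}
tsdb_query_url() {
  local base=$1 start=$2 agg=$3 downsample=$4 metric=$5 tags=$6
  printf '%s/api/query?start=%s&m=%s:rate:%s:%s{%s}\n' \
    "$base" "$start" "$agg" "$downsample" "$metric" "$tags"
}

# Sum of per-second rates, averaged into 1-minute buckets, for one host
tsdb_query_url http://localhost:4242 1h-ago sum 1m-avg sys.cpu.user host=web01
```

When fetching such a URL with curl, pass `-g` so the braces in the tag filter are not treated as curl glob patterns.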

## Comparison with Similar Tools
- **Prometheus** — pull-based with local storage; OpenTSDB uses push-based writes and HBase for long-term scale
- **InfluxDB** — standalone time series DB; OpenTSDB leverages existing HBase infrastructure
- **TimescaleDB** — PostgreSQL extension; OpenTSDB is purpose-built for Hadoop ecosystems
- **VictoriaMetrics** — Prometheus-compatible; OpenTSDB predates it and integrates with HBase/HDFS
- **Graphite** — whisper-file storage limits scale; OpenTSDB scales horizontally via HBase

## FAQ
**Q: Does OpenTSDB require Hadoop?**
A: OpenTSDB requires HBase, which typically runs on HDFS. For small deployments, HBase standalone mode works without a full Hadoop cluster.

**Q: How does OpenTSDB handle high cardinality?**
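A: OpenTSDB maps metric names, tag keys, and tag values to compact UIDs (3 bytes each by default), so long names cost little storage. Each unique tag combination is still a separate time series, and very high cardinality (millions of unique combinations for one metric) can degrade query performance because more rows must be scanned.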
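
A back-of-the-envelope sketch of the numbers involved, assuming the default 3-byte UID width: the first helper gives the largest UID that fits in a given width, and the second multiplies per-tag value counts to estimate how many series one metric can fan out into. Both helper names and the sample counts are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: simple arithmetic, not a capacity planning tool.

# Largest UID that fits in the given number of bytes
max_uids() {
  echo $(( (1 << ($1 * 8)) - 1 ))
}

# Upper bound on series for one metric: product of distinct values per tag
series_count() {
  local n=1 c
  for c in "$@"; do
    n=$(( n * c ))
  done
  echo "$n"
}

max_uids 3              # prints 16777215
series_count 500 32 8   # e.g. 500 hosts x 32 CPUs x 8 states, prints 128000
```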

**Q: Can I use OpenTSDB with Grafana?**
A: Yes. Grafana includes a built-in OpenTSDB data source plugin for querying and visualizing metrics.

**Q: What is the maximum retention period?**
A: OpenTSDB has no built-in retention limit. Data persists in HBase until explicitly deleted or managed via HBase TTL settings on the tables.
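
As a sketch of the HBase TTL approach, the helper below prints an HBase shell `alter` statement that sets a TTL in seconds on a column family. It assumes the default data table name `tsdb` and column family `t`; the helper name `hbase_ttl_statement` is made up for the example.

```bash
#!/usr/bin/env bash
# Sketch only: table and family names assume OpenTSDB's default schema.
hbase_ttl_statement() {
  local table=$1 family=$2 days=$3
  # HBase expresses TTL in seconds
  printf "alter '%s', {NAME => '%s', TTL => %d}\n" \
    "$table" "$family" $(( days * 86400 ))
}

hbase_ttl_statement tsdb t 30   # prints: alter 'tsdb', {NAME => 't', TTL => 2592000}
```

The printed statement could be piped into `hbase shell`. A TTL like this would normally be applied only to the data table, since expiring rows in the UID table would remove name mappings that existing data still depends on.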

## Sources
- https://github.com/OpenTSDB/opentsdb
- http://opentsdb.net/docs/build/html/index.html

---
Source: https://tokrepo.com/en/workflows/asset-8b5cc318
Author: Script Depot