Druid Basics
Apache Druid is a high-performance, real-time analytics database designed to deliver sub-second queries on streaming and batch data at scale and under load. It is an open-source, column-oriented, distributed data store written in Java, built to ingest large volumes of event data quickly and to serve low-latency queries on that data.
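Queries are usually sent to Druid over HTTP. As a hedged sketch, the snippet below builds the JSON body for the Broker's SQL endpoint (`/druid/v2/sql`). It assumes a Broker on its default port 8082 on `localhost`. The helper name `sql_payload` is made up for this example, and it does no JSON escaping, so it only suits queries without double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Build (and optionally send) a Druid SQL query over HTTP. Sketch only.
# Assumption: a Broker listening on localhost:8082 (its default port).
BROKER_URL="${BROKER_URL:-http://localhost:8082}"

# Hypothetical helper: wrap a SQL string in a {"query": ...} JSON body.
# No JSON escaping is done, so avoid double quotes and backslashes in $1.
sql_payload() {
  printf '{"query": "%s"}' "$1"
}

# List the datasources Druid knows about (valid even before any data exists).
payload="$(sql_payload "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'druid'")"
echo "$payload"

# Uncomment to send it once a Broker is running:
# curl -sS -X POST -H 'Content-Type: application/json' \
#   -d "$payload" "$BROKER_URL/druid/v2/sql"
```

For anything beyond simple queries, build the JSON with a proper tool (for example `jq`) instead of `printf`.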
Key Features of Apache Druid
Apache Druid has many features that make it a popular choice for real-time data analysis. Here are a few key features:
Fast Querying
Druid is designed to provide fast query performance even on large datasets. It achieves this partly through optional pre-aggregation at ingestion time (called "rollup"), which merges rows that share a time bucket and the same dimension values, and partly through its column-oriented storage format, which lets a query read only the columns it references. Both reduce the amount of data a query has to scan and filter.
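To make rollup concrete, here is a toy illustration in plain awk. It is not Druid code: it assumes a hypothetical CSV of `timestamp,page,country,clicks` events, truncates each timestamp to the minute (standing in for a minute-level query granularity), and sums clicks per (minute, page, country) group. The function name `rollup` is made up for this sketch.

```shell
#!/usr/bin/env bash
# Toy illustration of ingestion-time rollup (NOT Druid's implementation).
# Hypothetical input format: timestamp,page,country,clicks
rollup() {
  awk -F',' '
    {
      minute = substr($1, 1, 16)   # "2024-01-01T10:00:17Z" -> "2024-01-01T10:00"
      key = minute "," $2 "," $3   # time bucket + dimension values
      sum[key] += $4               # aggregate the metric
      rows[key]++                  # count raw rows folded into this row
    }
    END { for (k in sum) print k "," sum[k] "," rows[k] }
  ' | LC_ALL=C sort
}

rollup <<'EOF'
2024-01-01T10:00:17Z,home,US,1
2024-01-01T10:00:42Z,home,US,2
2024-01-01T10:00:59Z,cart,DE,1
2024-01-01T10:01:05Z,home,US,1
EOF
# Prints:
# 2024-01-01T10:00,cart,DE,1,1
# 2024-01-01T10:00,home,US,3,2
# 2024-01-01T10:01,home,US,1,1
```

Four raw events become three stored rows; with real event streams, where many events share a time bucket and dimension values, the reduction is usually much larger. The trade-off is that individual events inside a bucket can no longer be queried.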
Real-time Ingestion
Druid can ingest streaming data in real time, for example from Apache Kafka or Amazon Kinesis, so events become queryable shortly after they are generated. Ingestion is configured through an ingestion spec that controls how data is parsed, transformed, and rolled up, which lets users tailor the pipeline to their specific use case.
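One common way to ingest in real time is the Kafka indexing service: you submit a supervisor spec to the Overlord, and Druid keeps consuming the topic. The sketch below writes a minimal spec to a file. The datasource `clickstream`, topic `clicks`, broker address `localhost:9092`, and column names are hypothetical, and the spec assumes the `druid-kafka-indexing-service` extension is loaded. Check field names against the Kafka ingestion documentation for your Druid version.

```shell
#!/usr/bin/env bash
# Minimal Kafka supervisor spec (sketch). Datasource, topic, broker
# address and column names below are hypothetical placeholders.
cat > kafka-supervisor.json <<'EOF'
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "clickstream",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["page", "country"] },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "longSum", "name": "clicks", "fieldName": "clicks" }
      ],
      "granularitySpec": {
        "segmentGranularity": "hour",
        "queryGranularity": "minute",
        "rollup": true
      }
    },
    "ioConfig": {
      "topic": "clicks",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "localhost:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
EOF

# Submit it to the Overlord (default port 8090) once the cluster is running:
# curl -X POST -H 'Content-Type: application/json' \
#   -d @kafka-supervisor.json http://localhost:8090/druid/indexer/v1/supervisor
```

The `granularitySpec` is where rollup is switched on and where the time bucket size (`queryGranularity`) is chosen.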
Scalability
Druid is designed to scale horizontally: you add nodes to the cluster as load grows. Its work is split across separate service types (for example, Brokers handle queries, Historicals serve stored data, and MiddleManagers run ingestion tasks), so each part can be scaled independently. This makes it well-suited for applications with high data ingestion rates or heavy query load.
Use Cases for Apache Druid
Apache Druid is used in a variety of industries and applications. Here are a few examples:
Ad Tech
Druid is used in the ad tech industry to analyze large amounts of data generated by ad impressions, clicks, and conversions. Druid’s fast query performance and real-time ingestion make it well-suited for this use case.
IoT
Druid is used in the Internet of Things (IoT) industry to analyze data generated by sensors and other IoT devices. Druid’s real-time ingestion and scalability make it well-suited for this use case.
Gaming
Druid is used in the gaming industry to analyze player behavior and game performance. Druid’s fast query performance and real-time ingestion make it well-suited for this use case.
Installing Apache Druid
Prerequisites
Before we begin, make sure that you have the following prerequisites installed on your system:
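- Java, in a version supported by the Druid release you install (older releases ran on Java 8; recent releases need Java 11 or 17)
- Apache ZooKeeper
- MySQL or PostgreSQL, for Druid's metadata storage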
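A quick way to see what is already installed is to check for the relevant commands on your `PATH`. This is a minimal sketch; the command names are assumptions about a typical install (`zkServer.sh` ships with ZooKeeper, `mysql` and `psql` are the database clients), and `check_cmd` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Report which prerequisite commands are on PATH (sketch).
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for cmd in java zkServer.sh mysql psql; do
  check_cmd "$cmd"
done

# java -version writes to stderr, so redirect it to read the version line.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
fi
```

You only need one of `mysql` or `psql`, depending on which metadata database you choose.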
Step 1: Download Druid
The first step is to download the latest version of Druid from the official website (https://druid.apache.org/). Once the download is complete, extract the contents of the archive to a directory of your choice.
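The download can also be scripted. This sketch assumes the Apache archive layout `https://archive.apache.org/dist/druid/<version>/apache-druid-<version>-bin.tar.gz`; the helper names `druid_tarball_url` and `install_druid` are hypothetical, and you must supply a real release version from the download page. Verifying the published checksum or signature before extracting is a good idea.

```shell
#!/usr/bin/env bash
# Download and unpack a Druid release (sketch; helper names are hypothetical).

# Assumed Apache archive URL layout for Druid binary releases.
druid_tarball_url() {
  local version="$1"
  echo "https://archive.apache.org/dist/druid/${version}/apache-druid-${version}-bin.tar.gz"
}

install_druid() {
  local version="$1" dest="${2:-$PWD}"
  local tarball="apache-druid-${version}-bin.tar.gz"
  curl -fL -o "$tarball" "$(druid_tarball_url "$version")" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  echo "Unpacked ${tarball} into ${dest}"
}

# Usage: replace <version> with a release listed on the download page:
#   install_druid <version> /opt
```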
Step 2: Configure Druid
Next, configure Druid for your environment. The configuration files live under the conf/druid directory of the Druid installation: shared settings sit in a _common subdirectory, and each service has its own subdirectory containing a runtime.properties file and a jvm.config file with that service's JVM flags. Recent releases group these directories by deployment profile (for example under conf/druid/cluster/ or conf/druid/single-server/), so adjust the paths to match your distribution. The most important files are:
- _common/common.runtime.properties: properties used by all Druid services, including the ZooKeeper connection (druid.zk.*) and the metadata storage settings (druid.metadata.storage.*).
- coordinator/runtime.properties: Coordinator settings.
- overlord/runtime.properties: Overlord settings.
- broker/runtime.properties: Broker settings.
- historical/runtime.properties: Historical settings.
- middleManager/runtime.properties: MiddleManager settings.
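The ZooKeeper and metadata-storage settings shared by all services belong in common.runtime.properties. The sketch below writes a minimal MySQL-based fragment to a separate scratch file (the name `common-zk-metadata.properties` is hypothetical) so it does not overwrite your existing configuration; merge the lines by hand. Host names, database name, user, and password are placeholders. Druid does not bundle the MySQL JDBC driver, so you also need to add that jar to the `mysql-metadata-storage` extension directory.

```shell
#!/usr/bin/env bash
# Write ZooKeeper + MySQL metadata settings to a scratch file (sketch).
# Hosts, database name, user and password below are placeholders.
cat > common-zk-metadata.properties <<'EOF'
# Extensions to load; keep any others your cluster already lists here.
druid.extensions.loadList=["mysql-metadata-storage"]

# ZooKeeper connection
druid.zk.service.host=localhost:2181
druid.zk.paths.base=/druid

# Metadata storage (MySQL). For PostgreSQL, load "postgresql-metadata-storage",
# set type=postgresql and use jdbc:postgresql://localhost:5432/druid instead.
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
EOF
```

Note that `druid.extensions.loadList` replaces the whole list, so it must also name any other extensions you rely on, such as the Kafka indexing service.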
Step 3: Start ZooKeeper
Druid uses ZooKeeper for internal service discovery, coordination, and leader election. Start ZooKeeper by running the following command from ZooKeeper's bin directory (or with that directory on your PATH):
zkServer.sh start
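zkServer.sh reads its configuration from conf/zoo.cfg in the ZooKeeper installation, so if the start command complains about a missing config file, a minimal standalone one is enough for a test setup. This sketch assumes `ZOOKEEPER_HOME` points at your ZooKeeper installation (the fallback path is hypothetical) and leaves an existing zoo.cfg untouched.

```shell
#!/usr/bin/env bash
# Minimal standalone ZooKeeper config (sketch).
# ZOOKEEPER_HOME is assumed; the fallback path is hypothetical.
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-$PWD/zookeeper}"
mkdir -p "$ZOOKEEPER_HOME/conf" "$ZOOKEEPER_HOME/data"

cfg="$ZOOKEEPER_HOME/conf/zoo.cfg"
if [ -e "$cfg" ]; then
  echo "Leaving existing $cfg untouched"
else
  cat > "$cfg" <<EOF
# Basic time unit in milliseconds
tickTime=2000
# Where ZooKeeper keeps its snapshots (and, by default, transaction logs)
dataDir=$ZOOKEEPER_HOME/data
# Port clients, including Druid, connect to
clientPort=2181
EOF
fi

# Then start it and confirm it is serving:
#   "$ZOOKEEPER_HOME/bin/zkServer.sh" start
#   "$ZOOKEEPER_HOME/bin/zkServer.sh" status
```

The `clientPort` must match the port in Druid's `druid.zk.service.host` setting.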
Step 4: Start MySQL or PostgreSQL
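In this setup, Druid uses MySQL or PostgreSQL as its metadata store, which holds records such as segment information and task state. Start the database service, for example with systemd:
sudo systemctl start mysql
or
sudo systemctl start postgresql
The service name varies by distribution (MySQL runs as mysqld on some systems).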
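Druid needs an existing database and user in the metadata store before it starts; it then creates its own tables. The sketch below writes the MySQL statements to a file (the filename is hypothetical). The `druid`/`diurd` credentials are placeholders and must match the `druid.metadata.storage.connector.*` settings in common.runtime.properties.

```shell
#!/usr/bin/env bash
# SQL to create Druid's metadata database and user in MySQL (sketch).
# 'druid'/'diurd' are placeholder credentials; change them.
cat > druid-metadata-mysql.sql <<'EOF'
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'localhost';
EOF

# Apply it as a MySQL administrator:
#   mysql -u root -p < druid-metadata-mysql.sql
#
# PostgreSQL equivalent, run as the postgres OS user:
#   createuser druid -P
#   createdb druid -O druid
```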
Step 5: Start Druid Services
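Finally, start the Druid services from the root of the Druid installation. Each command runs in the foreground, so give each one its own terminal or process supervisor. The Coordinator, Overlord, and Broker alone cannot store or ingest data, so a working cluster also needs at least one Historical and one MiddleManager:
java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" org.apache.druid.cli.Main server coordinator
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" org.apache.druid.cli.Main server overlord
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" org.apache.druid.cli.Main server broker
java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" org.apache.druid.cli.Main server historical
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" org.apache.druid.cli.Main server middleManager
These commands assume a conf/druid/<service> layout. Recent releases organize conf/ differently and ship start scripts under bin/, as described in the cluster tutorial (https://druid.apache.org/docs/latest/tutorials/cluster.html).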
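Each Druid service exposes a `/status/health` endpoint that returns `true` when the service is healthy, which makes it easy to confirm that everything came up. This sketch assumes Druid's default ports (Coordinator 8081, Broker 8082, Historical 8083, Overlord 8090, MiddleManager 8091) on `localhost`; `check_service` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Check each Druid service's /status/health endpoint (sketch).
# Ports are Druid's defaults; change them if your runtime.properties differ.
check_service() {
  local name="$1" port="$2" host="${3:-localhost}"
  local body
  # A healthy service answers with the literal body "true".
  body="$(curl -fsS --max-time 5 "http://${host}:${port}/status/health" 2>/dev/null)"
  if [ "$body" = "true" ]; then
    echo "UP   ${name} (${host}:${port})"
  else
    echo "DOWN ${name} (${host}:${port})"
  fi
}

check_service coordinator 8081
check_service broker 8082
check_service historical 8083
check_service overlord 8090
check_service middleManager 8091
```

A service reported as DOWN right after launch may simply still be starting; rerun the check after a few seconds before digging into its logs.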
Conclusion
Apache Druid is a powerful tool for real-time data analysis. Its fast query performance, real-time ingestion, and scalability make it well-suited for a variety of industries and applications. If you’re looking for a tool to handle large amounts of data and provide fast query capabilities, Apache Druid is definitely worth considering.
References
https://druid.apache.org/docs/latest/tutorials/cluster.html