Druid Basics

Apache Druid is a high-performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load. It is an open-source, distributed, column-oriented data store written in Java, designed to ingest large volumes of event data quickly and to serve low-latency queries on that data.

Key Features of Apache Druid

Apache Druid has many features that make it a popular choice for real-time data analysis. Here are a few key features:

Fast Querying

Druid is designed to provide fast query performance even on large datasets. It achieves this by storing data in a column-oriented format with compressed bitmap indexes and, optionally, by pre-aggregating ("rolling up") rows at ingestion time. A query reads only the columns it references, the indexes let Druid filter rows without scanning every one, and rollup reduces the number of rows stored, so far less data has to be processed.
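To make ingestion-time pre-aggregation ("rollup") concrete, here is a minimal Python sketch of what it does conceptually. It is not Druid's implementation: the event fields (country, ad_id, clicks), the hourly truncation, and the helper names are hypothetical. Rows that share the same truncated timestamp and the same dimension values collapse into one stored row holding a row count and summed metrics.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (the 'query granularity')."""
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup(events, dimensions, metric):
    """Collapse raw events into one row per (hour, dimension values).

    Each rolled-up row keeps a row count and the sum of `metric`,
    similar in spirit to Druid's count and longSum aggregators.
    """
    rows = defaultdict(lambda: {"count": 0, metric: 0})
    for event in events:
        key = (truncate_to_hour(event["timestamp"]),) + tuple(event[d] for d in dimensions)
        rows[key]["count"] += 1
        rows[key][metric] += event[metric]
    return dict(rows)


# Hypothetical ad events: four raw rows.
raw_events = [
    {"timestamp": datetime(2024, 1, 1, 10, 5), "country": "US", "ad_id": "a1", "clicks": 1},
    {"timestamp": datetime(2024, 1, 1, 10, 20), "country": "US", "ad_id": "a1", "clicks": 0},
    {"timestamp": datetime(2024, 1, 1, 10, 45), "country": "DE", "ad_id": "a1", "clicks": 1},
    {"timestamp": datetime(2024, 1, 1, 11, 2), "country": "US", "ad_id": "a1", "clicks": 1},
]

rolled_up = rollup(raw_events, dimensions=["country", "ad_id"], metric="clicks")
for key, aggregates in sorted(rolled_up.items()):
    print(key, aggregates)
```

Here four raw events become three stored rows; on real event streams where the same dimension combinations repeat many times within an hour, the reduction can be much larger. The trade-off is that individual raw events can no longer be retrieved after rollup.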

Real-time Ingestion

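Druid is designed to ingest data in real time: streaming data, for example from Apache Kafka or Amazon Kinesis, becomes queryable as soon as it is ingested, so users can analyze events as they are generated. Druid’s ingestion is configured through an ingestion spec that declares the timestamp column, dimensions, metrics, and rollup granularity, so the pipeline can be tailored to each use case.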
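As an illustration of what such a configuration can look like, the sketch below builds a minimal Kafka supervisor spec in Python and prepares (but does not send) the HTTP request that would submit it to the Overlord's /druid/indexer/v1/supervisor endpoint. The data source name, topic, Kafka broker address, column names, and Overlord URL are hypothetical, and Kafka ingestion requires the druid-kafka-indexing-service extension to be loaded; check the Kafka ingestion documentation for your Druid version before relying on any field.

```python
import json
import urllib.request


def kafka_supervisor_spec(datasource: str, topic: str, bootstrap_servers: str) -> dict:
    """Build a minimal Kafka supervisor spec (all field values are illustrative)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Which input field holds the event time, and how it is formatted.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["country", "ad_id"]},
                # Metrics computed at ingestion time when rollup is enabled.
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "clicks", "fieldName": "clicks"},
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": 1,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def build_submit_request(spec: dict, overlord_url: str = "http://localhost:8090") -> urllib.request.Request:
    """Prepare a POST request that would submit the spec; nothing is sent here."""
    return urllib.request.Request(
        f"{overlord_url}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names: data source "ad_events", topic "ad-events", Kafka on localhost:9092.
spec = kafka_supervisor_spec("ad_events", "ad-events", "localhost:9092")
request = build_submit_request(spec)
print(request.get_method(), request.full_url)
```

To actually submit the spec, you would pass the request to urllib.request.urlopen while an Overlord is running at that address.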

Scalability

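Druid is designed to scale horizontally: you add servers to the cluster as data volume or query load grows. Data is partitioned into time-based segments that are spread across the servers, and queries fan out across them in parallel. This distributed architecture lets Druid handle large amounts of data and many concurrent queries, making it well-suited for applications with high data ingestion rates.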
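The sketch below is a deliberately simplified model of why adding servers spreads the load: events are grouped into one segment per day (what Druid calls the segment granularity), and the segments are then assigned to servers round-robin. The real Coordinator uses its own balancing strategy and also replicates segments, so treat the node names, the round-robin placement, and the helper functions as hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def segment_intervals(timestamps):
    """Group event timestamps into one daily time chunk (segment) per calendar day."""
    return sorted({ts.date().isoformat() for ts in timestamps})


def assign_round_robin(segments, nodes):
    """Spread segments across nodes in turn (a stand-in for Druid's real balancer)."""
    placement = defaultdict(list)
    for i, segment in enumerate(segments):
        placement[nodes[i % len(nodes)]].append(segment)
    return dict(placement)


# Two events per day for six days -> six daily segments.
timestamps = [datetime(2024, 1, day, hour) for day in range(1, 7) for hour in (0, 12)]
segments = segment_intervals(timestamps)

# Hypothetical server names.
two_nodes = assign_round_robin(segments, ["historical-1", "historical-2"])
three_nodes = assign_round_robin(segments, ["historical-1", "historical-2", "historical-3"])
print({node: len(segs) for node, segs in two_nodes.items()})
print({node: len(segs) for node, segs in three_nodes.items()})
```

With a third server, each node holds two segments instead of three; because a query is sent to all relevant servers in parallel, each one has less data to scan.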

Use Cases for Apache Druid

Apache Druid is used in a variety of industries and applications. Here are a few examples:

Ad Tech

Druid is used in the ad tech industry to analyze large amounts of data generated by ad impressions, clicks, and conversions. Druid’s fast query performance and real-time ingestion make it well-suited for this use case.

IoT

Druid is used in the Internet of Things (IoT) industry to analyze data generated by sensors and other connected devices. Real-time ingestion lets teams monitor device readings as they arrive, and horizontal scalability lets the cluster grow with the number of devices.

Gaming

Druid is used in the gaming industry to analyze player behavior and game performance. Fast queries over freshly ingested events let teams track player activity and spot performance problems while a game is live.

Installing Apache Druid

Prerequisites

Before we begin, make sure that you have the following prerequisites installed on your system:

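Java 8 or higher (recent Druid releases have deprecated or dropped Java 8 in favor of Java 11 or 17; check the documentation for the release you download)
ZooKeeper
MySQL or PostgreSQL, for metadata storage (the quickstart configurations use an embedded Apache Derby database instead, which is meant only for single-machine testing)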
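Because the minimum Java version depends on the Druid release, it can help to check what is installed before going further. The Python sketch below runs java -version (which prints to standard error) and extracts the major version, handling both the legacy 1.8.x numbering and the newer 11/17-style numbering. The helper names and the default minimum are assumptions; set the minimum to whatever your Druid release documents.

```python
import re
import subprocess


def parse_java_major(version: str) -> int:
    """Return the major version from strings like '1.8.0_392', '11.0.21', or '17'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version: {version!r}")
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" numbering scheme.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version` and parse the quoted version string from its output."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    # `java -version` writes to stderr, e.g.: openjdk version "17.0.9" 2023-10-17
    found = re.search(r'version "([^"]+)"', result.stderr)
    if found is None:
        raise ValueError("no version string found in `java -version` output")
    return parse_java_major(found.group(1))


def java_is_supported(minimum: int = 8) -> bool:
    """Hypothetical helper: True if an installed JDK meets `minimum`."""
    try:
        return installed_java_major() >= minimum
    except (OSError, subprocess.CalledProcessError, ValueError):
        # java is missing from PATH, failed to run, or printed something unexpected.
        return False
```

For example, java_is_supported(minimum=11) returns False both when an older JDK is installed and when java is not on the PATH at all.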

Step 1: Download Druid

The first step is to download the latest version of Druid from the official website. Once the download is complete, extract the contents of the archive to a directory of your choice.

https://druid.apache.org/
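If you prefer to script this step, the Python sketch below builds a download URL and extracts a downloaded archive with the standard tarfile module, returning the top-level directory it created. The URL assumes the naming pattern apache-druid-<version>-bin.tar.gz on the Apache archive server; confirm the exact link on the download page, and treat the function names as hypothetical.

```python
import tarfile
from pathlib import Path


def druid_download_url(version: str) -> str:
    """Assumed URL pattern for an Apache Druid binary release (verify on the download page)."""
    return f"https://archive.apache.org/dist/druid/{version}/apache-druid-{version}-bin.tar.gz"


def extract_druid(archive_path, dest_dir) -> Path:
    """Extract a .tar.gz release into dest_dir and return its single top-level directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        top_level = {Path(m.name).parts[0] for m in archive.getmembers() if Path(m.name).parts}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        # The "data" filter (Python 3.12+ and some security backports) rejects unsafe paths.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return dest / top_level.pop()
```

The archive itself could be fetched with urllib.request.urlretrieve(druid_download_url(version), "druid.tar.gz") before calling extract_druid.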

Step 2: Configure Druid

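Next, configure Druid for your environment. The configuration files live under the conf/druid directory of the Druid installation. The exact layout varies between releases (newer releases group the files under conf/druid/cluster and conf/druid/single-server), but the pattern is the same: one shared directory plus one directory per service. The most important files are:

_common/common.runtime.properties: properties shared by all Druid services, including the ZooKeeper connection (druid.zk.service.host), the metadata storage connection (druid.metadata.storage.*), and the extensions to load (druid.extensions.loadList).
coordinator/runtime.properties: Coordinator settings. The Coordinator manages segment availability and balances segments across Historicals.
overlord/runtime.properties: Overlord settings. The Overlord accepts ingestion tasks and assigns them to MiddleManagers.
broker/runtime.properties: Broker settings. The Broker receives queries and forwards them to the processes that hold the relevant data.
historical/runtime.properties: Historical settings. Historicals store and serve queryable segments.
middleManager/runtime.properties: MiddleManager settings. MiddleManagers run ingestion tasks.
indexer/runtime.properties: settings for the optional Indexer service, an alternative to the MiddleManager, if your release includes it.

Each service directory also contains a jvm.config file with the JVM options, such as heap size, used to start that service.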
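Druid runtime properties files use the simple key=value format of Java properties files. As a sketch of how the shared settings fit together, the Python helpers below render and re-parse a common.runtime.properties fragment. The property names are standard Druid settings, but the values (ZooKeeper on localhost:2181, the /druid base path, and the extension list) are placeholders, and the parser deliberately ignores Java properties features such as line continuations and ':' separators.

```python
def render_properties(props: dict) -> str:
    """Render a dict as key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Placeholder values for a common.runtime.properties fragment.
common = {
    # ZooKeeper connection shared by every Druid service.
    "druid.zk.service.host": "localhost:2181",
    "druid.zk.paths.base": "/druid",
    # Extensions to load, written as a JSON array.
    "druid.extensions.loadList": '["mysql-metadata-storage"]',
}

text = "# common.runtime.properties (fragment)\n" + render_properties(common)
print(text)
```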

Step 3: Start ZooKeeper

Druid uses ZooKeeper for coordination between its services. Start ZooKeeper by running the following command:

zkServer.sh start
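zkServer.sh start can return before ZooKeeper is ready to accept client connections, so a startup script may want to wait for its client port before launching Druid. The Python sketch below polls a TCP port until it accepts a connection or a timeout expires. Port 2181 is ZooKeeper's conventional client port, but the helper itself is hypothetical and only checks that something is listening, not that ZooKeeper is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 2181, timeout=10) blocks for up to ten seconds and reports whether ZooKeeper's port opened in that time.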

Step 4: Start MySQL or PostgreSQL

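Druid stores its metadata, such as the list of published segments and the status of ingestion tasks, in MySQL or PostgreSQL. Start the database service by running one of the following commands (service names can vary between Linux distributions, and you may need root privileges):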

systemctl start mysql

or

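systemctl start postgresql

After the server is running, create an empty database and a user for Druid (for example, a database named druid owned by a user named druid). Druid creates the tables it needs the first time it connects.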
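Druid finds this database through the druid.metadata.storage.* properties in common.runtime.properties, and it needs the matching extension loaded. The Python sketch below derives those properties for either database. The extension names (mysql-metadata-storage, postgresql-metadata-storage) and property keys are the standard Druid ones, while the host, database name, user, and password are placeholders and the helper is hypothetical. Note that the MySQL JDBC driver is not bundled with Druid and must be added separately.

```python
# Extension name and default port for each supported metadata store.
METADATA_STORES = {
    "mysql": ("mysql-metadata-storage", 3306),
    "postgresql": ("postgresql-metadata-storage", 5432),
}


def metadata_storage_properties(db_type: str, host: str, database: str, user: str, password: str) -> dict:
    """Return metadata-storage settings for common.runtime.properties (hypothetical helper)."""
    if db_type not in METADATA_STORES:
        raise ValueError(f"unsupported metadata store: {db_type!r}")
    extension, port = METADATA_STORES[db_type]
    return {
        # In a real config, merge this into the existing extension list instead of replacing it.
        "druid.extensions.loadList": f'["{extension}"]',
        "druid.metadata.storage.type": db_type,
        "druid.metadata.storage.connector.connectURI": f"jdbc:{db_type}://{host}:{port}/{database}",
        "druid.metadata.storage.connector.user": user,
        "druid.metadata.storage.connector.password": password,
    }


# Placeholder connection details.
props = metadata_storage_properties("postgresql", "db.example.com", "druid", "druid", "change-me")
for key, value in props.items():
    print(f"{key}={value}")
```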

Step 5: Start Druid Services

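Finally, start the Druid services. Run each command from the root of the Druid installation; each service runs in the foreground, so start each one in its own terminal or under a process manager. The paths assume one configuration directory per service under conf/druid, so adjust them if your release uses a different layout. Besides the Coordinator, Overlord, and Broker, a working cluster also needs at least one Historical, which serves stored data, and one MiddleManager, which runs ingestion tasks:

java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" org.apache.druid.cli.Main server coordinator

java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" org.apache.druid.cli.Main server overlord

java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" org.apache.druid.cli.Main server broker

java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" org.apache.druid.cli.Main server historical

java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" org.apache.druid.cli.Main server middleManager

Newer releases also ship scripts in the bin directory that start a preconfigured set of services for you.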
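Once the services are up, each Druid process answers GET /status/health with true when it is running. The Python sketch below checks each service against the default ports used in Druid's example configurations (Coordinator 8081, Broker 8082, Historical 8083, Overlord 8090, MiddleManager 8091); if your runtime.properties set different druid.plaintextPort values, change the mapping accordingly. The helper names are hypothetical.

```python
import urllib.request

# Default plaintext ports from Druid's example configurations; adjust to your runtime.properties.
DEFAULT_PORTS = {
    "coordinator": 8081,
    "broker": 8082,
    "historical": 8083,
    "overlord": 8090,
    "middleManager": 8091,
}


def health_url(service: str, host: str = "localhost") -> str:
    """Build the health-check URL for one service on `host`."""
    return f"http://{host}:{DEFAULT_PORTS[service]}/status/health"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """True if the endpoint responds with the body 'true'; False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().strip() == b"true"
    except OSError:
        # urllib.error.URLError and socket timeouts are both subclasses of OSError.
        return False


def check_cluster(host: str = "localhost") -> dict:
    """Map each service name to whether its health endpoint reported true."""
    return {service: is_healthy(health_url(service, host)) for service in DEFAULT_PORTS}
```

check_cluster() returns a dict such as {"coordinator": True, "broker": False, ...}; a False entry usually means that service has not finished starting or listens on a different port.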

Conclusion

Apache Druid is a powerful tool for real-time data analysis. Its fast query performance, real-time ingestion, and scalability make it well-suited for a variety of industries and applications. If you’re looking for a tool to handle large amounts of data and provide fast query capabilities, Apache Druid is definitely worth considering.

