I spent my weekend setting up an InfluxDB infrastructure. My goal was high availability. If you do some reading, you will see that high availability is only available in the Enterprise version. There are some alternatives, and I chose the simplest one: open-source InfluxDB with InfluxDB Relay. My architecture consists of one InfluxDB Relay and two data nodes.

I won't go into the setup details here, but I want to share some things I learned while doing the setup.

Let me start with the good things:
– Data is replicated to both nodes (the two nodes hold the same data).
– InfluxDB Relay has a buffer that keeps data for a failed node. If a node fails, the relay holds the data for that node and resends it once the node is back online. You can configure the buffer size. The buffer is kept in memory, so it shouldn't be larger than the memory you have available. When the buffer is full, further data for that node is dropped.
– If one of your nodes fails, clients receive no error. A client only receives an error (HTTP 503) when all nodes are down.
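To make the buffering behaviour above concrete, here is a simplified Python model of a relay fanning each write out to two nodes, each with a bounded buffer. It is a sketch, not influxdb-relay's actual code. `Backend`, `BufferedOutput` and `relay_write` are hypothetical names, the buffer is counted in points rather than megabytes, and the status codes follow the behaviour described above: 204 if at least one node took the write directly, 503 otherwise.

```python
from collections import deque


class Backend:
    """Hypothetical stand-in for one InfluxDB data node."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.points = []

    def write(self, batch):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.points.extend(batch)


class BufferedOutput:
    """Per-node output: buffers batches while the node is down, drops them once the buffer is full."""

    def __init__(self, backend, max_points):
        self.backend = backend
        self.max_points = max_points  # buffer limit (the real relay limits by size in memory)
        self.buffer = deque()
        self.buffered = 0
        self.dropped = 0

    def _flush(self):
        # Replay buffered batches in order; a batch leaves the buffer only after it was written.
        while self.buffer:
            self.backend.write(self.buffer[0])
            self.buffered -= len(self.buffer.popleft())

    def write(self, batch):
        try:
            self._flush()
            self.backend.write(batch)
            return "ok"
        except ConnectionError:
            if self.buffered + len(batch) > self.max_points:
                self.dropped += len(batch)  # buffer full: this node loses the batch
                return "dropped"
            self.buffer.append(batch)
            self.buffered += len(batch)
            return "buffered"


def relay_write(outputs, batch):
    """Send the batch to every node; the client only sees an error if no node took it directly."""
    results = [out.write(batch) for out in outputs]
    return 204 if "ok" in results else 503
```

With one node down, the client keeps getting 204 while the other node's buffer fills, and the buffered points are replayed in order on the first write after the node recovers.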

And some challenges:
– InfluxDB Relay only handles writes; it doesn't handle queries.
– For administration tasks such as creating databases, users, or retention policies, you need to connect to InfluxDB directly and run the statement on each of your data nodes.
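Because the relay doesn't forward administration statements, a small script that sends the same InfluxQL statement to every node saves some repetition. This is a sketch assuming the InfluxDB 1.x HTTP `/query` endpoint with authentication disabled (otherwise add `u` and `p` parameters). The node URLs and function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical data-node addresses; replace them with your own.
DATA_NODES = ["http://influx-node-1:8086", "http://influx-node-2:8086"]


def build_admin_requests(nodes, statement):
    """Build one POST /query request per node for the same InfluxQL statement."""
    body = urllib.parse.urlencode({"q": statement}).encode()
    return [
        urllib.request.Request(f"{node}/query", data=body, method="POST")
        for node in nodes
    ]


def run_on_all_nodes(nodes, statement, timeout=10):
    """Run the statement on every node and return {url: error} for the nodes that failed."""
    errors = {}
    for req in build_admin_requests(nodes, statement):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                payload = json.load(resp)
        except (OSError, ValueError) as exc:  # network/HTTP errors, or a non-JSON reply
            errors[req.full_url] = str(exc)
            continue
        # InfluxDB reports statement errors inside the JSON body, even with HTTP 200.
        for result in payload.get("results", []):
            if "error" in result:
                errors[req.full_url] = result["error"]
    return errors
```

Usage would be something like `run_on_all_nodes(DATA_NODES, 'CREATE DATABASE "telegraf"')`. The script collects errors per node instead of stopping at the first one, so you can see exactly which node is out of step.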

Other things to consider:
– If one of your nodes is down for a long time (longer than the relay buffer can cover), you will want to resync its data. First, back up the data from the healthy node. Second, restore it on the bad node.
– Data retention: you can downsample data with continuous queries to save disk space, but this is very costly. You have to set it up manually for every metric you want to downsample, not at the database level. For example, if you have 100 metrics, you need 100 continuous queries, which could kill your server. Querying is also harder, because you need to specify the retention policy in the query.
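For the resync step in the list above, here is a sketch that builds the commands. It assumes InfluxDB 1.5+ (portable backup format), that the RPC bind address (port 8088, bound to localhost by default) is reachable from where you run it, and that the data lives in the `autogen` retention policy. A restore does not overwrite an existing database, so the sketch restores into a temporary database (the `_restored` name is hypothetical) and copies the points over with `SELECT INTO`. Points with the same series and timestamp overwrite each other, so the copy is safe to repeat, but it can be heavy on a large database.

```python
import shlex
import subprocess


def resync_commands(db, good_node, bad_node, backup_dir):
    """argv lists that copy `db` from good_node to bad_node (InfluxDB 1.5+ portable backup)."""
    tmp_db = f"{db}_restored"  # hypothetical name for the temporary database
    return [
        # 1. Back up the database from the healthy node over its RPC port.
        ["influxd", "backup", "-portable", "-database", db,
         "-host", f"{good_node}:8088", backup_dir],
        # 2. Restore into a temporary database, because restore won't overwrite an existing one.
        ["influxd", "restore", "-portable", "-db", db, "-newdb", tmp_db,
         "-host", f"{bad_node}:8088", backup_dir],
        # 3. Merge the restored points into the live database; GROUP BY * keeps tags as tags.
        ["influx", "-host", bad_node, "-execute",
         f'SELECT * INTO "{db}"."autogen".:MEASUREMENT '
         f'FROM "{tmp_db}"."autogen"./.*/ GROUP BY *'],
        # 4. Drop the temporary database.
        ["influx", "-host", bad_node, "-execute", f'DROP DATABASE "{tmp_db}"'],
    ]


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running `run(resync_commands("telegraf", "node-a", "node-b", "/tmp/telegraf-backup"))` first prints the commands so you can review them before passing `dry_run=False`.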
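To show why the continuous query count grows with the number of metrics, here is a sketch that generates one InfluxQL downsampling CQ per measurement. The `rp_1y` retention policy, the `1h` interval and the measurement names are hypothetical. InfluxQL also supports a `:MEASUREMENT` backreference with `FROM /.*/`, which folds everything into one CQ, although the server still has to compute every aggregate. Note that the target retention policy (like any admin statement) must be created on each data node.

```python
def create_rp(db, rp="rp_1y", duration="52w"):
    """The target retention policy must exist before the CQs can write into it."""
    return f'CREATE RETENTION POLICY "{rp}" ON "{db}" DURATION {duration} REPLICATION 1'


def downsample_cq(db, measurement, source_rp="autogen", target_rp="rp_1y", interval="1h"):
    """One CQ per measurement: means over each interval, written to the long-term RP."""
    return (
        f'CREATE CONTINUOUS QUERY "cq_{measurement}_{interval}" ON "{db}" BEGIN '
        f'SELECT mean(*) INTO "{db}"."{target_rp}"."{measurement}" '
        f'FROM "{db}"."{source_rp}"."{measurement}" '
        f"GROUP BY time({interval}), * END"
    )


def downsample_all_cq(db, source_rp="autogen", target_rp="rp_1y", interval="1h"):
    """A single CQ using the :MEASUREMENT backreference to cover every measurement."""
    return (
        f'CREATE CONTINUOUS QUERY "cq_all_{interval}" ON "{db}" BEGIN '
        f'SELECT mean(*) INTO "{db}"."{target_rp}".:MEASUREMENT '
        f'FROM "{db}"."{source_rp}"./.*/ '
        f"GROUP BY time({interval}), * END"
    )


def query_downsampled(db, measurement, target_rp="rp_1y"):
    """Reads from a non-default RP must name it explicitly in the FROM clause."""
    return f'SELECT * FROM "{db}"."{target_rp}"."{measurement}" WHERE time > now() - 30d'


measurements = ["cpu", "mem", "disk"]  # imagine 100 of these
statements = [create_rp("telegraf")] + [downsample_cq("telegraf", m) for m in measurements]
```

`SELECT mean(*)` writes fields prefixed with `mean_`, so queries against the downsampled data use different field names than queries against the raw data.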

How about Enterprise?
– Enterprise handles administration tasks from a single place. Remember that with the relay setup, you need to manually create databases/users… on each data node.
– You can scale out easily by adding more data nodes to the cluster for more storage, processing… In the open-source setup, every node holds all the data.
– Enterprise does not do load balancing: you send data directly to one of the nodes, and that node replicates it to the other nodes. If that node is down, writes to it fail and that data is dropped. To fix this, put a load balancer in front of your Enterprise cluster.
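A dedicated load balancer is the usual answer to the problem above. As a client-side illustration of the same idea, here is a sketch of a writer that round-robins across nodes and skips any node that fails. `FailoverWriter` and `http_send` are hypothetical names, and the sketch assumes each data node accepts line protocol on the 1.x-style `/write?db=` HTTP endpoint.

```python
import urllib.parse
import urllib.request


class FailoverWriter:
    """Round-robin writes across nodes, skipping any node whose send raises OSError."""

    def __init__(self, nodes, send):
        self.nodes = list(nodes)
        self.send = send  # callable(node, body); must raise OSError on failure
        self.next = 0

    def write(self, body):
        last_error = None
        for attempt in range(len(self.nodes)):
            index = (self.next + attempt) % len(self.nodes)
            try:
                self.send(self.nodes[index], body)
            except OSError as exc:
                last_error = exc
                continue
            self.next = (index + 1) % len(self.nodes)  # start with the following node next time
            return self.nodes[index]
        raise ConnectionError("all nodes failed") from last_error


def http_send(db):
    """Build a send() that POSTs line protocol to a node's /write endpoint."""
    def send(node, body):
        url = f"{node}/write?" + urllib.parse.urlencode({"db": db})
        req = urllib.request.Request(url, data=body, method="POST")
        with urllib.request.urlopen(req, timeout=5):
            pass  # urlopen raises on non-2xx, so reaching here means the node accepted it
    return send
```

Usage would look like `FailoverWriter(["http://data-1:8086", "http://data-2:8086"], http_send("telegraf")).write(b"cpu value=1")`. A proxy such as HAProxy or nginx does the same job without changing every client.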