Turning your Splunk log data into a time-series database – InfluxDB

Let's say you have a load balancer fronting a few thousand sites, and the load balancer keeps track of the response time from each backend server. Your logs are saved in Splunk. Now the application team comes to you and asks for the site performance over the last 30 days. You think it's easy – and yes, when you start with the last hour, the results come out within a minute. But when you run the last 30 days, it's such a pain because the data set is too large. Since the last hour seems to be OK, why don't we dump that result data into some other, faster database on a schedule? Since it's just numbers, it's really a metric, so I think InfluxDB is a good choice. (I know you can store it in Splunk as a metric as well, but it's not very straightforward.) The query below assumes that you already have InfluxDB running, and that you have a curl command app installed in your Splunk environment (search the Splunk add-ons for one).
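For reference, the target format is InfluxDB's line protocol: a measurement name and tags, a field value, and a nanosecond timestamp, one record per line. A record built by the query below would look like this (the site name and numbers are made-up examples):

lb_backend_response,type=response_time,site=www.site-a.com value=0.182 1700000000000000000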

Here is the query. After you test it and see data in your Influx, you can set it up as a scheduled search or report, and it will push the data into InfluxDB on your schedule.

sourcetype="lb_source_name" HTTP_RESPONSE  status=* site=**
| table _time,site,status,response_time
|  timechart span=5m limit=0 avg(response_time) by site 
| foreach *
     [eval <<FIELD>>="lb_backend_response,type=response_time,site="+ "<<FIELD>>"+ " " +"value=" + '<<FIELD>>'+" "+_time+"000000000"] 
 |  stats list(*) as * 
 | foreach *
     [eval <<FIELD>>=mvjoin('<<FIELD>>',"XXX")] 
| eval final_data=""
| foreach *
     [eval final_data=final_data+"XXX"+'<<FIELD>>'] 
| table final_data 
| rex field=final_data mode=sed "s/XXX/\n/g"
| eval final_data=trim(final_data)  
 ```Send it to influx ```
 | eval header="{\"Authorization\":\"Bearer Your token\"}"
| curl method=post  uri="https://yourinfluxdb/api/v2/write?org=YOURORG&bucket=splunk&precision=ns"   headerfield=header datafield=final_data 
|  fields curl*

What it does is:

  • extract the HTTP request logs
  • get the 5-minute average response time for every site
  • format each site's data in InfluxDB line protocol
  • join all results into one data field
  • send it to InfluxDB
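That last step is roughly equivalent to the shell command below; the host, org, bucket, and "Your token" values are the same placeholders as in the query above, and multiple records are batched simply by separating them with newlines:

curl -X POST "https://yourinfluxdb/api/v2/write?org=YOURORG&bucket=splunk&precision=ns" \
  -H "Authorization: Bearer Your token" \
  --data-binary 'lb_backend_response,type=response_time,site=www.site-a.com value=0.182 1700000000000000000
lb_backend_response,type=response_time,site=www.site-b.com value=0.093 1700000000000000000'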

What I learnt:

– timechart should have limit=0 so that every site becomes a column – by default Splunk only charts the first 10 series and lumps the rest into OTHER

– in the foreach command, '<<FIELD>>' (single quotes) expands to the value of the field, while "<<FIELD>>" (double quotes) is just the name of the field as a string
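A quick throwaway search shows the difference (the field name foo here is made up):

| makeresults
| eval foo=42
| foreach foo
    [eval name="<<FIELD>>", value='<<FIELD>>']

name comes back as the literal string "foo", while value comes back as 42.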

– | rex field=final_data mode=sed "s/XXX/\n/g" replaces every XXX with a newline
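For example, this throwaway search (with a made-up value) turns aXXXbXXXc into three separate lines:

| makeresults
| eval final_data="aXXXbXXXc"
| rex field=final_data mode=sed "s/XXX/\n/g"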

The solution here is just to show you that you can turn your Splunk log data into real time-series metric data; it will need some more work to run perfectly in your environment.
