
Splunk: The Power of Regular Expressions

Let’s walk through how to extract meaningful fields like IP address, port, error level, and message content from a raw PHP warning log using regular expressions and Splunk’s rex command.

Step 1: The Raw Log

Here’s a sample of the raw log we’re working with:

[client 104.23.211.100:21636] PHP Warning: Cannot assign an empty string to a string offset in /www/website.com/httpdocs/wp-includes/user.php on line 41

From this log, we want to extract:

  • IP Address

  • Port

  • Error Level (e.g., Warning, Notice, etc.)

  • Full message content after the error level


Step 2: Crafting the Regex Pattern

We used regexr.com to help test and refine our regular expression. After experimenting, we arrived at this pattern:


\[client\s(.*):\d+]\sPHP\s(\w+):\s+(.*)
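Before moving the pattern into Splunk, it can help to check it against the sample line outside of regexr as well. The following is a minimal sketch using Python's standard re module (our choice for illustration, not part of the Splunk workflow); the variable names are ours. Note that this version of the pattern has three unnamed groups, and the port is matched by \d+ but not captured.

```python
import re

# Sample line copied from Step 1.
LOG_LINE = (
    "[client 104.23.211.100:21636] PHP Warning: Cannot assign an empty string "
    "to a string offset in /www/website.com/httpdocs/wp-includes/user.php on line 41"
)

# The pattern from the article, unchanged. It has three unnamed groups:
# IP address, error level, and message. The port is matched but not captured.
PATTERN = re.compile(r"\[client\s(.*):\d+]\sPHP\s(\w+):\s+(.*)")

match = PATTERN.search(LOG_LINE)
ip, level, message = match.groups()

print(ip)
print(level)
print(message)
```

Unnamed groups like these can only be referred to by position, which is why the next step gives each one a name.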

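This pattern matches, but its groups are unnamed and the port is matched without being captured. To have Splunk extract named fields, we give each group a name using named capture group syntax and wrap the port in a group of its own: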

 
\[client\s(?<ip>.*):(?<port>\d+)]\sPHP\s(?<level>\w+):\s+(?<message>.*)

Let’s break it down:

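  • (?<ip>.*) captures the IP address. Because .* is greedy, the match relies on backtracking to find the colon before the port; [^:]+ is a stricter alternative for IPv4 addresses that stops at the first colon.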

  • (?<port>\d+) captures the port number.

  • (?<level>\w+) captures the PHP error level (e.g., Warning).

  • (?<message>.*) captures the rest of the message.
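To see what each named group ends up holding, here is a small sketch in Python, assuming the sample line from Step 1. One difference to be aware of: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern below is adapted accordingly and is not the exact string you would paste into Splunk.

```python
import re

# Sample line copied from Step 1.
LOG_LINE = (
    "[client 104.23.211.100:21636] PHP Warning: Cannot assign an empty string "
    "to a string offset in /www/website.com/httpdocs/wp-includes/user.php on line 41"
)

# Same structure as the article's pattern; only the named-group spelling
# changes from (?<name>...) to Python's (?P<name>...).
NAMED = re.compile(
    r"\[client\s(?P<ip>.*):(?P<port>\d+)]\sPHP\s(?P<level>\w+):\s+(?P<message>.*)"
)

# groupdict() returns a mapping of group name to captured text,
# which is roughly what rex turns into fields on an event.
fields = NAMED.search(LOG_LINE).groupdict()

for name, value in fields.items():
    print(f"{name:8} {value}")
```

All captured values are strings, including the port.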


Step 3: The Splunk Query

Here’s how we use this in a Splunk search:

 
index=web | rex field=_raw "\[client\s(?<ip>.*):(?<port>\d+)]\sPHP\s(?<level>\w+):\s+(?<message>.*)" | table ip, port, level, message, source | search ip=* | stats count by message, source

What’s Happening Here:

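  • rex applies the pattern to each event's _raw text and creates the ip, port, level, and message fields wherever it matches.

  • table keeps only the listed fields and presents them as columns.

  • search ip=* keeps only the events where ip has a value, that is, the events the pattern matched.

  • stats count by message, source aggregates the data to show how many times each message occurred per source. This is the table the search finally returns.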
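For readers who find it easier to reason about the pipeline as ordinary code, the sketch below imitates the extract, filter, and count steps in Python. It is only an analogy, not how Splunk runs the search. The source names and the non-matching event are made up for illustration, and the pattern uses Python's (?P<name>...) spelling for named groups.

```python
import re
from collections import Counter

NAMED = re.compile(
    r"\[client\s(?P<ip>.*):(?P<port>\d+)]\sPHP\s(?P<level>\w+):\s+(?P<message>.*)"
)

# Sample line copied from Step 1.
SAMPLE = (
    "[client 104.23.211.100:21636] PHP Warning: Cannot assign an empty string "
    "to a string offset in /www/website.com/httpdocs/wp-includes/user.php on line 41"
)

# Hypothetical (source, raw event) pairs standing in for index=web.
EVENTS = [
    ("error_log_a", SAMPLE),
    ("error_log_a", SAMPLE.replace("21636", "21637")),  # same message, other port
    ("error_log_b", SAMPLE),
    ("error_log_b", "server restarted"),  # pattern does not match this one
]

def count_by_message_and_source(events):
    """Imitate: rex ... | search ip=* | stats count by message, source."""
    counts = Counter()
    for source, raw in events:
        m = NAMED.search(raw)
        if m is None:
            # No match means no ip field, so search ip=* would drop the event.
            continue
        counts[(m.group("message"), source)] += 1
    return counts

counts = count_by_message_and_source(EVENTS)
for (message, source), n in sorted(counts.items()):
    print(n, source, message)
```

The event that does not match never reaches the counting step, which mirrors the role of search ip=* in the query.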


A Quick Note on Regex Syntax in Splunk

When working with regex in Splunk's rex command, a capturing group only becomes a field if it is named, using this syntax:

 
(?<field_name>pattern)

This tells Splunk to assign the matched content to the field field_name.
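Not every regex engine uses this exact spelling. Python's re module, for example, expects (?P<field_name>pattern). If you want to try a Splunk pattern in a quick Python script, a naive textual rewrite like the one below is usually enough. The helper name to_python_named_groups is hypothetical, and the rewrite does not handle every edge case (for instance, an escaped literal parenthesis directly followed by ?<).

```python
import re

def to_python_named_groups(pattern: str) -> str:
    """Rewrite (?<name>...) as (?P<name>...), leaving lookbehinds alone."""
    # (?<= and (?<! are lookbehind assertions, so only rewrite when the
    # character after '<' can start a group name.
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)

# The named pattern from Step 2, exactly as written for Splunk.
splunk_pattern = (
    r"\[client\s(?<ip>.*):(?<port>\d+)]\sPHP\s(?<level>\w+):\s+(?<message>.*)"
)

python_pattern = to_python_named_groups(splunk_pattern)
print(python_pattern)

# The converted pattern compiles and exposes the same field names.
print(sorted(re.compile(python_pattern).groupindex))
```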


Final Thoughts

Regex might look intimidating at first, but tools like Regexr make it easier to visualize and test. With consistent practice, crafting expressions and extracting fields in Splunk becomes second nature, and it is an essential skill for log analysis and troubleshooting.

So keep practicing, and soon enough, writing regex in Splunk will feel as routine as checking your logs.