14. Sagan & JSON

14.1. Why JSON?

Sagan has traditionally been a syslog analysis and parsing engine. Over time, more and more platforms have been switching to JSON as an output option. Not just traditional syslog data sources but non-traditional sources like APIs and “cloud” platforms. The good side of this is the data becomes more structured and now has more context. Unfortunately, traditional Sagan rules weren’t built to process this data.

The goal of Sagan is to keep the traditional syslog parsing in place and to add on JSON keyword rule options and functionality. Sagan is about processing log data, regardless of the source. This means that in many cases it is important for Sagan to properly handle JSON.

14.2. Different method of JSON input

Sagan can interpret JSON from two locations. From the named pipe (FIFO) input or from a “syslog message”.

The first methods general options is that Sagan reads incoming JSON data from a named pipe (FIFO). Traditionally, this data is in a pipe delimited format. Pipe delimitation greatly limits the types of data Sagan can process. As of Sagan 2.0.0, Sagan can read JSON data via the named pipe. Most modern day syslog engines (Rsyslog, Syslog-NX, NXlog, etc) support JSON output. See sections 4.2. rsyslog - JSON mode <https://https://sagan.readthedocs.io/en/latest/configuration.html#rsyslog-json-modeg>_ or 4.4. syslog-ng - JSON mode for more information about configuration of various log daemons.

Keep in mind, traditional syslog daemons are no longer the only sources Sagan can collect this type of data from. For example, the IDS engine Suricata (https://suricata-ids.org) produces a lot of JSON data. Various security tool APIs like Cisco Umbrella, AWS Cloudtrail, CrowdStrike Falcon Cloud, etc. also generate a lot of JSON output. These all become possible “sources” for Sagan data processing.

The second method of JSON data collection is via the syslog “message” field. Some syslog “forwarders” use this method to send SIEM data. The idea is that the data is transferred via the traditional syslog transport but the message contains the JSON data. Sagan can interpret that data for alerting purposes.

14.3. JSON “mapping”

Either method you decide to receive the JSON data in, it is likely you will want to “map” the data so that Sagan can properly process it. You can think of mapping this way; When Sagan receives JSON data, it initially has no means to determine what is important and what is not. Sagan “mapping” allows you to assign values so that the engine and signatures can be used. It is also important to understand that different platforms label key/value pairs differently. For example, a source IP address on one platform might be “src_ip”, while on another platform it might be “source_ip”. Mapping allows you to assign the source IP value from the JSON.

Mapping not only allows you to use signature key words (content, pcre, meta_content, etc) but other Sagan features like threshold, after, xbits, etc.

As the name suggests, “mapping” simply allows key/value pairs to be assigned to specific internal Sagan engine values.

Within the Sagan rules are two files. One is json-input.map and the other is json-message.map. These are the mapping files that are used depending on your method of input. These files can be altered to support the JSON mapping you might need.

14.4. Nesting Limitations

Sagan will automatically processes “nested” JSON data. There is a limitation in that Sagan will only keep the last “key” name. To examine this, lets look at a Suricata “nested” JSON line.

{"timestamp":"2019-11-19T20:50:02.856040+0000","flow_id":1221352694083219,"in_iface":"eth0","event_type":"alert","src_ip":"12.12.12.12","dest_ip":"13.13.13.13","proto":"ICMP","icmp_type":8,"icmp_code":0,"alert":{"action":"allowed","gid":1,"signature_id":20000004,"rev":1,"signature":"QUADRANT Ping Packet [ICMP]","category":"Not Suspicious Traffic","severity":3},"flow":{"pkts_toserver":2,"pkts_toclient":0,"bytes_toserver":196,"bytes_toclient":0,"start":"2019-11-19T20:50:01.847507+0000"},"payload":"elXUXQAAAACtDw0AAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","stream":0,"packet":"VDloD8YYADAYyy0NCABFAABUkEpAAEABniMMnwIKDJHxAQgAk9tJcwACelXUXQAAAACtDw0AAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","packet_info":{"linktype":1},"host":"firewall"}

Internally, Sagan will “break apart” the nests. However, the keys will not hold there nested key names. For example, the key “timestamp” will be recorded as “timestamp” within Sagan because that is within the first level of nesting.

However, if you look at the “alert” JSON nest, you might expect the nested value “action” to be something like “alert.action”. Internally to Sagan, it will assign it a value of just “action”. This means there might be some issues when dealing with JSON will duplicate key names.

14.5. When mapping is not needed

In most cases, you’ll likely want to performing mapping for your JSON data. However, there are some instances where mapping might not be required. Keep in mind, without mapping things like threshold, after, xbits might not perform properly.

Regardless of whether Sagan properly maps the JSON, it will internally still split the key/value pairs in real time. While you won’t be able to use the standard Sagan rule operators (ie - content, pcre, etc) you will be able use some JSON specific operators.

These are json_content, json_pcre and json_meta_content. With these, you can specify the key you want to process and then what you are searching for.

This becomes useful when used in conjunction with mapping. This way you can use traditional Sagan keywords (threshold, after, content, etc) along with JSON specific (json_content, json_pcre, etc) rule options.

14.6. Mappable JSON Fields

While not all JSON field can be internally mapped, these are the Sagan internal fields that should be consider. Each field has different functionality internally to Sagan. For example, if you want to apply rule operators like threshold or after in a signature, you’ll likely want to map src_ip and/or dst_ip. The following are internal Sagan variables/mappings to consider for mapping.

Fields to consider for internal JSON mappings are as follows.

src_ip

This value will become source IP address of the event. This will apply to rule options like threshold, after, xbits, flexbits, etc.

dst_ip

This value will become the destination IP address of the event. This can also be represented as dest_ip. This will apply to rule options like threshold, after, xbits, flexbits, etc.

src_port

JSON data for this will become the source port of the event. This will apply to rule options like flexbits.

dst_port

JSON data for this will become the destination port for the event. This will apply to rule options like flexbits. This can also be represented as dest_port.

message

The JSON for this value will becoming the syslog message. This will apply to rule options like content, pcre, meta_content, parse_src_ip, parse_dst_ip, parse_hash, etc.

event_id

The JSON data will be applied to the event_id rule option.

proto

This will represent the protocol. Valid options are TCP, UDP and ICMP (case insensitive).

facility

The JSON data will be mapped to the syslog facility. This will apply to the rule option facility.

level

The JSON data will be mapped to the internal Sagan variable level. This will apply to the rule option level.

tag.

The JSON data will be mapped to the internal Sagan variable of tag. This will apply to the rule option tag.

syslog-source-ip

The JSON data will be mapped to the internally to Sagan’s syslog source. This should not be confused with src_ip. If src_ip is not present, the syslog-source-ip become the src-ip. This might apply to threshold and after is src_ip is not populated.

event_type

The JSON data extracted will be applied internally to the Sagan variable of “program”. event_type is simply an alias for program and both can be interchanged. This applies to rule options like program and event_type.

program

The JSON data extracted will be applied internally to the Sagan variable of “program”. program is simply an alias for event_type and both can be interchanged. This applies to rule options like program and event_type.

time

The JSON data extracted will be applied internally to the syslog “time” stamp. This option is recorded but is not used in any rule options.

date

The JSON data extracted will be applied internally to the syslog “date” stamp. This option is recorded but is not used in any rule options.

14.7. JSON via named pipe (FIFO)

Mapping for JSON data coming in via the named pipe (FIFO) is configured in the sagan-core section under input-type. Two types are available, json and pipe. If pipe is used, the sections below (json-map & json-software) are ignored.

# Controls how data is read from the FIFO. The "pipe" setting is the traditional
# way Sagan reads in events and is default. "json" is more flexible and
# will become the default in the future. If "pipe" is set, "json-map"
# and "json-software" have no function.::

input-type: json                       # pipe or json
json-map: "$RULE_PATH/json-input.map"  # mapping file if input-type: json
json-software: syslog-ng               # by "software" type.

The json-map function informs the Sagan engine where to locate the mapping file. This is a file that is shipped with the Sagan rule set and already has some mappings within it. The next option is the json-software type. The json-input.map typically contains more than one mapping type. The json-software tells Sagan which mapping to use from that file. A typically mapping for Syslog-NG looks like this:

{"software":"syslog-ng","syslog-source-ip":"SOURCEIP","facility":"FACILITY","level":"PRIORITY","priority":"PRIORITY","time":"DATE","date":"DATE","program":"PROGRAM","message":"MESSAGE"}

These are key/value pairs. The first option (ie - message, program, etc) is the internal Sagan engine value. The value to the key is what Syslog-NG names the key.

When Sagan starts up, it will parse the json-input.map for the software type of “syslog-ng”. If the software of “syslog-ng” is not found, Sagan will abort.

When located, Sagan will expect data via the named pipe to be in the mapped JSON format. Data that is not in this format will be dropped. To understand mapping better, below is an example of JSON via the named pipe that Sagan might receive:

{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","SEQNUM":"437","PROGRAM":"sshd","PRIORITY":"notice","Authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=49.88.112.77  user=root","LEGACY_M"dev-2","HOST":"dev-2","FACILITY":"authpriv","DATE":"Jan  2 20:12:36"}

As we can see, Syslog-NG maps the syslog “message” field as “MESSAGE”. The Sagan engine takes that data and internally maps it to the “message” value. It repeats this through the rest of the mapping.

Mapping this way becomes a more convient and flexible method of getting data into Sagan than the old “pipe delimited” format.

Note: When processing JSON via the named pipe, only one mapping can be used at a time.

14.8. JSON via syslog message field

The mapping concept for Sagan when receiving JSON data via the syslog “message” is similar to JSON data via the named pipe.

Unlike JSON data via the named pipe, when receiving data via a syslog “message” multiple maps can be applied. The idea is that your Sagan system might be receiving different types of JSON data from different systems.

To determine which “map” works best, the Sagan engine does an internal “scoring” of each map. The idea is that Sagan will apply the best map that matches the most fields. This also means that sometimes mapping fields, even if you don’t plan on using them, will ensure that the proper map “wins”. Even if you don’t plan on internally using the field, the more fields that “match” will mean an increased score. This will help ensure you don’t have two mappings competing over data.

To enabled JSON syslog message processing, you will need to enable the following fields within the sagan-core part of the sagan.yaml.

# "parse-json-message" allows Sagan to detect and decode JSON within a
# syslog "message" field.  If a decoder/mapping is found,  then Sagan will
# extract the JSON values within the messages.  The "parse-json-program"
# tells Sagan to start looking for JSON within the "program" field.  Some
# systems (i.e. - Splunk) start JSON within the "program" field and
# into the "message" field.  This option tells Sagan to "append" the
# strings together (program+message) and then decode.  The "json-message-map"
# tells Sagan how to decode JSON values when they are encountered.

parse-json-message: enabled
parse-json-program: enabled
json-message-map: "$RULE_PATH/json-message.map"

The parse-json-message configures Sagan to automatically detect JSON within the syslog “message” field. The parse-json-program configures Sagan to automatically detect JSON within the syslog “program” field.

Some applications will send the start of the JSON within the “program” field and it will overflow into the “message” field. The parse-json-program option configures Sagan to look for JSON within the “program” field and append the “program” and “message” field if JSON detected.

The json-message-map contains the mappings for systems that might be sending you JSON. As with the json-input.map, the Sagan rule sets come with a json-message.map.

An example mapping:

{ "software":"suricata", "syslog-source-ip":"src_ip","src_ip":"src_ip","dest_ip":"dest_ip","src_port":"src_port","dest_port":"dest_port","message":"signature,category,severity","event_type":"hash","time":"timestamp","date":"timestamp", "proto":"proto" }

Unlike named pipe JSON mapping, the “software” name is not used other than for debugging. When Sagan receives JSON data, it will apply all mapping to found in the json-message.map file.

Make a note of the “message” field. This shows the message being assigned multiple key value pairs. In this case the key “signature”,”category” and “severity” will be become the “message”. Internally, the “message” will become “key:value,key:value,key:value”. For example, let say the JSON Sagan is processing is the follow Suricata JSON line:

{"timestamp":"2020-01-03T18:20:05.716295+0000","flow_id":812614352473482,"in_iface":"eth0","event_type":"alert","src_ip":"12.12.12.12","dest_ip":"13.13.13.13","proto":"ICMP","icmp_type":8,"icmp_code":0,"alert":{"action":"allowed","gid":1,"signature_id":20000004,"rev":1,"signature":"QUADRANT Ping Packet [ICMP]","category":"Not Suspicious Traffic","severity":3},"flow":{"pkts_toserver":5,"pkts_toclient":0,"bytes_toserver":490,"bytes_toclient":0,"start":"2020-01-03T18:20:01.691594+0000"},"payload":"1YUPXgAAAADM7QoAAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","stream":0,"packet":"VDloD8YYADAYyy0NCABFAABUCshAAEABI6YMnwIKDJHxAQgAHoELvAAF1YUPXgAAAADM7QoAAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","packet_info":{"linktype":1},"host":"firewall"}

Internally to Sagan the “message” will become:

signature:QUADRANT Ping Packet [ICMP],category:Not Suspicious Traffic,severity:3

This means any signatures you are going to create will need to take this format into account. In cases where you would like the entire JSON string to become the message, simply make the “message” mapping “%JSON%” (no quotes). This tells Sagan that the entire JSON string should be considered the message.