15. Sagan & JSON

15.1. Why JSON?

Sagan has traditionally been a syslog analysis and parsing engine. Over time, more and more platforms have been switching to JSON as an output option. Not just traditional syslog data sources but non-traditional sources like APIs and "cloud" platforms. The good side of this is the data becomes more structured and now has more context. Unfortunately, traditional Sagan rules weren't built to process this data.

The goal of Sagan is to keep the traditional syslog parsing in place and to add on JSON keyword rule options and functionality. Sagan is about processing log data, regardless of the source. This means that in many cases it is important for Sagan to properly handle JSON.

15.2. Different method of JSON input

Sagan can interpret JSON from two locations. From the named pipe (FIFO) or from a "syslog message".

The first methods is that Sagan reads incoming JSON data from a named pipe (FIFO). Traditionally, this data is in a "pipe" (|) delimited format. The "pipe" delimitation greatly limits the types of data Sagan can process. As of Sagan 2.0.0, Sagan can read JSON data via the named pipe. Most modern day syslog engines (Rsyslog, Syslog-NX, NXlog, etc) support JSON output. See sections 4.2. rsyslog - JSON mode <https://https://sagan.readthedocs.io/en/latest/configuration.html#rsyslog-json-modeg>_ or 4.4. syslog-ng - JSON mode for more information about configuration of various log daemons.

With this in mind, this means that Sagan can collect data from non-syslog sources. For example, the IDS engine Suricata (https://suricata-ids.org) produces a lot of JSON data. Various security tool APIs like Cisco Umbrella, AWS Cloudtrail, CrowdStrike Falcon Cloud, etc. also generate a lot of JSON output. These all become possible "sources" for Sagan data processing.

The second method of JSON data collection is via the syslog "message" field. Some syslog "forwarders" use this method to send SIEMs data. The idea is that the data is transferred via the traditional syslog transport but the message contains the JSON data. Sagan can interpret that data for alerting purposes.

15.3. JSON "mapping"

Either method you decide to receive the JSON data in, it is likely you will want to "map" the data so that Sagan can properly process it. You can think of mapping this way; When Sagan receives JSON data, it doesn't know what is "important" and what isn't. "Mapping" allows you to assign values to the data so the engine can process it and signatures can be used. It is also important to understand that different platforms label key/value pairs differently. For example, a source IP address on one platform might be "src_ip", while on another platform it might be "source_ip". Mapping allows you to assign the "source" IP value from the JSON.

"Mapping" allows you to use signature keys words like content, pcre, meta_content, etc. and features like threshold, after, xbits, etc.

Simply put, "Mapping" allows you to assign JSON "key" data to specific internal Sagan values.

Within the Sagan rules are two files. One is json-input.map and the other is json-message.map. These are the mapping files that are used depending on your method of input. These files can be altered to support the JSON mapping you might need and come with some example mapping.

In some cases, "mapping" might be over kill and can be skipped. See When mapping is not needed.

15.4. How JSON nest are processed

Sagan will automatically "flatten" nests. For example, let say you want to process the

following JSON format.

{"timestamp":"2019-11-19T20:50:02.856040+0000","flow_id":1221352694083219,"in_iface":"eth0","event_type":"alert","src_ip":"12.12.12.12","dest_ip":"13.13.13.13","proto":"ICMP","icmp_type":8,"icmp_code":0,"alert":{"action":"allowed","gid":1,"signature_id":20000004,"rev":1,"signature":"QUADRANT Ping Packet [ICMP]","category":"Not Suspicious Traffic","severity":3},"flow":{"pkts_toserver":2,"pkts_toclient":0,"bytes_toserver":196,"bytes_toclient":0,"start":"2019-11-19T20:50:01.847507+0000"},"payload":"elXUXQAAAACtDw0AAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","stream":0,"packet":"VDloD8YYADAYyy0NCABFAABUkEpAAEABniMMnwIKDJHxAQgAk9tJcwACelXUXQAAAACtDw0AAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","packet_info":{"linktype":1},"host":"firewall"}

All nest, including the top nest, start with a .. For example, the JSON key "timestamp" will become .timestamp

internally to Sagan. The "event_type" and "src_ip" would become .event_type and .src_ip. For nested objects like "alert", you would access the "signature_id" as .alert.signature_id. This structure is similar to JSON processing commands like jq.

There is no limitations on nest depths. This logic applies for JSON "mapping" and Sagan signature keywords like json_content,

json_pcre and json_meta_content.

15.5. When mapping is not needed

In most cases, you'll likely want to performing mapping for your JSON data. However, there are some instances where mapping might not be required. Keep in mind, without mapping things like threshold, after, xbits might not perform properly.

Regardless of whether Sagan properly maps the JSON, it will internally still split the key/value pairs in real time. While you won't be able to use the standard Sagan rule operators (ie - content, pcre, etc) you will be able use some JSON specific operators.

These are json_content, json_pcre and json_meta_content. With these, you can specify the key you want to process and then what you are searching for.

This can be useful when used in conjunction with mapping. This way you can use traditional Sagan keywords (threshold, after, content, etc) along with JSON specific (json_content, json_pcre, etc) rule options.

15.6. Mappable JSON Fields

While not all JSON field can be internally mapped, these are the Sagan internal fields that should be consider. Each field has different functionality internally to Sagan. For example, if you want to apply rule operators like threshold or after in a signature, you'll likely want to map src_ip and/or dst_ip. The following are internal Sagan variables/mappings to consider for mapping.

Fields to consider for internal JSON mappings are as follows.

src_ip

This value will become source IP address of the event. This will apply to rule options like threshold, after, xbits, flexbits, etc.

dst_ip

This value will become the destination IP address of the event. This can also be represented as dest_ip. This will apply to rule options like threshold, after, xbits, flexbits, etc.

src_port

JSON data for this will become the source port of the event. This will apply to rule options like flexbits.

dst_port

JSON data for this will become the destination port for the event. This will apply to rule options like flexbits. This can also be represented as dest_port.

message

The JSON for this value will becoming the syslog message. This will apply to rule options like content, pcre, meta_content, parse_src_ip, parse_dst_ip, parse_hash, etc.

event_id

The JSON data will be applied to the event_id rule option.

proto

This will represent the protocol. Valid options are TCP, UDP and ICMP (case insensitive).

facility

The JSON data will be mapped to the syslog facility. This will apply to the rule option facility.

level

The JSON data will be mapped to the internal Sagan variable level. This will apply to the rule option level.

tag.

The JSON data will be mapped to the internal Sagan variable of tag. This will apply to the rule option tag.

syslog-source-ip

The JSON data will be mapped to the internally to Sagan's syslog source. This should not be confused with src_ip. If src_ip is not present, the syslog-source-ip become the src-ip. This might apply to threshold and after is src_ip is not populated.

event_type

The JSON data extracted will be applied internally to the Sagan variable of "program". event_type is simply an alias for program and both can be interchanged. This applies to rule options like program and event_type.

program

The JSON data extracted will be applied internally to the Sagan variable of "program". program is simply an alias for event_type and both can be interchanged. This applies to rule options like program and event_type.

time

The JSON data extracted will be applied internally to the syslog "time" stamp. This option is recorded but is not used in any rule options.

date

The JSON data extracted will be applied internally to the syslog "date" stamp. This option is recorded but is not used in any rule options.

15.7. JSON via named pipe (FIFO)

Mapping for JSON data coming in via the named pipe (FIFO) is configured in the sagan-core section under input-type. Two types are available, json and pipe. If pipe is used, the sections below (json-map & json-software) are ignored.

# Controls how data is read from the FIFO. The "pipe" setting is the traditional
# way Sagan reads in events and is default. "json" is more flexible and
# will become the default in the future. If "pipe" is set, "json-map"
# and "json-software" have no function.::

input-type: json                       # pipe or json
json-map: "$RULE_PATH/json-input.map"  # mapping file if input-type: json
json-software: syslog-ng               # by "software" type.

The json-map function informs the Sagan engine where to locate the mapping file. This is a file that is shipped with the Sagan rule set and already has some mappings within it. The next option is the json-software type. The json-input.map typically contains more than one mapping type. The json-software tells Sagan which mapping to use from that file. A typically mapping for Syslog-NG looks like this:

{"software":"syslog-ng","syslog-source-ip":".SOURCEIP","facility":".FACILITY","level":".PRIORITY","priority":".PRIORITY","time":".DATE","date":".DATE","program":".PROGRAM","message":".MESSAGE"}

These are key/value pairs. The first option (ie - message, program, etc) is the internal Sagan engine value. The value to the key is what Syslog-NG names the key.

When Sagan starts up, it will parse the json-input.map for the software type of "syslog-ng". If the software of "syslog-ng" is not found, Sagan will abort.

When located, Sagan will expect data via the named pipe to be in the mapped JSON format. Data that is not in this format will be dropped. To understand mapping better, below is an example of JSON via the named pipe that Sagan might receive:

{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","SEQNUM":"437","PROGRAM":"sshd","PRIORITY":"notice","Authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=49.88.112.77  user=root","LEGACY_M"dev-2","HOST":"dev-2","FACILITY":"authpriv","DATE":"Jan  2 20:12:36"}

As we can see, Syslog-NG maps the syslog "message" field as ".MESSAGE". The Sagan engine takes that data and internally maps it to the "message" value. It repeats this through the rest of the mapping.

Mapping this way becomes a more convient and flexible method of getting data into Sagan than the old "pipe delimited" format.

Note: When processing JSON via the named pipe, only one mapping can be used at a time.

15.8. JSON via syslog message field

The mapping concept for Sagan when receiving JSON data via the syslog "message" is similar to JSON data via the named pipe.

Unlike JSON data via the named pipe, when receiving data via a syslog "message" multiple maps can be applied. The idea is that your Sagan system might be receiving different types of JSON data from different systems.

To determine which "map" works best, the Sagan engine does an internal "scoring" of each map. Sagan will then apply the best map that matches the most fields. This means that you might want to "map" fields event if you don't plan on using them. This ensures that the proper "map" will "win" (score the highest).

To enabled JSON syslog message processing, you will need to enable the following fields within the sagan-core part of the sagan.yaml.

# "parse-json-message" allows Sagan to detect and decode JSON within a
# syslog "message" field.  If a decoder/mapping is found,  then Sagan will
# extract the JSON values within the messages.  The "parse-json-program"
# tells Sagan to start looking for JSON within the "program" field.  Some
# systems (i.e. - Splunk) start JSON within the "program" field and
# into the "message" field.  This option tells Sagan to "append" the
# strings together (program+message) and then decode.  The "json-message-map"
# tells Sagan how to decode JSON values when they are encountered.

parse-json-message: enabled
parse-json-program: enabled
json-message-map: "$RULE_PATH/json-message.map"

The parse-json-message configures Sagan to automatically detect JSON within the syslog "message" field. The parse-json-program configures Sagan to automatically detect JSON within the syslog "program" field.

Some applications will send the start of the JSON within the "program" field and it will overflow into the "message" field. The parse-json-program option configures Sagan to look for JSON within the "program" field and append the "program" and "message" field if JSON detected.

The json-message-map contains the mappings for systems that might be sending you JSON. As with the json-input.map, the Sagan rule sets come with a json-message.map.

An example mapping:

{ "software":"suricata", "syslog-source-ip":".src_ip","src_ip":".src_ip","dest_ip":".dest_ip","src_port":".src_port","dest_port":".dest_port","message":".alert.signature,.alert_category,.alert.severity","event_type":".hash","time":".timestamp","date":".timestamp", "proto":".proto" }

Unlike named pipe JSON mapping, the "software" name is not used other than for debugging. When Sagan receives JSON data, it will apply all mapping to found in the json-message.map file.

Note of the “message” field. This shows the "message" being assigned multiple key values. In this case the key “.alert.signature”,”.alert.category” and “.alert.severity” will be become the “message”. Internally to Sagan, the “message” will become “key:value,key:value,key:value”. For example, let say the JSON Sagan is processing is the follow Suricata JSON line:

{"timestamp":"2020-01-03T18:20:05.716295+0000","flow_id":812614352473482,"in_iface":"eth0","event_type":"alert","src_ip":"12.12.12.12","dest_ip":"13.13.13.13","proto":"ICMP","icmp_type":8,"icmp_code":0,"alert":{"action":"allowed","gid":1,"signature_id":20000004,"rev":1,"signature":"QUADRANT Ping Packet [ICMP]","category":"Not Suspicious Traffic","severity":3},"flow":{"pkts_toserver":5,"pkts_toclient":0,"bytes_toserver":490,"bytes_toclient":0,"start":"2020-01-03T18:20:01.691594+0000"},"payload":"1YUPXgAAAADM7QoAAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","stream":0,"packet":"VDloD8YYADAYyy0NCABFAABUCshAAEABI6YMnwIKDJHxAQgAHoELvAAF1YUPXgAAAADM7QoAAAAAAE9GVFdJTkstUElOR9raU09GVFdJTkstUElOR9raU09GVFdJTkstUEk=","packet_info":{"linktype":1},"host":"firewall"}

Internally to Sagan the "message" will become:

.alerts.ignature:QUADRANT Ping Packet [ICMP],.alert.category:Not Suspicious Traffic,alert.severity:3

This means any signatures you are going to create will need to take this format into account. In cases where you would like the entire JSON string to become the message, simply make the "message" mapping %JSON%. This tells Sagan that the entire JSON string should be considered the "message".