Conversation

@pagbabian-splunk
Contributor

pagbabian-splunk commented Aug 30, 2025

Related Issue: #1487

Description of changes:

Currently, OCSF events don't explicitly call out the source or type of the event, whether natively produced or mapped from another source. A classic approach to high-level classification of telemetry data or experimental measurement data is to have standard source and type labels. Although metadata has a general labels array, standardizing the names for these two dimensions of events gives more structure. logger and loggers are intended to indicate the chain of custody of events in a pipeline. There is a special case where the "source" of an event is a log.

Splunk has host, source, and sourcetype fields. source corresponds to the physical origin of the event (e.g. /var/log/messages), and sourcetype to its combined source and format (e.g. cisco_syslog, Apache access_combined). Since the format of an OCSF event is implicitly OCSF, type within metadata is not a format but rather a classification (e.g. firewall traffic logs).

These are optional attributes.
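As a rough illustration of the proposal, the sketch below builds a metadata dictionary in which source and type are set only when known. The helper name `build_metadata` and the placeholder `version` and `product` values are assumptions made for this example; they are not taken from the schema change itself.

```python
import json
from typing import Optional


def build_metadata(source: Optional[str] = None,
                   event_type: Optional[str] = None) -> dict:
    """Build a hypothetical OCSF-style metadata dict.

    `source` and `type` are optional, so each key is emitted only
    when a value is supplied.
    """
    # Placeholder values, assumed for illustration only.
    metadata = {"version": "1.7.0", "product": {"name": "Example Product"}}
    if source is not None:
        metadata["source"] = source
    if event_type is not None:
        metadata["type"] = event_type
    return metadata


if __name__ == "__main__":
    # A log-file source with a classification, using the examples above.
    print(json.dumps(
        build_metadata("/var/log/messages", "firewall traffic logs"),
        indent=2,
    ))
```

Because both attributes are optional, a producer that knows neither can omit them and the resulting metadata is unchanged from today's shape.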

In the process of analyzing the gaps, I noticed that the log_* attributes have evolved in an ambiguous way. Specifically, it isn't clear whether the log in question is a source log or a consumer log (e.g. a SIEM). The log_version attribute was an indirect format indicator but carried no detail. The logger object and loggers array were not clearly distinguished from what the log_* attributes represent in metadata.

Hence I modified the descriptions in light of the new, explicit log_source and log_format attributes and updated the description of log_version, which in turn entailed deprecating version in loggers in favor of log_version for consistency.

@zschmerber
Contributor

zschmerber commented Aug 30, 2025

We do have the loggers array which calls out where the data came from.

@floydtree
Contributor

Apart from Loggers, we also have a log_name and log_provider field in metadata today.
[Screenshot, 2025-09-02]

With the proposed fields, there will be a lot of overlap in the object. It would be good to improve descriptions of all such log* fields and then evaluate if we still need new source and type fields in metadata.

@pagbabian-splunk
Contributor Author

Yes, agreed. However, as I read the descriptions for the log_* attributes and the loggers array, it isn't clear whether their meaning would be the same as source and type in the general case. For example, log_time was originally not supposed to be a source time but the time the event was received by a logging system, and loggers was added because of store-and-forward hops along the way from a source to a destination. Let's discuss this further, but I think there is already some ambiguity (and, admittedly, this would not help unless we make things clear).

@mikeradka
Contributor

mikeradka commented Sep 2, 2025

The Splunk extension has these within the metadata object. These attributes were added very early on to the extension:

  • source: The source of an event is the data input from which it originates. In the case of data monitored from files and directories, the source consists of the full pathname of the file or directory. In the case of a network-based source, the source field consists of the protocol and port, such as UDP:514.
  • source_type: The source type of an event is the format of the data input from which it originates.
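The two source conventions described above (a full pathname for file inputs, protocol and port for network inputs) can be sketched as follows. The function name `splunk_style_source` and its parameters are hypothetical and exist only for this example; this is not code from the Splunk extension.

```python
from typing import Optional


def splunk_style_source(path: Optional[str] = None,
                        protocol: Optional[str] = None,
                        port: Optional[int] = None) -> str:
    """Return a source string following the conventions described above.

    File or directory inputs use the full pathname; network inputs use
    the protocol and port, such as UDP:514.
    """
    if path is not None:
        return path
    if protocol is not None and port is not None:
        return f"{protocol.upper()}:{port}"
    raise ValueError("need either a path or a protocol and port")


if __name__ == "__main__":
    print(splunk_style_source(path="/var/log/messages"))
    print(splunk_style_source(protocol="udp", port=514))
```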

@pagbabian-splunk
Contributor Author

pagbabian-splunk commented Sep 3, 2025

As I've been reviewing loggers and the metadata.log* attributes, I find they are, first, very specific to events being stored somewhere and, second, somewhat ambiguous at the metadata level about where in an event pipeline they apply. The description of logger as an object is clearer: it states that a logger can be anywhere in the pipeline, including a source for collection, a destination for "final" storage, or any hop in between.
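To make the pipeline reading of loggers concrete, here is a minimal sketch of a loggers array treated as a chain of custody. It assumes the array is ordered from collection source to final storage, which this discussion does not establish, and the logger names and the `log_format` value are invented for illustration.

```python
# Hypothetical loggers array, assumed to be ordered source -> destination.
LOGGERS = [
    {"name": "edge-collector", "log_format": "syslog"},  # collection source
    {"name": "forwarder"},                               # intermediate hop
    {"name": "siem-store"},                              # "final" storage
]


def custody_chain(loggers: list) -> str:
    """Render the hops in order as a single readable string."""
    return " -> ".join(entry["name"] for entry in loggers)


def endpoints(loggers: list) -> tuple:
    """Return the first and last hops, or (None, None) if there are none."""
    if not loggers:
        return None, None
    return loggers[0], loggers[-1]


if __name__ == "__main__":
    print(custody_chain(LOGGERS))
    first, last = endpoints(LOGGERS)
    print(first["name"], last["name"])
```

Even under that ordering assumption, nothing in the chain says what kind of event is being carried, which is the gap the source and type attributes are meant to address.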

However, as with the Splunk extension (which mirrors Splunk default fields), there are other types of sources besides log files, and the logger doesn't explicitly describe the type of the event.

We already have two forms of log* attributes (the attributes, and the array of objects) but even then, there is no simple way to determine the source and type of an event. Even when there is overlap of a source (e.g. it comes from a log file) there are no specifics about the type of event (or even its format, as log_version is an indirect indicator).

For example, Splunk has different sourcetype values for the same source: Apache logs have the access_combined type, which covers NCSA-format web server logs, and the apache_error type for the standard Apache web server error log. In this example, the two types come from two different log files.

There is a cisco_syslog type and a cisco_asa type, the latter not necessarily coming from a log file. Similarly, there are two types of Linux events: those coming from the standard syslog (/var/log/messages, a log file source with a syslog type), and a linux_secure type for authentication events.
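The examples above can be collected into a small table to show why the origin alone does not pin down the type. The grouping keys ("Apache", "Cisco", "Linux") and the helper `types_by_origin` are assumptions introduced for this sketch; the type names are the ones mentioned in this comment.

```python
from collections import defaultdict

# (origin, type) pairs taken from the examples above; the origin labels
# are grouping keys invented for this illustration.
OBSERVATIONS = [
    ("Apache", "access_combined"),
    ("Apache", "apache_error"),
    ("Cisco", "cisco_syslog"),
    ("Cisco", "cisco_asa"),
    ("Linux", "syslog"),
    ("Linux", "linux_secure"),
]


def types_by_origin(observations: list) -> dict:
    """Group the observed types under each origin, preserving input order."""
    grouped = defaultdict(list)
    for origin, event_type in observations:
        grouped[origin].append(event_type)
    return dict(grouped)


if __name__ == "__main__":
    for origin, types in types_by_origin(OBSERVATIONS).items():
        # Every origin here maps to more than one type.
        print(f"{origin}: {', '.join(types)}")
```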

And so on. Therefore, whether redundant or not, IMO the log* attributes are not sufficient to express what the source and type of an event express (and, I admit, they are quite open to potential overlap, e.g. with product.name).

…d log_format to metadata and loggers. Added transmit_time to metadata. Deprecated version in loggers in favor of the added log_version.

Signed-off-by: Paul Agbabian <[email protected]>
@pagbabian-splunk pagbabian-splunk changed the title from "Added source and type to the metadata object." to "Added source and type to the metadata object. Updated the log_* attributes and descriptions for consistency." Sep 5, 2025
@pagbabian-splunk pagbabian-splunk added the labels enhancement (New feature or request), description_updates (Issues related to missing/incorrect/lacking descriptions of attributes), deprecation (A schema artifact is being deprecated), and v1.7.0 Sep 9, 2025
Aniak5 previously approved these changes Oct 1, 2025
Aniak5 previously approved these changes Oct 16, 2025
…ions of correlation_uid and uid.

Signed-off-by: Paul Agbabian <[email protected]>
…ved the deprecation of logger.version.

Signed-off-by: Paul Agbabian <[email protected]>
…ictionary. Tweaked some metadata log descriptions for source and type.

Signed-off-by: Paul Agbabian <[email protected]>

5 participants