Here’s a short breakdown of MELT, which stands for metrics, events, logs, and traces. These are the four most basic data types used for network and system telemetry. There are other data types that are widely used and very useful in a robust observability solution, though I would argue some are just another form of one of these general data types.
These data types are used in system monitoring generally, but they are especially valuable for observability. Metrics, events, logs, and traces have long been the cornerstone of application and system observability, and today they are also key components of the telemetry used for network observability.
Metrics are data, usually a numerical measurement, collected from a device at regular intervals. Each metric is defined by the dimension it measures, such as memory usage or CPU utilization, or in networking, CRC errors, packet loss, or retransmissions. Most of the colorful graphs and charts we see on dashboards are built from metrics.
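To make the "collected regularly" part concrete, here’s a minimal Python sketch. The `MetricSample` shape and the `per_second_rates` helper are hypothetical names for illustration, not any particular tool’s schema. It assumes a CRC error counter that only ever increases and is polled every 60 seconds, and turns those raw samples into the per-second rate a graph would usually plot.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSample:
    """One numeric measurement taken at a point in time (hypothetical schema)."""
    name: str          # the dimension being measured, e.g. "crc_errors_total"
    value: float       # the numeric reading
    timestamp: float   # seconds since the epoch when the sample was taken
    labels: dict = field(default_factory=dict)  # e.g. {"device": "rtr1"}


def per_second_rates(samples):
    """Convert regularly polled counter samples into per-second rates."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    rates = []
    for prev, curr in zip(ordered, ordered[1:]):
        elapsed = curr.timestamp - prev.timestamp
        delta = curr.value - prev.value
        if elapsed <= 0 or delta < 0:
            # Skip duplicate timestamps and counter resets (value went down).
            continue
        rates.append(delta / elapsed)
    return rates


# Hypothetical counter polled every 60 seconds from one interface.
labels = {"device": "rtr1", "interface": "Gi0/1"}
samples = [
    MetricSample("crc_errors_total", 100, 0, labels),
    MetricSample("crc_errors_total", 160, 60, labels),
    MetricSample("crc_errors_total", 280, 120, labels),
]
print(per_second_rates(samples))  # [1.0, 2.0]
```

Skipping samples where the counter goes backwards is one simple way to handle a device reboot; real collectors may treat resets differently.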
Events are specific things that happened at a single moment in time, and they don’t necessarily happen on a regular schedule. An event could be a change in system state, such as an interface going down, a system restarting, or an access rule blocking traffic. Events usually carry an event type that serves as an identifier.
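Here’s a rough Python sketch of what an event record might carry. The `Event` fields and the `count_by_type` helper are assumptions for illustration only. Unlike the metric samples above, each record is tied to one moment, and the event type is what you group or filter on.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    """A single occurrence at one point in time (hypothetical schema)."""
    event_type: str    # identifier such as "interface_down" or "acl_deny"
    timestamp: float   # when it happened; events are not collected on a schedule
    source: str        # device or system that reported it
    attributes: dict = field(default_factory=dict)  # type-specific details


def count_by_type(events):
    """Tally events by their event type."""
    return Counter(e.event_type for e in events)


events = [
    Event("interface_down", 1700000000.0, "rtr1", {"interface": "Gi0/1"}),
    Event("acl_deny", 1700000042.5, "fw1", {"rule": "deny-telnet"}),
    Event("interface_down", 1700000100.0, "rtr2", {"interface": "Gi0/3"}),
]
print(count_by_type(events))
```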
Logs are simply timestamped data generated by a system as its code executes. A log can use almost any data structure and take almost any form, depending on the system that generated it. On the surface, events and logs may seem to overlap, but remember that a log is the actual text the system generates.
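To show the difference between a log and the structured data we pull out of it, here’s a minimal Python sketch. The line layout (ISO-8601 timestamp, hostname, then free-form text) is an assumption; real log formats vary widely by system, which is exactly the point above. The sample message mimics a Cisco-style interface notification, and `parse_log_line` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed line layout (hypothetical): ISO-8601 timestamp, hostname, free-form message.
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<msg>.*)$")


def parse_log_line(line):
    """Pull a timestamp and host out of one raw log line, keeping the original text."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line doesn't follow the assumed layout
    try:
        # fromisoformat() doesn't accept a trailing "Z" before Python 3.11, so normalize it.
        ts = datetime.fromisoformat(match.group("ts").replace("Z", "+00:00"))
    except ValueError:
        return None  # first token wasn't a timestamp
    return {
        "timestamp": ts,
        "host": match.group("host"),
        "message": match.group("msg"),
        "raw": line,  # the log itself is this text; the other fields are derived from it
    }


line = "2024-05-01T12:00:03Z rtr1 %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down"
parsed = parse_log_line(line)
print(parsed["timestamp"], parsed["host"], parsed["message"])
```

A pipeline could go one step further and turn this particular line into an "interface down" event, but the log remains the raw text the router emitted.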
Traces, which are a little harder to collect but are super interesting, are a breakdown of a user request and all the services involved in fulfilling it. Think of a waterfall chart with the request at the top, kicking off a chain of calls through each service that helps fulfill it. This can include both frontend and backend activity. Traces, commonly referred to as distributed traces, are a useful tool for pinpointing exactly where application delivery is broken or slow, especially in today’s microservices architectures.
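The waterfall idea maps naturally onto a tree of spans, where each span records one service’s piece of the request and points to its parent. Below is a minimal Python sketch, loosely modeled on the span concept tracing tools such as OpenTelemetry use, but the `Span` class and helpers are hypothetical, not that project’s API. It prints a text waterfall and picks the span with the most "self time" (its duration minus its children’s), assuming children run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One service's share of a request (hypothetical schema)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span (the original user request)
    service: str
    start_ms: float           # offset from the start of the trace
    duration_ms: float


def waterfall(spans):
    """Text waterfall: children indented under their parent, ordered by start time."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service} start={span.start_ms}ms dur={span.duration_ms}ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def slowest_self_time(spans):
    """Span spending the most time outside its children (assumes sequential children)."""
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    return max(spans, key=lambda s: s.duration_ms - child_total.get(s.span_id, 0.0))


spans = [
    Span("a", None, "frontend", 0, 120),
    Span("b", "a", "auth", 5, 10),
    Span("c", "a", "orders", 20, 90),
    Span("d", "c", "database", 25, 80),
]
print("\n".join(waterfall(spans)))
print("slowest:", slowest_self_time(spans).service)
```

In this made-up trace the database call accounts for 80 of the 120 ms, which is the kind of "where is it slow" answer traces are good at.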
These are broad, high-level categories, and you may notice some overlap among them. An event, log, or metric can also look very different from one system to another. As a simple example, many of the metrics collected from a router differ from those collected from a CentOS server.