Overview
Metrics
Metrics inform observers about the health and operation of a component or system. A metric represents a point-in-time measurement of a particular source, and each data point tends to be very small. This compact size allows for efficient collection even in large systems. Metrics also lend themselves well to pre-aggregation within the component before collection, which reduces the cost of processing and storing large numbers of metric time series in a central system. Because metrics are processed and stored so efficiently, they are well suited to automated alerting and are an excellent source of health data for every component in the system.
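As a rough illustration of pre-aggregation inside a component, the sketch below keeps only running totals per endpoint and emits one compact data point per endpoint when flushed. The class name `LatencyAggregator`, its fields, and the endpoint names are assumptions made for this example, not part of any particular metrics library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LatencyAggregator:
    """Hypothetical in-process aggregator; raw samples are never stored."""
    count: dict = field(default_factory=lambda: defaultdict(int))
    total_ms: dict = field(default_factory=lambda: defaultdict(float))
    max_ms: dict = field(default_factory=lambda: defaultdict(float))

    def observe(self, endpoint: str, latency_ms: float) -> None:
        # Update the running totals for this endpoint.
        self.count[endpoint] += 1
        self.total_ms[endpoint] += latency_ms
        self.max_ms[endpoint] = max(self.max_ms[endpoint], latency_ms)

    def flush(self) -> list:
        """Return one small data point per endpoint, then reset the totals."""
        points = [
            {
                "endpoint": ep,
                "count": self.count[ep],
                "avg_ms": self.total_ms[ep] / self.count[ep],
                "max_ms": self.max_ms[ep],
            }
            for ep in sorted(self.count)
        ]
        self.count.clear()
        self.total_ms.clear()
        self.max_ms.clear()
        return points


# Illustrative values only: four observations collapse into two data points.
agg = LatencyAggregator()
for ms in (12.0, 20.0, 40.0):
    agg.observe("/checkout", ms)
agg.observe("/health", 1.0)
points = agg.flush()
print(points)
```

However many requests arrive between flushes, the central system receives one point per endpoint, which is where the processing and storage savings come from.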
Logs
Log data informs observers about the discrete events that occurred within a component or a set of components. Just about every software component logs information about its activities over time. This rich data tends to be much larger than metric data and can cause processing issues, especially if components log too verbosely. For that reason, log data is usually not used to assess the health of an extensive system; metrics serve that purpose instead. Once metric telemetry highlights potential problem sources, filtered log data for those sources can be used to understand what occurred.
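The last step, narrowing logs to a suspect source once a metric has flagged it, could look roughly like the following. The `LogRecord` shape, the `filter_logs` helper, the component names, and the sample records are all made up for illustration; a real system would query its log store instead of a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed severity ordering for this sketch.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    component: str
    level: str
    message: str


def filter_logs(records, component, start, end, min_level="WARNING"):
    """Keep records from one component, within [start, end], at or above min_level."""
    threshold = LEVELS[min_level]
    return [
        r for r in records
        if r.component == component
        and start <= r.timestamp <= end
        and LEVELS.get(r.level, 0) >= threshold
    ]


# Hypothetical scenario: an error-rate metric for "payments" spiked at t0.
t0 = datetime(2024, 1, 1, 12, 0, 0)
records = [
    LogRecord(t0, "payments", "INFO", "charge accepted"),
    LogRecord(t0 + timedelta(seconds=30), "payments", "ERROR", "gateway timeout"),
    LogRecord(t0 + timedelta(seconds=45), "search", "ERROR", "index unavailable"),
    LogRecord(t0 + timedelta(minutes=10), "payments", "ERROR", "gateway timeout"),
]
suspects = filter_logs(records, "payments", t0, t0 + timedelta(minutes=5))
for r in suspects:
    print(r.timestamp.isoformat(), r.level, r.message)
```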
Traces
Whereas a log records discrete, event-triggered entries, tracing provides a much wider, continuous view of an application. The goal of tracing is to follow a program’s flow and data progression.
In many instances, a trace represents a single user’s journey through an entire application stack. Its primary purpose is optimization rather than reaction. By tracing through a stack, developers can identify bottlenecks and focus on improving performance.
A distributed trace is defined as a collection of spans. A span is the smallest unit in a trace and represents a piece of the workflow in a distributed landscape. It can be an HTTP request, a call to a database, or the processing of a message from a queue.
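One way to picture a trace as a collection of spans is the small data model below. The `Span` fields, the `children_of` helper, and the timings are assumptions chosen for this sketch; real tracing systems define their own span schemas.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical, simplified shape)."""
    trace_id: str
    name: str
    start_ms: float
    end_ms: float
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def children_of(spans, parent):
    """Return the spans whose direct parent is the given span."""
    return [s for s in spans if s.parent_id == parent.span_id]


# Illustrative trace: one HTTP request that calls a database and handles a queue message.
trace_id = uuid.uuid4().hex
root = Span(trace_id, "HTTP GET /orders", 0.0, 120.0)
db = Span(trace_id, "db query", 10.0, 95.0, parent_id=root.span_id)
queue = Span(trace_id, "queue message", 96.0, 110.0, parent_id=root.span_id)
trace = [root, db, queue]

# The slowest child of the root span is a natural candidate bottleneck.
slowest = max(children_of(trace, root), key=lambda s: s.duration_ms)
print(slowest.name, slowest.duration_ms)
```

Every span carries the same `trace_id`, which is what lets spans recorded by different services be assembled into one trace.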
When a problem does occur, tracing allows you to see how you got there:
- Which function was involved.
- The function’s duration.
- The parameters passed.
- How deep into the function the user could get.
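The items in the list above can be approximated with a simple decorator that records the function name, its parameters, its duration, and any error that cut the call short. This is only a sketch: the `traced` decorator, the `TRACE_EVENTS` in-memory list, and the two example functions are hypothetical, and "how deep the user got" is inferred here from which nested calls were recorded and which of them failed.

```python
import functools
import time

TRACE_EVENTS = []  # hypothetical in-memory sink; a real tracer would export these


def traced(func):
    """Record name, parameters, duration, and error (if any) for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        event = {"function": func.__name__, "args": args, "kwargs": kwargs, "error": None}
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            event["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call is recorded.
            event["duration_s"] = time.perf_counter() - start
            TRACE_EVENTS.append(event)
    return wrapper


@traced
def parse_quantity(raw):
    return int(raw)


@traced
def add_to_cart(item, raw_quantity):
    return {"item": item, "quantity": parse_quantity(raw_quantity)}


try:
    add_to_cart("book", "two")  # fails inside parse_quantity
except ValueError:
    pass

for event in TRACE_EVENTS:
    print(event["function"], event["args"], event["error"])
```

Inner calls finish first, so they appear first in `TRACE_EVENTS`; reading the list shows that the request reached `parse_quantity` and failed there.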
Usage Guidance
When to use metric, log, or trace data to track a particular piece of telemetry can be summarized as follows:
- Use metrics to track the occurrence of an event, a count of items, the time taken to perform an action, or the current value of a resource (CPU, memory, etc.).
- Use logs to track detailed information about an event that is also monitored by a metric, particularly errors, warnings, or other exceptional situations.
- Use traces to gain visibility into how a request is processed across multiple services in a microservices environment. Every trace needs a unique identifier associated with it.
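To make the guidance concrete, the sketch below shows one request handler emitting all three kinds of telemetry: a counter for the occurrence of the event, a log entry with details for the exceptional case, and a unique trace identifier tying them together. The handler `handle_order`, the metric names, and the use of a plain `Counter` as a metric store are assumptions for this example only.

```python
import logging
import uuid
from collections import Counter

logger = logging.getLogger("orders")
metrics = Counter()  # hypothetical in-process metric store


def handle_order(payload: dict) -> str:
    # Trace: every request gets its own unique identifier.
    trace_id = uuid.uuid4().hex

    # Metric: count the occurrence of the event.
    metrics["orders_received_total"] += 1

    if "item" not in payload:
        # Metric: the error is counted, so alerting can rely on the counter.
        metrics["orders_rejected_total"] += 1
        # Log: the details of the exceptional case, tagged with the trace id.
        logger.warning(
            "order rejected: missing 'item' (trace_id=%s, payload=%r)",
            trace_id, payload,
        )
    return trace_id


first = handle_order({"item": "book"})
second = handle_order({"quantity": 2})
print(dict(metrics))
```

Including the trace identifier in the log message is a design choice that lets someone move from an alerting metric to the relevant log lines and then to the matching trace.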