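Monitoring Caddy with Prometheus metrics

Whether you're running thousands of Caddy instances in the cloud or a single Caddy server on an embedded device, at some point you'll likely want to know what Caddy is doing and how long it's taking. In other words, you'll want to be able to _monitor_ Caddy.

Enabling metrics

You'll need to turn metrics on.

If using a Caddyfile, enable metrics in global options: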

{
	metrics
}

If using JSON, add "metrics": {} to your apps > http > servers configuration.

To add per-host metrics, insert the per_host option. Host-specific metrics will then carry a Host label.

{
	metrics {
		per_host
	}
}

Prometheus

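Prometheus is a monitoring platform that collects metrics from monitored targets by scraping metrics HTTP endpoints on those targets. Besides helping you display metrics with a dashboarding tool like Grafana, Prometheus is also used for alerting.

Like Caddy, Prometheus is written in Go and distributed as a single binary. To install it, see the Prometheus installation docs, or on macOS just run brew install prometheus.

If you're new to Prometheus, read the Prometheus docs first; otherwise, read on!

To configure Prometheus to scrape metrics from Caddy, you'll need a YAML configuration file similar to this: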

# prometheus.yaml
global:
  scrape_interval: 15s # default is 1 minute

scrape_configs:
  - job_name: caddy
    static_configs:
      - targets: ['localhost:2019']

You can then start Prometheus like this:

$ prometheus --config.file=prometheus.yaml

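Caddy's metrics

Like any process monitored with Prometheus, Caddy exposes an HTTP endpoint that responds in the Prometheus exposition format. Caddy's Prometheus client is also configured to respond in the OpenMetrics exposition format if negotiated (that is, if the Accept header is set to application/openmetrics-text; version=0.0.1).

By default, a /metrics endpoint is available on the admin API (that is, http://localhost:2019/metrics). If the admin API is disabled, or you want to listen on a different port or path, you can use the metrics handler to configure this.

You can view the metrics with any browser or HTTP client, such as curl: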

$ curl http://localhost:2019/metrics
# HELP caddy_admin_http_requests_total Counter of requests made to the admin API's HTTP endpoints.
# TYPE caddy_admin_http_requests_total counter
caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 2
# HELP caddy_http_request_duration_seconds Histogram of round-trip request durations.
# TYPE caddy_http_request_duration_seconds histogram
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.005"} 1
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.01"} 1
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.025"} 1
...
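If you want to inspect this output programmatically rather than by eye, the text format is simple enough to parse with the standard library. The following is a minimal sketch, not an official client: the function names fetch_metrics and parse_metrics are made up for illustration, the URL assumes the default admin address shown above, and escape sequences inside label values are left as-is rather than decoded.

```python
import re
import urllib.request

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one key="value" pair; the value may contain backslash escapes
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:2019/metrics"):
    """Download the raw exposition text (assumes the default admin address)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line, e.g. an elided "..."
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Calling parse_metrics(fetch_metrics()) against a running Caddy instance should yield one tuple per sample line; for anything beyond quick inspection, an established Prometheus client library is likely the better choice.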

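You'll see many metrics, which fall roughly into 4 categories:

Runtime metrics
Admin API metrics
HTTP middleware metrics
Reverse proxy metrics

Runtime metrics

These metrics cover the internals of the Caddy process and are provided automatically by the Prometheus Go client. They are prefixed with go_* and process_*.

Note that the process_* metrics are only collected on Linux and Windows.

See the documentation for the Go Collector, Process Collector, and BuildInfo Collector.

Admin API metrics

These metrics help monitor the Caddy admin API. Each admin endpoint is instrumented to track request counts and errors.

These metrics are prefixed with caddy_admin_*.

For example: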

$ curl -s http://localhost:2019/metrics | grep ^caddy_admin
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/config/"} 1
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/"} 2
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/cmdline"} 1
caddy_admin_http_requests_total{code="200",handler="load",method="POST",path="/load"} 1
caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 3

`caddy_admin_http_requests_total`

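A counter of the number of requests handled by admin endpoints, including modules in the admin.api.* namespace.

Label	Description
`code`	HTTP status code
`handler`	The handler or module name
`method`	The HTTP method
`path`	The URL path the admin endpoint was mounted to

`caddy_admin_http_request_errors_total`

A counter of the number of errors encountered in admin endpoints, including modules in the admin.api.* namespace.

Label	Description
`handler`	The handler or module name
`method`	The HTTP method
`path`	The URL path the admin endpoint was mounted to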

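HTTP middleware metrics

All Caddy HTTP middleware handlers are instrumented automatically to determine request latency, time-to-first-byte, errors, and request/response body sizes.

For the histogram metrics below, the buckets are currently not configurable. For durations, the default prometheus.DefBuckets set of buckets is used (5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, and 10s). For sizes, the buckets are 256 B, 1 KiB, 4 KiB, 16 KiB, 64 KiB, 256 KiB, 1 MiB, and 4 MiB.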
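Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound (the le label), and a final +Inf bucket counts everything. The sketch below illustrates that bookkeeping for the duration buckets listed above. It is an illustration only, not Caddy's implementation, and the helper name cumulative_buckets is made up for this example.

```python
import math
from bisect import bisect_left

# Upper bounds in seconds, matching the duration buckets listed above.
DEF_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def cumulative_buckets(observations, bounds=DEF_BUCKETS):
    """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
    per_bucket = [0] * (len(bounds) + 1)  # the extra slot is the +Inf bucket
    for value in observations:
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose "le" bound still contains the observation.
        per_bucket[bisect_left(bounds, value)] += 1

    result = []
    running = 0
    for bound, count in zip(list(bounds) + [math.inf], per_bucket):
        running += count
        result.append((bound, running))
    return result


# Five hypothetical request durations, in seconds.
for bound, count in cumulative_buckets([0.003, 0.005, 0.007, 0.2, 20]):
    print(f'le="{bound}" {count}')
```

Because the counts are cumulative, the +Inf bucket always equals the total number of observations, which is also what the histogram's _count series reports.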

`caddy_http_requests_in_flight`

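A gauge of the number of requests currently being handled by this server.

Label	Description
`server`	The server name
`handler`	The handler or module name

`caddy_http_request_errors_total`

A counter of middleware errors encountered while handling requests.

Label	Description
`server`	The server name
`handler`	The handler or module name

`caddy_http_requests_total`

A counter of HTTP(S) requests made.

Label	Description
`server`	The server name
`handler`	The handler or module name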

`caddy_http_request_duration_seconds`

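A histogram of round-trip request durations.

Label	Description
`server`	The server name
`handler`	The handler or module name
`code`	HTTP status code
`method`	The HTTP method

`caddy_http_request_size_bytes`

A histogram of the total (estimated) size of the request, including the body.

Label	Description
`server`	The server name
`handler`	The handler or module name
`code`	HTTP status code
`method`	The HTTP method

`caddy_http_response_size_bytes`

A histogram of the size of the returned response body.

Label	Description
`server`	The server name
`handler`	The handler or module name
`code`	HTTP status code
`method`	The HTTP method

`caddy_http_response_duration_seconds`

A histogram of time-to-first-byte for responses.

Label	Description
`server`	The server name
`handler`	The handler or module name
`code`	HTTP status code
`method`	The HTTP method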

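Reverse proxy metrics

`caddy_reverse_proxy_upstreams_healthy`

A gauge of the reverse proxy upstreams' health status.

A value of 0 means the upstream is unhealthy, whereas 1 means the upstream is healthy.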

Sample Queries

Once Prometheus is scraping Caddy's metrics, you can start querying them to see how Caddy is performing.

For example, to see the per-second request rate, as averaged over 5 minutes:

rate(caddy_http_requests_total{handler="file_server"}[5m])

To see the rate at which your latency threshold of 100ms is being exceeded:

sum(rate(caddy_http_request_duration_seconds_count{server="srv0"}[5m])) by (handler)
-
sum(rate(caddy_http_request_duration_seconds_bucket{le="0.1", server="srv0"}[5m])) by (handler)
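The arithmetic behind this query is a subtraction of two rates: all requests per second minus the requests per second that finished within the 100ms bucket. Here is a simplified sketch with made-up counter readings. It ignores the counter-reset handling and extrapolation that Prometheus's rate() performs, and the function names are hypothetical.

```python
def per_second_rate(start, end, seconds):
    """Average per-second increase of a counter between two readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (end - start) / seconds


def slow_request_rate(count_readings, bucket_readings, seconds):
    """Requests per second that took longer than the bucket's upper bound."""
    total = per_second_rate(count_readings[0], count_readings[1], seconds)
    fast = per_second_rate(bucket_readings[0], bucket_readings[1], seconds)
    return total - fast


# Hypothetical readings taken 300 seconds (5 minutes) apart:
#   ..._count went from 1000 to 1600, the 100ms bucket from 900 to 1440.
slow = slow_request_rate((1000, 1600), (900, 1440), 300)
print(f"{slow:.2f} requests/s exceeded 100ms")
```

With these numbers, 2 requests per second arrived in total and 1.8 per second completed within 100ms, so roughly 0.2 per second exceeded the threshold.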

To find the 95th percentile request duration on the file_server handler over the last 5 minutes, you can use a query like this:

histogram_quantile(0.95, sum(rate(caddy_http_request_duration_seconds_bucket{handler="file_server"}[5m])) by (le))
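A quantile computed from a histogram is an estimate: Prometheus locates the bucket that contains the requested rank and interpolates linearly inside it. The sketch below is a simplified reimplementation of that idea for illustration, not Prometheus's actual code. It assumes 0 < q <= 1, that the buckets are cumulative and include a +Inf bucket, and that the lowest bucket starts at 0.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must include a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations at all

    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]

    if upper == math.inf:
        # The quantile lies beyond the largest finite bound; report that bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan

    # Lower edge and cumulative count of everything below this bucket.
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    # Linear interpolation within the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative bucket counts for 20 requests.
sample = [(0.005, 0), (0.01, 10), (0.025, 20), (math.inf, 20)]
print(histogram_quantile(0.95, sample))
```

Because of the interpolation, the result can never be more precise than the bucket layout allows; with the fixed buckets described earlier, estimates for very fast or very slow requests are correspondingly coarse.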

Or to see the median response size in bytes for successful GET requests on the file_server handler over the last 5 minutes:

histogram_quantile(0.5, rate(caddy_http_response_size_bytes_bucket{method="GET", handler="file_server", code="200"}[5m]))