
caddy-log-exporter 📈🙈🙉🙊

Parses Caddy's JSON access logs and exports metrics from them for Prometheus or VictoriaMetrics.

Why

I use the wonderful Caddy server just about everywhere. There are two things I discovered. First, Caddy's built-in metrics still don't add the host as a label. There are some GitHub issues about it and I'm sure the feature will be added at some point, but until then it's a problem if you run multiple domains behind one Caddy instance.

The other point: I run a little Gitea instance for some small projects on a small vserver from Hetzner. Some weeks ago I got Prometheus alerts for CPU and memory exhaustion. I checked the logs and found out I had become a victim of this stupid AI crawler shit. Since then, whenever I notice high load on the server, I check my Caddy logs and add the offending user agents to my Caddyfile. I thought it would be nice to have a top list of the scraper bots hitting my Caddy server, so this exporter also adds the user agents as Prometheus labels.

Here are the metrics after running the exporter for a few minutes:

[vm.png: screenshot of the exported metrics in VictoriaMetrics]

And here in Grafana:

[grafana.png: screenshot of a Grafana dashboard built from the metrics]

Happily, everything we need is available in the standard JSON logs 🥳🎉: the exporter tails them and creates metrics out of them.
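
For reference, a trimmed-down sketch of what a single Caddy JSON access log line contains (many fields omitted); the request host and the User-Agent header are what make the per-domain and per-bot labels possible:

{
  "level": "info",
  "ts": 1727164800.123,
  "logger": "http.log.access",
  "msg": "handled request",
  "request": {
    "method": "GET",
    "host": "git.foo.tld",
    "uri": "/",
    "headers": {
      "User-Agent": ["Mozilla/5.0 ..."]
    }
  },
  "status": 200
}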

Installation

Grab the latest Docker image from here.

Images are available for x86_64-linux and aarch64-linux.

Usage

docker run -v /var/log/caddy:/var/log/caddy -e CADDY_LOG_EXPORTER_LOG_FILES=/var/log/caddy/caddy.log ghcr.io/xsteadfastx/caddy-log-exporter:0.1.0-rc6
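
If you prefer Docker Compose, a rough equivalent could look like this (image tag and paths mirror the command above; the published port assumes the default :2112 listen address):

services:
  caddy-log-exporter:
    image: ghcr.io/xsteadfastx/caddy-log-exporter:0.1.0-rc6
    environment:
      CADDY_LOG_EXPORTER_LOG_FILES: /var/log/caddy/caddy.log
    volumes:
      - /var/log/caddy:/var/log/caddy:ro
    ports:
      - "2112:2112"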

Caddyfile

Enable JSON logging to a file:

log {
    format json
    output file /var/log/caddy/caddy.log
}
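
In a Caddyfile, the log directive goes inside a site block. A minimal sketch, with a placeholder hostname and upstream:

example.tld {
    log {
        format json
        output file /var/log/caddy/caddy.log
    }

    reverse_proxy backend:8080
}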

Scraping

- job_name: caddy-log-exporter
  scheme: http
  static_configs:
    - targets:
        - "caddy-log-exporter.tld:2112"

Bonus: Blocking AI bots in Caddy

git.foo.tld {
    @badbots {
        header User-Agent *AhrefsBot*
        header User-Agent *Amazonbot*
        header User-Agent *Barkrowler*
        header User-Agent *Bytespider*
        header User-Agent *DataForSeoBot*
        header User-Agent *ImagesiftBot*
        header User-Agent *MJ12bot*
        header User-Agent *PetalBot*
        header User-Agent *SemrushBot*
        header User-Agent *facebookexternalhit*
        header User-Agent *meta-externalagent*
    }

    abort @badbots

    cache
    reverse_proxy gitea:3000
}

Configuration

There are some config values we can set through environment variables.

  • CADDY_LOG_EXPORTER_LOG_FILES: Comma-separated list of log file paths to tail
  • CADDY_LOG_EXPORTER_ADDR: Address the exporter listens on; defaults to :2112
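
For example, to tail two log files and listen on a different port (paths and port are just illustrative):

CADDY_LOG_EXPORTER_LOG_FILES=/var/log/caddy/site-a.log,/var/log/caddy/site-b.log
CADDY_LOG_EXPORTER_ADDR=:9321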