# Grafana Stack & Observability

This section details the monitoring, logging, and observability stack managed via Flux. The stack is primarily built around the LGTM stack (Loki, Grafana, Tempo, Mimir), utilizing Alloy as the collector.
## Grafana Stack

The core observability platform is defined in `grafana-stack`.
### Overview

- **Namespace:** `monitoring`
- **Components:** Loki (logs), Mimir (metrics), Alloy (collector), k8s-monitoring (meta-chart)
- **Source:** Grafana Helm Charts
### Components & Architecture

#### 1. Alloy (Collector)

Alloy is deployed as a DaemonSet (plus a singleton Deployment via k8s-monitoring) to collect telemetry.

- **Configuration:** Defined in `apps/production/grafana-stack/config.alloy` and injected via a ConfigMapGenerator named `alloy-config`.
- **Pipelines:**
    - **Logs:** Discovers Pods/Nodes, relabels Kubernetes metadata (namespace, pod, container), and pushes to `loki-distributor`.
    - **Metrics:** Scrapes Pods, Nodes (cAdvisor), OS metrics, and standard Kubernetes ServiceMonitors/PodMonitors.
    - **Remote Write:** Pushes metrics to `http://mimir-nginx/api/v1/push`.
    - **Static Targets:** Explicitly scrapes Docker (`192.168.1.32`) and the Graphite Exporter.
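The ConfigMap injection described above can be sketched as a Kustomize `configMapGenerator`. This is illustrative: the generator options and namespace placement should be verified against the repo.

```yaml
# apps/production/grafana-stack/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
configMapGenerator:
  - name: alloy-config
    files:
      - config.alloy
generatorOptions:
  # Keep a stable name so the Alloy chart can mount the ConfigMap
  # by the fixed name "alloy-config" (assumed here).
  disableNameSuffixHash: true
```

With a stable name, Alloy picks up config changes on the next Flux reconciliation rather than via a hash-suffixed ConfigMap swap.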
#### 2. K8s Monitoring

A meta-chart (`k8s-monitoring`) orchestrates the monitoring configuration.

- **Version:** `3.5.3`
- **Cluster Name:** `k3s-prod`
- **Integrations:**
    - **Cert-Manager:** Auto-discovered.
    - **Node Logs:** Scrapes `kubelet.service` and `containerd.service` from the systemd journal.
    - **PrusaLink:** Static scrape configuration for 3D printer metrics (`192.168.1.32:10009`).
- **Features Enabled:** Annotation autodiscovery (`prometheus.io/scrape`), cluster events, cluster metrics.
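A minimal values sketch for the settings above. Key names follow the general layout of the k8s-monitoring chart's feature toggles, but the exact structure varies between chart versions, so treat this as an approximation rather than the repo's actual HelmRelease values:

```yaml
# k8s-monitoring HelmRelease values (sketch, keys approximate)
cluster:
  name: k3s-prod
clusterMetrics:
  enabled: true
clusterEvents:
  enabled: true
annotationAutodiscovery:
  enabled: true   # honours prometheus.io/scrape annotations on Pods/Services
nodeLogs:
  enabled: true   # systemd journal: kubelet.service, containerd.service
```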
#### 3. Loki (Logs)

- **Deployment:** Distributed microservices mode (implied by the `loki` chart).
- **Version:** `6.x.x`

#### 4. Mimir (Metrics)

- **Deployment:** Distributed mode (`mimir-distributed` chart).
- **Version:** `6.x.x`
### GitOps Strategy

- **Base:** `apps/base/grafana-stack` defines the HelmRepository and base HelmReleases.
- **Production:** `apps/production/grafana-stack` applies overlays:
    - **Loki/Mimir:** Updates chart versions.
    - **Alloy:** Enables clustering and binds the custom `config.alloy`.
    - **K8s-Monitoring:** Defines the cluster-specific monitoring rules and destinations.
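The base/overlay split above can be sketched as a Kustomize patch that bumps a chart version on the base HelmRelease. The resource names here are assumptions, not taken from the repo:

```yaml
# apps/production/grafana-stack/kustomization.yaml (excerpt, sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/grafana-stack
patches:
  # Pin the production chart version without touching the base definition.
  - target:
      kind: HelmRelease
      name: loki        # assumed HelmRelease name
    patch: |
      - op: replace
        path: /spec/chart/spec/version
        value: "6.x.x"
```

Keeping version pins in the overlay lets staging and production track different chart releases from a single base.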
## Observability Extras (o11y)

The `apps/production/o11y` directory contains additional observability configurations.

> **Inactive Status**
>
> The files in `apps/production/o11y` are currently not included in the production `kustomization.yaml` resource list. These components are defined but not deployed.
### Components

#### 1. API Server SLO (`apiserver-slo.yaml`)

- **Kind:** `PrometheusServiceLevel` (Sloth)
- **Objectives:**
    - **Availability:** 99.9% (errors: 5xx/429 responses).
    - **Latency:** 99% of requests under 0.4s.
- **Alerts:** Generates `K8sApiserverAvailabilityAlert` and `K8sApiserverLatencyAlert`.
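The definition follows Sloth's `PrometheusServiceLevel` shape. A trimmed sketch of the availability objective (the queries and metadata names are illustrative, not copied from the repo):

```yaml
apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
  name: apiserver-slo        # assumed name
  namespace: monitoring
spec:
  service: kubernetes-apiserver
  slos:
    - name: requests-availability
      objective: 99.9
      sli:
        events:
          # 5xx and 429 responses count as errors (illustrative queries;
          # {{.window}} is Sloth's templated SLO window).
          errorQuery: sum(rate(apiserver_request_total{code=~"(5..|429)"}[{{.window}}]))
          totalQuery: sum(rate(apiserver_request_total[{{.window}}]))
      alerting:
        name: K8sApiserverAvailabilityAlert
```

Sloth expands this single objective into multi-window burn-rate recording rules and the paired alert.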
#### 2. SNMP Probe (`snmp-probe.yaml`)

- **Kind:** `Probe` (Prometheus Operator)
- **Module:** `ubiquiti_unifi`
- **Targets:** UniFi devices (`192.168.1.152`, `.181`, `.193`).
- **Endpoint:** `docker.local.isabelsoler.es:9116`
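A minimal `Probe` resource matching the description above (the metadata name is an assumption):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: snmp-unifi           # assumed name
  namespace: monitoring
spec:
  module: ubiquiti_unifi
  prober:
    # The SNMP exporter running on the Docker host.
    url: docker.local.isabelsoler.es:9116
  targets:
    staticConfig:
      static:
        - 192.168.1.152
        - 192.168.1.181
        - 192.168.1.193
```

Prometheus rewrites each static entry into the exporter's `target` query parameter, so one exporter serves all three devices.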
#### 3. Uptime Kuma (`uptime-kuma.yaml`)

- **Kind:** `Probe`
- **Target:** `status.local.isabelsoler.es`
- **Auth:** Uses Basic Auth from the `uptime-kuma-auth` secret.
- **Relabeling:** Rewrites `monitor_url` to mask credentials in PostgreSQL connection strings.
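A trimmed sketch combining the Basic Auth reference and the credential-masking relabel. The secret key names and the regex are assumptions; the repo's actual rule may differ:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: uptime-kuma          # assumed name
  namespace: monitoring
spec:
  # prober endpoint omitted for brevity
  basicAuth:
    username:
      name: uptime-kuma-auth
      key: username          # assumed key
    password:
      name: uptime-kuma-auth
      key: password          # assumed key
  targets:
    staticConfig:
      static:
        - status.local.isabelsoler.es
  metricRelabelings:
    # Mask credentials embedded in PostgreSQL connection strings
    # exposed via the monitor_url label (illustrative pattern).
    - sourceLabels: [monitor_url]
      regex: 'postgres://[^@]+@(.+)'
      replacement: 'postgres://***@$1'
      targetLabel: monitor_url
```

Masking at scrape time keeps the credentials out of Mimir's label store entirely, rather than relying on dashboard-side redaction.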
### Recommendations

> **Activate o11y**
>
> If these probes and SLOs are desired, add `- ../base/o11y` (if a base exists) or direct file references (e.g., `- o11y/apiserver-slo.yaml`) to `apps/production/kustomization.yaml`.

> **Sloth Integration**
>
> The root `kustomization.yaml` installs the Sloth CRDs. Enabling `apiserver-slo.yaml` would leverage this to generate high-quality recording rules and alerts for the Kubernetes API automatically.
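Activating the extras via direct file references could look like the following. The existing resource entries shown here are placeholders, not the repo's actual list:

```yaml
# apps/production/kustomization.yaml (excerpt, sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - grafana-stack            # existing entries (illustrative)
  - prometheus-blackbox
  # Opt in to the o11y extras file by file:
  - o11y/apiserver-slo.yaml
  - o11y/snmp-probe.yaml
  - o11y/uptime-kuma.yaml
```

Listing files individually keeps the SLO and each probe independently toggleable until a dedicated o11y base exists.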
## Prometheus Blackbox Exporter

The Blackbox Exporter probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. It is used to monitor the external and internal availability of services.

### Overview

- **Namespace:** `monitoring`
- **Component:** Prometheus Blackbox Exporter
- **Source:** Prometheus Community Charts
- **Version:** `>=9.2.0`
### Configuration

#### Modules

The exporter is configured with a custom `http_2xx` module for HTTP checks:

- **Prober:** `http`
- **Timeout:** 5 seconds
- **Features:** Follows redirects, prefers IPv4, and skips TLS verification (useful for internal self-signed certs or avoiding expiry noise).
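In the exporter's native configuration format, the module described above would look roughly like this (a sketch of what the HelmRelease's module config likely contains):

```yaml
# blackbox-exporter module configuration (sketch)
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      follow_redirects: true
      preferred_ip_protocol: ip4
      tls_config:
        # Accept self-signed / expired certs on internal endpoints.
        insecure_skip_verify: true
```

Any 2xx response within the timeout counts as a successful probe (`probe_success == 1`); everything else, including TLS handshake failures on strict modules, fails the probe.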
#### Service Monitor & Targets

The deployment includes a ServiceMonitor that automatically scrapes the following targets using the `http_2xx` module.

**External Targets (Public Endpoints):**

- **Docs:** `https://docs.igresc.com/`
- **Personal Site:** `https://sergicastro.com/healthcheck`
- **Authentik:** `https://auth.igresc.com/-/health/ready/`
- **Immich:** `https://photos.igresc.com/`
- **Jellyfin:** `https://jellyfin.igresc.com/`
- **Mealie:** `https://mealie.igresc.com/`
- **Navidrome:** `https://music.igresc.com/`

**Internal Targets (Cluster Internal):**

- **Authentik Service:** `https://authentik-server.auth.svc.cluster.local/-/health/ready/`
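The prometheus-community chart exposes these probes as `serviceMonitor.targets` values. A two-entry excerpt as a sketch (target names are illustrative):

```yaml
# prometheus-blackbox-exporter HelmRelease values (excerpt, sketch)
serviceMonitor:
  enabled: true
  targets:
    - name: docs                  # illustrative target name
      url: https://docs.igresc.com/
      module: http_2xx
    - name: authentik-internal    # cluster-internal health check
      url: https://authentik-server.auth.svc.cluster.local/-/health/ready/
      module: http_2xx
```

The chart renders one ServiceMonitor endpoint per target, each carrying the URL and module as scrape parameters for the exporter.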
### GitOps Strategy

- **Base:** `apps/base/prometheus-blackbox` defines the HelmRepository and the base HelmRelease.
- **Production:** `apps/production/prometheus-blackbox/blackbox-exporter-release.yaml` patches the configuration to add the specific probe targets and module definitions.