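Problem description
I am new to Loki and have set up an alert in Loki, but I do not see any notifications in Alertmanager. Loki itself works fine (it collects logs), and Alertmanager works fine too (it receives alerts from other sources), but the alerts from Loki are not pushed to Alertmanager.
Loki config: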
    auth_enabled: false

    server:
      http_listen_port: 3100

    ingester:
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
      max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
      chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
      chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      max_transfer_retries: 0 # Chunk transfers disabled

    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/boltdb-shipper-active
        cache_location: /loki/boltdb-shipper-cache
        cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /loki/chunks

    compactor:
      working_directory: /loki/boltdb-shipper-compactor
      shared_store: filesystem

    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

    ruler:
      storage:
        type: local
        local:
          directory: etc/loki/rules
      rule_path: /etc/loki/
      alertmanager_url: http://171.11.3.160:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true
Docker Compose service for Loki:
    loki:
      image: grafana/loki:2.0.0
      container_name: loki
      ports:
        - "3100:3100"
      volumes:
        - ./loki/etc/local-config.yaml:/etc/loki/local-config.yaml
        - ./loki/etc/rules/rules.yaml:/etc/loki/rules/rules.yaml
      command:
        - '--config.file=/etc/loki/local-config.yaml'
Loki rules:
    groups:
      - name: rate-alerting
        rules:
          - alert: HighLogRate
            expr: |
              count_over_time(({job="grafana"})[1m]) >=0
            for: 1m
Does anyone know what the problem is?
Solution
The configuration looks fine and is very similar to mine. I would troubleshoot it with the following steps:
- Exec into the Docker container and check that the rules file is not empty:
    cat /etc/loki/rules/rules.yaml
- Check Loki's logs. When the rules are loaded correctly, log lines like these appear:
    level=info ts=2021-05-06T11:18:33.355446729Z caller=module_service.go:58 msg=initialising module=ruler
    level=info ts=2021-05-06T11:18:33.355538059Z caller=ruler.go:400 msg="ruler up and running"
    level=info ts=2021-05-06T11:18:33.356584674Z caller=mapper.go:139 msg="updating rule file" file=/data/loki/loki-stack-alerting-rules.yaml
- While running, Loki also logs info messages about your rules. Here is one from a rule I am running, slightly shortened (note status=200 and the non-empty bytes=... value):
    level=info
    ts=...
    caller=metrics.go:83
    org_id=...
    traceID=...
    latency=fast
    query="sum(rate({component=\"kube-apiserver\"} |~ \"stderr F E.*failed calling webhook \\\"webhook.openpolicyagent.org\\\". an error on the server.*has prevented the request from succeeding\"[1m])) > 1"
    query_type=metric
    range_type=instant
    length=0s
    step=0s
    duration=9.028961ms
    status=200
    throughput=40MB
    total_bytes=365kB
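- Then make sure the Loki container can reach Alertmanager at http://171.11.3.160:9093 without any problems (it could be a network issue, or you may have set up basic authentication, and so on).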
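- If the rule you set up (which you can test from the Grafana Explore view) stays above the threshold you set for 1 minute, the alert should show up in Alertmanager. Since you did not add any labels to it, it will most likely be ungrouped.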
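
The first, second, and fourth checks above can be sketched as a few shell helpers. This is only a rough sketch under some assumptions, not a definitive procedure: the container name loki and the rules path come from the Compose snippet in the question, the function names are made up for illustration, and it assumes that wget is available inside the Loki image and that Alertmanager serves its usual /-/healthy endpoint at the URL from the ruler config.

```shell
#!/bin/sh
# Hypothetical helpers for the troubleshooting steps; function names are illustrative.

# Step 1: show the rules directory and the rules file as the container sees them.
# Container name and path are taken from the docker-compose snippet in the question.
show_rules() {
  container="${1:-loki}"
  docker exec "$container" ls -lR /etc/loki/rules
  docker exec "$container" cat /etc/loki/rules/rules.yaml
}

# Step 2: read Loki logs on stdin and report whether the ruler started
# and whether it picked up at least one rule file.
check_ruler_logs() {
  logs=$(cat)
  if printf '%s\n' "$logs" | grep -q 'msg="ruler up and running"'; then
    echo "ruler: running"
  else
    echo "ruler: no startup message found"
  fi
  if printf '%s\n' "$logs" | grep -q 'msg="updating rule file"'; then
    echo "rules: loaded"
  else
    echo "rules: no rule file loaded"
  fi
}

# Step 4: probe Alertmanager from inside the Loki container.
# Assumes wget exists in the image and Alertmanager serves /-/healthy.
check_alertmanager() {
  container="${1:-loki}"
  url="${2:-http://171.11.3.160:9093}"
  docker exec "$container" wget -q -T 5 -O - "$url/-/healthy"
}

# Example usage (run on the Docker host):
#   show_rules loki
#   docker logs loki 2>&1 | check_ruler_logs
#   check_alertmanager loki http://171.11.3.160:9093
```

If the ruler reports that it is running but no rule file is ever loaded, the directory layout is worth a look: with local ruler storage, Loki reads rule files from a per-tenant subdirectory of the configured directory, and with auth_enabled: false the tenant ID is fake, so the file would be expected under a fake/ subdirectory of the rules directory.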