Rocky Ceilometer memory metric: adding memory.usage yields inaccurate data-collection granularity

Problem description

Here is the documentation I referenced.

Configure /etc/ceilometer/pipeline.yaml by adding the following content:

sources:
    - name: memory_util_source
      meters:
          - "memory"
          - "memory.usage"
      sinks:
          - memory_util_sink
sinks:
    - name: memory_util_sink
      transformers:
          - name: "arithmetic"
            parameters:
                target:
                    name: "memory.usage"
                    unit: "%"
                    type: "gauge"
                    expr: "100 * $(memory.usage) / $(memory)"
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-low
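With this pipeline, the arithmetic transformer re-publishes memory.usage as a percentage of the flavor's memory meter. In plain Python, the expression "100 * $(memory.usage) / $(memory)" amounts to the following (the sample values are made up for illustration):

# Plain-Python illustration of the transformer expression
# "100 * $(memory.usage) / $(memory)"; the sample values are made up.
memory = 4096        # "memory" meter: flavor RAM in MB
memory_usage = 1024  # "memory.usage" meter: used RAM in MB
print(100 * memory_usage / memory)  # 25.0 (%)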

gnocchi archive-policy show ceilometer-low

+---------------------+------------------------------------------------------------------+
| Field               | Value                                                            |
+---------------------+------------------------------------------------------------------+
| aggregation_methods | max, min, mean                                                   |
| back_window         | 0                                                                |
| definition          | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00 |
| name                | ceilometer-low                                                   |
+---------------------+------------------------------------------------------------------+
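For reference, the policy definition itself is consistent: 8640 points at five-minute granularity cover exactly 30 days, as a quick Python check shows:

# Sanity check of the ceilometer-low definition:
# 8640 points at 5-minute granularity span 30 days.
points = 8640
granularity_seconds = 5 * 60
print(points * granularity_seconds / 86400.0)  # 30.0 (days)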

The memory usage metric on the Gnocchi resource only gets a measurement once per hour, even though the archive policy has a single five-minute granularity. Why does this strange behavior occur?


Solution

I tried a workaround to obtain the memory utilization of instances; the steps are as follows.

(1) Add the following code to /ceilometer/compute/pollsters/instance_stats.py.

class MemoryUtilPollster(InstanceStatsPollster):
    sample_name = 'memory_util'
    sample_unit = '%'
    sample_stats_key = 'memory_util'
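For context, the other pollsters in this module (CPUPollster, MemoryUsagePollster, and so on) follow the same pattern: the generic compute pollster looks up the attribute named by sample_stats_key on the InstanceStats object returned by the inspector. A rough, paraphrased sketch of that mechanism (not the exact Rocky source) is:

# Paraphrased sketch of how sample_stats_key is consumed; see
# ceilometer/compute/pollsters/__init__.py for the real implementation.
class GenericComputePollsterSketch(object):
    sample_name = None
    sample_unit = ''
    sample_stats_key = None

    def _stat_to_volume(self, stats):
        # Pull the field named by sample_stats_key off the InstanceStats
        # object, so 'memory_util' must be both declared as a field (step 3)
        # and populated by the inspector (step 2).
        return getattr(stats, self.sample_stats_key)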

(2) Modify the logic that calculates instance memory usage in /ceilometer/compute/virt/libvirt/inspector.py.

class LibvirtInspector(virt_inspector.Inspector):

    def inspect_instance(self, instance, duration=None):
        domain = self._get_domain_not_shut_off_or_raise(instance)
    
        memory_used = memory_resident = None
        memory_swap_in = memory_swap_out = None
        memory_stats = domain.memoryStats()
    
        # Stat provided from libvirt is in KB, converting it to MB.
        if 'usable' in memory_stats and 'available' in memory_stats:
            memory_used = (memory_stats['available'] -
                           memory_stats['usable']) / units.Ki
        elif 'available' in memory_stats and 'unused' in memory_stats:
            memory_used = (memory_stats['available'] -
                           memory_stats['unused']) / units.Ki
        if 'rss' in memory_stats:
            memory_resident = memory_stats['rss'] / units.Ki
        if 'swap_in' in memory_stats and 'swap_out' in memory_stats:
            memory_swap_in = memory_stats['swap_in'] / units.Ki
            memory_swap_out = memory_stats['swap_out'] / units.Ki
    
        # Tristack: add memory_util; guard against stats libvirt may not report
        memory_util = None
        if memory_used is not None and memory_stats.get('available'):
            memory_total = memory_stats['available'] / units.Ki
            memory_util = int(100 * memory_used / memory_total)
    
        # TODO(sileht): stats also have the disk/vnic info
        # we could use that instead of the old method for Queen
        stats = self.connection.domainListGetStats([domain], 0)[0][1]
        cpu_time = 0
        current_cpus = stats.get('vcpu.current')
        # Iterate over the maximum number of CPUs here, and count the
        # actual number encountered, since the vcpu.x structure can
        # have holes according to
        # https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/libvirt-domain.c
        # virConnectGetAllDomainStats()
        for vcpu in six.moves.range(stats.get('vcpu.maximum', 0)):
            try:
                cpu_time += (stats.get('vcpu.%s.time' % vcpu) +
                             stats.get('vcpu.%s.wait' % vcpu))
                current_cpus -= 1
            except TypeError:
                # pass here, if there are too many holes, the cpu count will
                # not match, so don't need special error handling.
                pass
    
        if current_cpus:
            # There wasn't enough data,so fall back
            cpu_time = stats.get('cpu.time')
    
        return virt_inspector.InstanceStats(
            cpu_number=stats.get('vcpu.current'),
            cpu_time=cpu_time,
            # Tristack: add memory_util
            memory_util=memory_util,
            memory_usage=memory_used,
            memory_resident=memory_resident,
            memory_swap_in=memory_swap_in,
            memory_swap_out=memory_swap_out,
            cpu_cycles=stats.get("perf.cpu_cycles"),
            instructions=stats.get("perf.instructions"),
            cache_references=stats.get("perf.cache_references"),
            cache_misses=stats.get("perf.cache_misses"),
            memory_bandwidth_total=stats.get("perf.mbmt"),
            memory_bandwidth_local=stats.get("perf.mbml"),
            cpu_l3_cache_usage=stats.get("perf.cmt"),
        )
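As a quick sanity check, the memory_util calculation can be reproduced outside Ceilometer with example memoryStats() values (the numbers below are made up; units.Ki comes from oslo.utils):

# Reproduce the memory_util formula with made-up memoryStats() values (in KB).
from oslo_utils import units

memory_stats = {'available': 4194304, 'unused': 3145728}
memory_used = (memory_stats['available'] - memory_stats['unused']) / units.Ki  # 1024 MB
memory_total = memory_stats['available'] / units.Ki                            # 4096 MB
print(int(100 * memory_used / memory_total))  # 25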

(3) Add the memory_util attribute to the InstanceStats object in /ceilometer/compute/virt/inspector.py.

class InstanceStats(object):
    fields = [
        'cpu_number',              # number: number of CPUs
        'cpu_time',                # time: cumulative CPU time
        'cpu_util',                # util: CPU utilization in percentage
        'cpu_l3_cache_usage',      # cachesize: Amount of CPU L3 cache used
        'memory_util',             # Tristack: add memory_util
        'memory_usage',            # usage: Amount of memory used
        'memory_resident',         #
        'memory_swap_in',          # memory swap in
        'memory_swap_out',         # memory swap out
        'memory_bandwidth_total',  # total: total system bandwidth from one
                                   #   level of cache
        'memory_bandwidth_local',  # local: bandwidth of memory traffic for a
                                   #   memory controller
        'cpu_cycles',              # cpu_cycles: the number of cpu cycles one
                                   #   instruction needs
        'instructions',            # instructions: the count of instructions
        'cache_references',        # cache_references: the count of cache hits
        'cache_misses',            # cache_misses: the count of caches misses
    ]

    def __init__(self, **kwargs):
        for k in self.fields:
            setattr(self, k, kwargs.pop(k, None))
        if kwargs:
            raise AttributeError(
                "'InstanceStats' object has no attributes '%s'" % kwargs)

(4) Register the memory_util plugin under ceilometer.poll.compute in setup.cfg.

ceilometer.poll.compute =
    memory_util = ceilometer.compute.pollsters.instance_stats:MemoryUtilPollster
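Once the rebuilt package from step (5) is installed, the new entry point can be verified with pkg_resources (an optional, quick check):

# Optional check that the memory_util pollster entry point is registered.
import pkg_resources

for ep in pkg_resources.iter_entry_points('ceilometer.poll.compute'):
    if ep.name == 'memory_util':
        print(ep)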

(5) Build and install the ceilometer packages; the build steps are shown below for reference. The openstack-ceilometer-11.0.1-1.el7.src.rpm source package can be found here.

# groupadd mockbuild
# useradd mockbuild -g mockbuild
# rpm -ivh openstack-ceilometer-11.0.1-1.el7.src.rpm 
After the installation is complete, the RPM build files are automatically placed under:
/root/rpmbuild/SPECS
/root/rpmbuild/SOURCES

cd /root/rpmbuild/SPECS
rpmbuild -bb openstack-ceilometer.spec

The ceilometer RPM packages are written to the /root/rpmbuild/RPMS directory; install them.

(6) Add the following to the instance resource definition in /etc/ceilometer/gnocchi_resources.yaml.

resources:
  - resource_type: instance
    metrics:
      # Tristack: add memory_util
      memory_util:

(7) Add the following configuration to /etc/ceilometer/polling.yaml.

sources:
    - name: some_pollsters
      interval: 300
      meters:
        # Tristack: add memory_util
        - memory_util
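The 300-second polling interval lines up with the five-minute granularity of the ceilometer-low archive policy, so roughly one memory_util point should land in each granularity window (trivial check):

# The polling interval should match the archive-policy granularity,
# otherwise points are aggregated away or gaps appear.
polling_interval = 300        # seconds, from polling.yaml
archive_granularity = 5 * 60  # seconds, from the ceilometer-low policy
print(polling_interval == archive_granularity)  # True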

(8) Add the following configuration to /etc/ceilometer/pipeline.yaml.

sources:
    # Tristack: add memory_util
    - name: memory_util_source
      meters:
          - "memory_util"
      sinks:
          - memory_util_sink
sinks:
    # Tristack: add memory_util
    - name: memory_util_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-low

(9) Finally, restart the openstack-ceilometer-compute service.