prometheus – How can I get nginx to add a newline to a response from an upstream server?

I have a service that exposes a Prometheus statistics endpoint. I’m using Telegraf’s inputs.prometheus plugin to relay these stats. Unfortunately this particular service doesn’t terminate the response with a newline character, as the text exposition format spec requires, and as a result the Telegraf Prometheus input chokes (unexpected end of stream).

As a quick fix I’d like to use nginx to proxy this stats endpoint and append a newline character to the response. I don’t need to inspect the response and add the newline only when it isn’t already there; I already know it won’t be.

My deployment environment is a bit sensitive to added dependencies, so I’d really like a solution that can work with the default community build of nginx without need for any additional modules. However if that’s not possible, I’d also be very grateful for a solution that would run on a typical OpenResty install. Solutions that make use of very recent features are fine as well.
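Not necessarily the only way, but one sketch that fits the OpenResty option mentioned above: a Lua body filter that appends the newline on the final chunk of the proxied response. The location path and upstream address below are placeholders, and Content-Length has to be cleared because the body length changes.

location /metrics {
    proxy_pass http://127.0.0.1:8080/metrics;   # placeholder upstream address

    # The body length changes, so don't pass the upstream Content-Length through.
    header_filter_by_lua_block {
        ngx.header.content_length = nil
    }

    # ngx.arg[2] is true on the last chunk of the response body; append "\n" there.
    body_filter_by_lua_block {
        if ngx.arg[2] then
            ngx.arg[1] = (ngx.arg[1] or "") .. "\n"
        end
    }
}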

Observe history of remaining capacity of a buffer with Prometheus

I have a ring buffer and I want to observe its remaining capacity at specific intervals (say, each 200ms). To do this, I used a Histogram. Here is the code in Java:


Histogram capacityHistogram = Histogram.build()
    .linearBuckets(30, 10, 7)  // bucket upper bounds: 30, 40, 50, 60, 70, 80, 90 (+Inf is added automatically)
    .name("capacity_history")
    .help("history of buffer free capacity")
    .labelNames("name")
    .register();

void updateMetrics() {
    double remainingPercentage = (ringBuffer.remainingCapacity() / (double) ringBuffer.size()) * 100;
    capacityHistogram.labels(this.name).observe(remainingPercentage);
}

this.taskScheduler = new ThreadPoolTaskScheduler();
this.taskScheduler.initialize();
taskScheduler.schedule(this::updateMetrics, new PeriodicTrigger(200, TimeUnit.MILLISECONDS));

However, since the exposed bucket counters are cumulative rather than exclusive (e.g. le="40" includes both the (0, 30] and (30, 40] buckets) and my observation happens neither per event nor per second nor per scrape, I’m having trouble visualizing the data and getting the business value I was looking for. I tried this query:

rate(capacity_history_bucket{name="whatever"}[5m]) * 20
// I sample 5 times per second, so I multiplied by 20 to get a percent-like value

I am stuck trying to think of another way to monitor the history of the remaining capacity of my ring buffer. Can my current approach work, or do I need to change how I’m using the histogram?
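For reference, a hedged PromQL sketch of how a cumulative histogram like this is usually read back (metric and label names are taken from the code above; the 5m window is arbitrary): either as a quantile of the observed capacity percentage, or as per-bucket fractions of all observations.

# median remaining-capacity percentage over the last 5 minutes
histogram_quantile(0.5, sum by (le) (rate(capacity_history_bucket{name="whatever"}[5m])))

# fraction of the 200 ms samples that fell at or below each bucket boundary
sum by (le) (rate(capacity_history_bucket{name="whatever"}[5m]))
  / ignoring(le) group_left
sum(rate(capacity_history_count{name="whatever"}[5m]))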

monitoring – Netdata, prometheus and grafana – why recommended in this constellation?

I’m currently setting up a monitoring service for our servers.
I’ve found some articles that recommend using Netdata, Prometheus and Grafana together.

So I wonder what the benefits are of using all three of them.
Can anybody explain to me why Prometheus and Grafana are recommended, when it seems that Netdata already does the same things as both of them?

Even the Netdata documentation has an article about this setup: https://learn.netdata.cloud/docs/agent/backends/prometheus/#filtering-metrics-sent-to-prometheus

What are the benefits of this combination?
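If it helps to see how the pieces are usually wired together, here is a hedged sketch (hostname and port are assumptions) of Prometheus scraping a Netdata agent’s built-in Prometheus-format endpoint, with Grafana then pointed at Prometheus as a data source:

scrape_configs:
  - job_name: 'netdata'
    metrics_path: '/api/v1/allmetrics'
    params:
      format: [prometheus]
    static_configs:
      - targets: ['netdata-host.example.com:19999']   # assumed Netdata agent address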

prometheus – Handling alerts triggered by boolean condition; keep alertmanager from auto-resolving until manual clearance

Let’s say we have an event that is a point-in-time event, meaning that on the next evaluation Prometheus may no longer show the event as triggered even though the underlying condition is still active.

What we’re finding in situations like this is that Prometheus will “clear” the event and Alertmanager will auto-resolve the alert. So we get alerted/paged for the event, but on the next check the alert clears. What should really happen is that we check on the event and manually resolve the alert once we have either fixed the problem or determined that the incident did indeed resolve itself.

I understand this is counterintuitive in most cases, but it is valid for some of the use cases in our environment.

So, how does one achieve this trigger-on-event-and-don’t-auto-resolve-until-manually-resolved thing with prom/alertmanager? Is it possible? I haven’t found any documentation describing how to do this.
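There doesn’t appear to be a built-in “manual resolve” mode, but one hedged workaround sketch is to make the alert expression itself stay true for a long lookback window, so a single point-in-time event keeps the alert firing until a human acts. The metric name, threshold and 24h window below are assumptions, not something from the question:

groups:
  - name: latched-alerts
    rules:
      - alert: PointInTimeEventSeen
        # keeps firing if the event fired at least once in the last 24 hours
        expr: max_over_time(my_event_metric[24h]) > 0
        labels:
          severity: page
        annotations:
          summary: "Point-in-time event seen in the last 24h; needs manual follow-up"

Whether a 24h latch is long enough is obviously situational; the window just bounds how long the alert stays unresolved without human action.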

linux – Prometheus not monitoring all ec2 instances of a region

I have set up Prometheus to monitor my AWS EC2 instances, but Prometheus shows only 1 instance as UP even though 2 instances are running in my AWS account. Both instances are running in the same region, and I don’t know why only 1 of them shows up.

Here is my configuration in prometheus.yml:

- job_name: 'AWS-Instance'
  scrape_interval: 5s
  ec2_sd_configs:
    - region: us-east-2
      access_key: ************
      secret_key: ************
      port: 9100
  relabel_configs:
    - source_labels: [__meta_ec2_public_ip]
      regex: '(.*)'
      target_label: __address__
      replacement: '${1}:9100'
    - source_labels: [__meta_ec2_tag_Name]
      target_label: instance

What does it mean to have more than one instance of Prometheus in Kubernetes

Suppose I’m using a volume to persist my Prometheus data. I wonder if I can have more than one instance of it running in order to get high availability.

I believe only one instance of Prometheus can be in charge of writing to the TSDB, and having more than one instance pointed at the same data directory could cause race conditions and corrupt the data (I think that’s why the TSDB has a lock file).

So, the question is: can I have multiple instances of Prometheus running that all point to the same data, and use that as a high-availability option? Or is there another way to keep Prometheus available during a rolling restart caused by configuration changes?
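For what it’s worth, the pattern usually described for Prometheus HA in Kubernetes is to run two fully independent replicas that scrape the same targets, each writing to its own volume, rather than sharing a single data directory. A rough sketch (all names, sizes and the image tag are assumptions):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.45.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
          volumeMounts:
            - name: data
              mountPath: /prometheus
  # each replica gets its own PersistentVolumeClaim, so neither touches the other's TSDB
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi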

ubuntu – Gitlab prometheus error after upgrading with apt-get upgrade

After upgrading my GitLab from 10.9.2 to version 10.10 on Ubuntu 18.04, the apt-get upgrade process could not complete successfully. It ended with the following message:

Ensuring Prometheus is updated:Traceback (most recent call last):
    11: from /opt/gitlab/embedded/bin/omnibus-ctl:23:in `<main>'
    10: from /opt/gitlab/embedded/bin/omnibus-ctl:23:in `load'
     9: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/bin/omnibus-ctl:31:in `<top (required)>'
     8: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/lib/omnibus-ctl.rb:746:in `run'
     7: from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/omnibus-ctl-0.6.0/lib/omnibus-ctl.rb:197:in `block in add_command'
     6: from /opt/gitlab/embedded/service/omnibus-ctl/prometheus-upgrade.rb:45:in `block in load_file'
     5: from /opt/gitlab/embedded/service/omnibus-ctl/lib/gitlab_ctl/prometheus_upgrade.rb:40:in `backup_data'
     4: from /opt/gitlab/embedded/lib/ruby/2.6.0/fileutils.rb:520:in `mv'
     3: from /opt/gitlab/embedded/lib/ruby/2.6.0/fileutils.rb:1555:in `fu_each_src_dest'
     2: from /opt/gitlab/embedded/lib/ruby/2.6.0/fileutils.rb:1571:in `fu_each_src_dest0'
     1: from /opt/gitlab/embedded/lib/ruby/2.6.0/fileutils.rb:1557:in `block in fu_each_src_dest'
/opt/gitlab/embedded/lib/ruby/2.6.0/fileutils.rb:525:in `block in mv': File exists - /var/opt/gitlab/prometheus/data_tmp/data (Errno::EEXIST)
Ensuring Prometheus is updated: NOT OK
Error ensuring Prometheus is updated. Please check the logs
dpkg: error processing package gitlab-ce (--configure):
 installed gitlab-ce package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 gitlab-ce
E: Sub-process /usr/bin/dpkg returned an error code (1)

When I restart GitLab with gitlab-ctl restart it works fine, but when I run apt-get upgrade again it tries to finish the installation and stops again with the same message.

Any idea?
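Not a definitive fix, but a hedged sketch of the kind of manual unblocking the EEXIST error above suggests (the path comes from the traceback; double-check it and keep a copy before moving anything):

# move the stale temp data directory out of the way, then let dpkg finish the interrupted configure step
sudo mv /var/opt/gitlab/prometheus/data_tmp /var/opt/gitlab/prometheus/data_tmp.bak
sudo dpkg --configure -a
sudo gitlab-ctl reconfigure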

Grafana – Graphite vs. Prometheus query differences

I am just about to shift my metrics from statsd to prometheus.

I'm having trouble converting queries from StatsD/Graphite syntax to Prometheus, and I'm looking for documentation that compares the two, specifically the range syntax ([5m]), rate(), and so on, which doesn't translate cleanly from the StatsD world.

In this example, I want to measure and graph the average lag of processed events: the difference between the timestamp at which the event was created (an attribute of the event) and the current time at which the metric is reported by the code itself:

Pseudo:

def processEvent(event):
  eventTime = event.timestamp
  now = Now()
  lag = now - eventTime
  ...
  statsD.time(...metric_name(event.feed, event.action)..., lag)

The statsD query looks like this:

groupByNode(stats.timers..lag...mean, 4, 'avg')

(the average of the mean value reported for this timer across all instances)

In my new code, the metric is created as follows:

from prometheus_client import Gauge

current_lag_gauge = Gauge('kafka_lag', 'tracks the lag of events', labelnames=('feed', 'action'))
lag = Now() - event.timestamp
...
current_lag_gauge.labels(feed="v1", action="v2").set(lag)

The library adds some additional labels. The metric looks like this:

kafka_lag{action="v2",app="app1",base_chart_version="xxx",feed="123",instance="1.2.3.4:8000",job="kubernetes-xxx",kubernetes_namespace="xxxx",kubernetes_pod_name="xxxxx",pod_template_hash="123431314",release="xxxx"}

Is there a reference guide or cheat sheet comparing the two? I would also appreciate a few sample queries to help with this translation.
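As a hedged sketch of the translation for this particular case (metric and label names are taken from the exposition sample above; the 5m window is arbitrary), the Graphite groupByNode/avg pattern maps to aggregating away the instance label in PromQL:

# average lag per feed/action across all instances, using the gauge's current value
avg by (feed, action) (kafka_lag)

# the same, but smoothed per series over 5 minutes before averaging
avg by (feed, action) (avg_over_time(kafka_lag[5m]))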

kubernetes – Deploying the Prometheus operator and Grafana via Helm 3 never completes

I am trying to deploy the Prometheus operator and Grafana on my K8s cluster via Helm version 3.1.2.
(The cluster was provisioned with Ansible using Kubespray.)

To accomplish this, I simply run helm install my-prometheus-operator stable/prometheus-operator
to install the desired chart (without overriding any values, etc.).

I keep getting the output shown in the screenshot, and it seems the command runs forever with no response from the server.

Does anyone know how to solve this problem, or another way to accomplish the task?
Thanks in advance for any help!

[screenshot: the helm install command hanging with no output from the server]
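If it helps while debugging, a hedged sketch of the same install with more visibility (the flags exist in Helm 3; the 10-minute timeout is an arbitrary choice):

# same chart/release as above, but with debug output and a longer timeout
helm install my-prometheus-operator stable/prometheus-operator --debug --timeout 10m

# in a second terminal, watch what the chart is actually creating
kubectl get pods --all-namespaces -w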

How to create a separate alerting rule for each scrape_config in Prometheus

We have 3 environments (dev, test and prod) defined in the Prometheus YAML file.

We have a single rule file that monitors CPU, memory, etc. on all 3 environments mentioned above.

We are planning to add another environment, preprod, and it should use different alerting rules than the default rules that apply everywhere else. The preprod server runs JVM applications, and we only want to monitor and be alerted on the JVM applications in the preprod environment. How can we achieve this? Any advice would be appreciated.
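One hedged way to sketch this, assuming each scrape_config attaches an env label to its targets (via static_configs labels or relabel_configs): scope the preprod rules to that label so they fire only for preprod targets. The metric names and the 90% threshold below are illustrative, not taken from the question (they match what the JMX exporter typically exposes for JVM heap):

groups:
  - name: preprod-jvm
    rules:
      - alert: PreprodJvmHeapHigh
        expr: |
          jvm_memory_bytes_used{area="heap", env="preprod"}
            / jvm_memory_bytes_max{area="heap", env="preprod"} > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage above 90% on a preprod target"

The existing CPU/memory rules can then stay generic, or get a matcher such as env!="preprod" if preprod should be excluded from them.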