Disclaimer: Mobb.ninja is not official Red Hat documentation - These guides may be experimental, proof of concept or early adoption. Officially supported documentation is available at https://docs.openshift.com.

Federating Metrics to a centralized Prometheus Cluster

Red Hat OpenShift Service on AWS (ROSA) comes with two built-in monitoring stacks: Cluster Monitoring and User Workload Monitoring. Both are based on Prometheus; the first targets the cluster operator (Red Hat SRE) and the second targets the cluster user (you!).

Both provide metrics insights inside the cluster’s web console, showing overall cluster metrics as well as namespace-specific workload metrics, all integrated with your configured identity provider (IDP).

However, the Alert Manager instance is locked down and is used to send alerts to the Red Hat SRE team. This means that the customer cannot create alerts for either the cluster resources or their own workloads. This is being worked on, and future versions of ROSA will provide a way for the end user to create alerts for their own workloads.

Until that work is done, the ROSA cluster administrator can deploy a Prometheus instance and configure it to send alerts to themselves. Thankfully with Prometheus’ federated metrics feature and the Prometheus Operator, this can be done in a few simple steps.
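Federation means the new Prometheus scrapes the `/federate` endpoint of an existing Prometheus and selects series with one or more `match[]` parameters. The Helm chart used below configures that scrape for you; the following is only a minimal sketch of what such a request looks like. The source URL, the `TOKEN` variable, and the `CURL` override are assumptions made for illustration and do not come from this guide, and a real cluster may also need TLS options.

```shell
#!/bin/sh
# Sketch only: the source URL and TOKEN are hypothetical, not values from this guide.

# Fetch federated series from a source Prometheus.
#   $1      base URL of the source Prometheus
#   $2...   one or more series selectors, e.g. '{job="kubelet"}'
# Set CURL=echo to print the curl arguments instead of sending the request.
federate_fetch() {
  url="${1%/}/federate"
  shift
  # Turn every selector into a "--data-urlencode match[]=<selector>" pair.
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --data-urlencode "match[]=$1"
    shift
    n=$((n - 1))
  done
  # --get sends the encoded data as a query string rather than a POST body.
  ${CURL:-curl} --silent --fail --get "$url" \
    --header "Authorization: Bearer ${TOKEN:-}" "$@"
}

# Example (hypothetical host):
#   TOKEN=$(oc whoami -t) federate_fetch https://prometheus.example.com '{job="kubelet"}'
```

Each selector narrows what is copied across, so the federated instance only stores the series you intend to alert or graph on.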

This guide is heavily influenced by Tommer Amber’s guide for OCP 4.x.


Prerequisites

  1. Make sure the following pre-requisites are met: a ROSA cluster that you can administer, and the oc, kubectl, and helm command-line tools used throughout this guide.

Prepare Environment

  1. Set the following environment variables

     export NAMESPACE=federated-metrics
  2. Create the namespace

     oc new-project $NAMESPACE
  3. Add the MOBB chart repository to your Helm

     helm repo add mobb https://rh-mobb.github.io/helm-charts/
  4. Update your repositories

     helm repo update
  5. Use the mobb/operatorhub chart to deploy the needed operators

     helm upgrade -n $NAMESPACE federated-metrics-operators \
       mobb/operatorhub --version 0.1.0 --install \
       --values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/rosa-federated-prometheus/files/operatorhub.yaml
  6. Wait until the two operators are running

     watch kubectl get pods -n $NAMESPACE
     NAME                                                   READY   STATUS    RESTARTS   AGE
     grafana-operator-controller-manager-775f8d98c9-822h7   2/2     Running   0          7m33s
     operatorhubio-dtb2v                                    1/1     Running   0          8m32s
     prometheus-operator-5cb6844699-t7wfd                   1/1     Running   0          7m29s
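Instead of watching the pod list by eye, the wait can be scripted. The sketch below is one possible approach rather than part of the original guide: the helper name `all_pods_ready` is hypothetical, and it only inspects the READY and STATUS columns of `kubectl get pods --no-headers`.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and succeed only if
# at least one pod is listed and every pod is Running with all of its
# containers ready (READY column of the form n/n).
all_pods_ready() {
  awk '
    { split($2, ready, "/") }
    $3 != "Running" || ready[1] != ready[2] { bad = 1 }
    END { exit (NR == 0 || bad) ? 1 : 0 }
  '
}

# Example (requires cluster access):
#   until kubectl get pods -n "$NAMESPACE" --no-headers | all_pods_ready; do
#     sleep 5
#   done
```

An empty pod list counts as not ready, so the loop keeps waiting while the operators are still being scheduled.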

Deploy the monitoring stack

  1. Wait until the Operators are running

     watch kubectl -n $NAMESPACE get pods

    You should see both operators and the catalog pods running:

     NAME                                                   READY   STATUS    RESTARTS   AGE
     grafana-operator-controller-manager-7f945d45d8-ggzk4   2/2     Running   0          87s
     operatorhubio-catalog-lmgt6                            1/1     Running   0          2m35s
     prometheus-operator-fc85b9bd-9klsq                     1/1     Running   0          3m10s
  2. Install the mobb/rosa-federated-prometheus Helm Chart

     helm upgrade --install -n $NAMESPACE monitoring \
       --set grafana-cr.basicAuthPassword='mypassword' \
       --set fullnameOverride='monitoring' \
       --version 0.5.1 \
       mobb/rosa-federated-prometheus

Validate Prometheus

  3. Ensure the new Prometheus instance’s Pods are running

     kubectl get pods -n ${NAMESPACE} -l app=prometheus -o wide

    You should see the following:

     NAME                                 READY   STATUS    RESTARTS   AGE     IP             NODE                                        NOMINATED NODE   READINESS GATES
     prometheus-federation-prometheus-0   3/3     Running   1          7m58s   ip-10-0-215-84.us-east-2.compute.internal   <none>           <none>
     prometheus-federation-prometheus-1   3/3     Running   1          7m58s    ip-10-0-146-85.us-east-2.compute.internal   <none>           <none>
  4. Log into the new Prometheus instance

    Fetch the Route:

     kubectl -n ${NAMESPACE} get route prometheus-route

    You should see the following:

     NAME               HOST/PORT                                                                     PATH   SERVICES                   PORT            TERMINATION   WILDCARD
     prometheus-route   prometheus-route-custom-prometheus.apps.mycluster.jnmf.p1.openshiftapps.com          monitoring-prometheus-cr   web-proxy       reencrypt     None

    Open the Prometheus Route in your browser (the HOST/PORT column from above)

    It should take you through authorization and then you should see the Prometheus UI.

  5. Add /targets to the end of the URL to see the list of available targets

    screenshot of prometheus targets screen

  6. Replace the trailing path with graph?g0.range_input=1h&g0.expr=kubelet_running_containers&g0.tab=0 to see a graph of the number of running containers, fetched from cluster monitoring.

    screenshot of prometheus graph screen

  7. Click on Alerts in the menu to see our example Alert
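The URLs used in the steps above can also be built from the command line. This is a small sketch, assuming the Route is named `prometheus-route` as shown earlier; the helper names `route_host` and `route_url` are hypothetical.

```shell
#!/bin/sh
# Print the host of a Route in $NAMESPACE by reading its .spec.host field.
route_host() {
  kubectl -n "$NAMESPACE" get route "$1" -o jsonpath='{.spec.host}'
}

# Join a host and a path into an https URL, tolerating a leading slash.
route_url() {
  printf 'https://%s/%s\n' "$1" "${2#/}"
}

# Example (requires cluster access):
#   host=$(route_host prometheus-route)
#   route_url "$host" targets
#   route_url "$host" 'graph?g0.range_input=1h&g0.expr=kubelet_running_containers&g0.tab=0'
```

The printed URLs still go through the same authorization step in the browser as the Route itself.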

Validate Alert Manager

  1. Forward a port to Alert Manager

     kubectl -n ${NAMESPACE} port-forward svc/monitoring-alertmanager-cr 9093:9093
  2. Browse to http://localhost:9093/#/alerts to see the alert “ExampleAlert”

    Screenshot of Alert Manager
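The same check can be made without a browser by querying Alert Manager's v2 API through the port-forward. The sketch below is an illustration, not part of the original guide: the helper name `has_alert` is hypothetical, and it uses a plain text match on the JSON response, which is cruder than a real JSON parser such as jq.

```shell
#!/bin/sh
# Succeed if the Alert Manager JSON on stdin contains an alert whose
# "alertname" label equals $1. This is a text match, not a JSON parse,
# and it treats $1 as part of a regular expression.
has_alert() {
  grep -Eq "\"alertname\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example (with the port-forward from step 1 still running):
#   curl -s http://localhost:9093/api/v2/alerts | has_alert ExampleAlert \
#     && echo "ExampleAlert is present"
```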

Validate Grafana and Dashboards

  1. Find the Grafana Route

     kubectl -n $NAMESPACE get route grafana-route
     NAME            HOST/PORT                                                                PATH   SERVICES          PORT            TERMINATION   WILDCARD
     grafana-route   grafana-route-federated-metrics.apps.metrics.9l1z.p1.openshiftapps.com   /      grafana-service   grafana-proxy   reencrypt     None
  2. Log into Grafana using your cluster’s IDP

  3. Click Login and log in to Grafana as admin with the password you set when doing helm install.

  4. Click on Configuration -> Datasources and check that the Prometheus data source is loaded.

    Sometimes, due to Kubernetes resource ordering, the data source may not be loaded. You can force the Operator to reload it by running kubectl annotate -n $NAMESPACE grafanadatasources.integreatly.org federated reroll=true

  5. Click on Dashboards -> Manage and click on the “USE Method / Cluster” dashboard.

    Screenshot of Grafana USE Dashboard


Cleanup

  1. Delete the helm release

     helm -n $NAMESPACE delete monitoring
  2. Delete the namespace

     kubectl delete namespace $NAMESPACE
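The cleanup can be wrapped in a small script. This is a sketch under stated assumptions: it also uninstalls the `federated-metrics-operators` release created earlier in this guide, uses `helm uninstall` (of which `helm delete` is an alias in Helm 3), and adds a `DRY_RUN` switch that is purely a convention of this example.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (a convention of this sketch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Remove both Helm releases installed in this guide, then the namespace.
cleanup() {
  ns="${1:?usage: cleanup NAMESPACE}"
  run helm -n "$ns" uninstall monitoring
  run helm -n "$ns" uninstall federated-metrics-operators
  run kubectl delete namespace "$ns"
}

# Example: preview the commands first, then run them for real.
#   DRY_RUN=1 cleanup "$NAMESPACE"
#   cleanup "$NAMESPACE"
```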