
Disclaimer: Mobb.ninja is not official Red Hat documentation - These guides may be experimental, proof of concept or early adoption. Officially supported documentation is available at https://docs.openshift.com.

Federating Metrics to a Centralized Prometheus Cluster

Red Hat OpenShift Service on AWS (ROSA) comes with two built-in monitoring stacks: Cluster Monitoring and User Workload Monitoring. Both are based on Prometheus; the first targets the cluster operator (Red Hat SRE) and the second targets the cluster user (you!).

Both provide rich metrics insights inside the cluster's web console, showing overall cluster metrics as well as namespace-specific workload metrics, all integrated with your configured IDP.

However, the Alert Manager instance is locked down and is used to send alerts to the Red Hat SRE team. This means that the customer cannot create alerts for either the cluster resources or their own workloads. This is being worked on, and future versions of ROSA will provide a way for the end user to create alerts for their own workloads.

Until that work is done, a ROSA cluster administrator can deploy their own Prometheus instance and configure it to send alerts to themselves. Thankfully, with Prometheus' federation feature and the Prometheus Operator, this can be done in a few simple steps.
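
Under the hood, federation simply means the new Prometheus scrapes the built-in Cluster Monitoring Prometheus on its /federate endpoint, using match[] parameters to select which series to pull. The Helm chart used later in this guide configures this for you; the snippet below is only an illustrative sketch of what such a scrape job looks like (the job name, match expression, target address, and token path are placeholders, not the chart's exact configuration).

# Illustrative federation scrape job - the Helm chart manages the real one.
scrape_configs:
  - job_name: federate
    honor_labels: true        # keep the original job/instance labels from the source
    metrics_path: /federate
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      insecure_skip_verify: true   # placeholder; verify the cluster CA in real setups
    params:
      'match[]':
        - '{job="kubelet"}'        # pull only the series you need
    static_configs:
      - targets:
        - prometheus-k8s.openshift-monitoring.svc:9091   # placeholder address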

This guide is heavily influenced by Tommer Amber’s guide for OCP 4.x.

Pre-requisites

  1. You will need cluster-admin access to a ROSA cluster, as well as the kubectl (or oc) and helm command line tools.
  2. Before we get started, we need to set an environment variable to be used throughout the guide:

     export NAMESPACE=custom-monitoring
    

Create a Namespace to work in

kubectl create namespace ${NAMESPACE}

Install Prometheus Operator

If you prefer, you can do this from OperatorHub in the cluster console itself.

Create an OperatorGroup and Subscription for the Prometheus Operator

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: federated-metrics
  namespace: ${NAMESPACE}
spec:
  targetNamespaces:
  - ${NAMESPACE}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: ${NAMESPACE}
spec:
  channel: beta
  installPlanApproval: Automatic
  name: prometheus
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
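
Before moving on, you can optionally confirm that OLM has processed the Subscription and started an install (the InstallPlan and CSV names in your output will differ):

kubectl -n ${NAMESPACE} get subscription prometheus
kubectl -n ${NAMESPACE} get installplan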

Install Grafana Operator

If you prefer, you can do this from OperatorHub in the cluster console itself.

Create a CatalogSource and Subscription for the Grafana Operator

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-catalog
  namespace: ${NAMESPACE}
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: Community Operators
  publisher: OperatorHub.io
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: ${NAMESPACE}
spec:
  channel: v4
  name: grafana-operator
  installPlanApproval: Automatic
  source: operatorhubio-catalog
  sourceNamespace: ${NAMESPACE}
EOF
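
Optionally, check that the new CatalogSource is serving its catalog and that both Operators' ClusterServiceVersions eventually reach the Succeeded phase (names and versions in the output will vary):

kubectl -n ${NAMESPACE} get catalogsource operatorhubio-catalog
kubectl -n ${NAMESPACE} get csv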

Deploy the monitoring stack

  1. Wait until the Operators are running

     watch kubectl -n $NAMESPACE get pods
    

    You should see both operators and the catalog pods running:

     NAME                                                   READY   STATUS    RESTARTS   AGE
     grafana-operator-controller-manager-7f945d45d8-ggzk4   2/2     Running   0          87s
     operatorhubio-catalog-lmgt6                            1/1     Running   0          2m35s
     prometheus-operator-fc85b9bd-9klsq                     1/1     Running   0          3m10s
    
  2. Add the mobb.ninja repository to your local Helm configuration

     helm repo add mobb https://rh-mobb.github.io/helm-charts/
    
  3. Update your Repositories

     helm repo update
    
  4. Install the mobb/rosa-federated-prometheus Helm Chart

     helm install -n $NAMESPACE monitoring \
       --set grafana-cr.basicAuthPassword='mypassword' \
       --set fullnameOverride='monitoring' \
       --version 0.5.1 \
       mobb/rosa-federated-prometheus
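
    If you want to see the full set of values the chart accepts (they vary by chart version), you can inspect its defaults:

     helm show values mobb/rosa-federated-prometheus --version 0.5.1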
    

Validate Prometheus

  1. Ensure the new Prometheus instance’s Pods are running

     kubectl get pods -n ${NAMESPACE} -l app=prometheus -o wide
    

    You should see the following:

     NAME                                 READY   STATUS    RESTARTS   AGE     IP             NODE                                        NOMINATED NODE   READINESS GATES
     prometheus-federation-prometheus-0   3/3     Running   1          7m58s   10.131.0.104   ip-10-0-215-84.us-east-2.compute.internal   <none>           <none>
     prometheus-federation-prometheus-1   3/3     Running   1          7m58s   10.128.2.21    ip-10-0-146-85.us-east-2.compute.internal   <none>           <none>
    
  2. Log into the new Prometheus instance

    Fetch the Route:

     kubectl -n ${NAMESPACE} get route prometheus-route
    

    You should see the following:

     NAME               HOST/PORT                                                                     PATH   SERVICES                   PORT            TERMINATION   WILDCARD
    prometheus-route   prometheus-route-custom-prometheus.apps.mycluster.jnmf.p1.openshiftapps.com          monitoring-prometheus-cr   web-proxy       reencrypt     None
    

    Open the Prometheus Route in your browser (the HOST/PORT field from above)

    It should take you through authorization and then you should see the Prometheus UI.

  3. Add /targets to the end of the URL to see the list of available targets.

    Screenshot of Prometheus targets screen

  4. Switch the trailing path to graph?g0.range_input=1h&g0.expr=kubelet_running_containers&g0.tab=0 to see a graph of the number of running containers, fetched from Cluster Monitoring.

    Screenshot of Prometheus graph screen

  5. Click on Alerts in the menu to see the example alert.
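
The example alert is shipped by the Helm chart. To add alerts for your own workloads, create additional PrometheusRule objects; the Prometheus Operator picks them up based on label selectors on the Prometheus CR, so copy the labels from the chart's existing PrometheusRule in the namespace. The resource below is only an illustrative sketch (the name, labels, and expression are placeholders, and the expression assumes kube-state-metrics series are being federated):

cat << EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-workload-alerts               # illustrative name
  namespace: ${NAMESPACE}
  labels:
    role: alert-rules                    # must match the Prometheus CR's ruleSelector
spec:
  groups:
  - name: my-workload.rules
    rules:
    - alert: TooManyRestarts             # illustrative alert
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: A container is restarting frequently
EOF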

Validate Alert Manager

  1. Forward a port to Alert Manager

     kubectl -n ${NAMESPACE} port-forward svc/monitoring-alertmanager-cr 9093:9093
    
  2. Browse to http://localhost:9093/#/alerts to see the alert “ExampleAlert”

    Screenshot of Alert Manager
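
Out of the box the example alert only appears in the UI; for Alert Manager to actually notify you, it needs a receiver. With the Prometheus Operator, the Alertmanager configuration lives in a secret named alertmanager-<name-of-the-Alertmanager-CR> containing an alertmanager.yaml key (run kubectl -n $NAMESPACE get alertmanager to find the name). The chart may already manage this secret, so inspect it before changing anything. A minimal configuration with a single webhook receiver might look like the sketch below (the URL is a placeholder):

# Illustrative alertmanager.yaml - adapt it and store it in the Alertmanager secret.
global:
  resolve_timeout: 5m
route:
  receiver: default
  group_by: ['alertname']
receivers:
- name: default
  webhook_configs:
  - url: 'https://example.com/alert-hook'   # placeholder endpoint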

Validate Grafana and Dashboards

  1. Find the Grafana Route

     kubectl -n ${NAMESPACE} get route grafana-route
    
     NAME            HOST/PORT                                                                PATH   SERVICES          PORT            TERMINATION   WILDCARD
    grafana-route   grafana-route-federated-metrics.apps.metrics.9l1z.p1.openshiftapps.com   /      grafana-service   grafana-proxy   reencrypt     None
    
  2. Log into Grafana using your cluster’s IDP.

  3. Click Login and log in to Grafana as admin with the password you set when doing the helm install.

  4. Click on Configuration -> Data Sources and check that the Prometheus data source is loaded.

    Sometimes, due to Kubernetes resource ordering, the data source may not be loaded. You can force the Operator to reload it by annotating the GrafanaDataSource resource:

     kubectl annotate -n $NAMESPACE grafanadatasources.integreatly.org federated reroll=true

  5. Click on Dashboards -> Manage and open the “USE Method / Cluster” dashboard.

    Screenshot of Grafana USE Dashboard
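
The dashboards shown here are deployed by the Helm chart as GrafanaDashboard custom resources. To add a dashboard of your own, create another GrafanaDashboard in the same namespace; the sketch below is illustrative only, and the labels must match the dashboardLabelSelector of the Grafana CR the chart deployed, so copy them from an existing GrafanaDashboard first:

cat << EOF | kubectl apply -f -
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: my-dashboard                     # illustrative name
  namespace: ${NAMESPACE}
  labels:
    app: grafana                         # copy the real selector labels from an existing dashboard
spec:
  json: |
    {
      "title": "My Dashboard",
      "panels": [],
      "schemaVersion": 36
    }
EOF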

Cleanup

  1. Delete the Helm release

     helm -n $NAMESPACE delete monitoring
    
  2. Delete the namespace

     kubectl delete namespace $NAMESPACE
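
Deleting the namespace also removes the Operators and everything they deployed, but the cluster-scoped CRDs that OLM installed (for example the integreatly.org Grafana CRDs) may be left behind. You can list them to decide whether to remove them; do not delete the monitoring.coreos.com CRDs, as the built-in Cluster Monitoring stack depends on them:

kubectl get crd | grep integreatly.org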