Disclaimer: Mobb.ninja is not official Red Hat documentation - These guides may be experimental, proof of concept or early adoption. Officially supported documentation is available at https://docs.openshift.com.

Federating System and User Metrics to S3 in Red Hat OpenShift Service on AWS

Paul Czarkowski

06/07/2021

This guide walks through federating Prometheus metrics (both cluster and user workload) to S3 storage using Thanos.

ToDo - Add Authorization in front of Thanos APIs

Prerequisites

  1. A ROSA cluster

  2. Clone this repository locally

     git clone https://github.com/rh-mobb/documentation
     cd documentation/docs/rosa/federated-metrics
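
Before going further, it can help to confirm that the command-line tools used in this guide (git, aws, oc and jq) are on your PATH. The following is a minimal sketch; the missing_tools helper is a made-up name for illustration, not something shipped in the repository.

```shell
# Hypothetical helper: print the name of every listed command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The tools used by the commands in this guide.
missing="$(missing_tools git aws oc jq)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all four tools were found.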
    

AWS Preparation

  1. Create IAM user

     aws iam create-user --user-name thanos-receiver | jq
    
  2. Update the s3-policy.json file with the user's ARN (the Arn field) from the output.

  3. Create an S3 bucket

     aws s3 mb s3://my-thanos-metrics
    
  4. Grant the thanos-receiver user access to the S3 bucket

     aws s3api put-bucket-policy --bucket my-thanos-metrics \
       --policy file://s3-policy.json

  5. Create an access key and secret for the user, and update thanos-store-credentials.yaml with them

     aws iam create-access-key --user-name thanos-receiver | jq .

  6. Create the Thanos store credentials secret

     oc new-project thanos-receiver
     oc apply -f thanos-store-credentials.yaml
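
To make the "update s3-policy.json with the ARN" step more concrete, here is a rough sketch of a bucket policy rendered from a user ARN and a bucket name. The render_s3_policy helper and the list of S3 actions are assumptions for illustration; the s3-policy.json in the repository is the source of truth and may differ.

```shell
# Hypothetical helper: print a bucket policy that lets one IAM user list the
# bucket and read, write and delete its objects. The action list is an
# assumption, not a copy of the repository's s3-policy.json.
render_s3_policy() {
  user_arn="$1"
  bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "${user_arn}" },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}

# Usage (requires AWS credentials; writes to a scratch file for comparison):
#   USER_ARN="$(aws iam get-user --user-name thanos-receiver --query User.Arn --output text)"
#   render_s3_policy "$USER_ARN" my-thanos-metrics > /tmp/s3-policy-sketch.json
```

Both the bucket ARN and the bucket/* ARN are listed because s3:ListBucket applies to the bucket itself while the object actions apply to the keys inside it.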

Enabling User Workload Monitoring

See the OpenShift documentation for more in-depth details.

  1. Check whether user workload monitoring is enabled (enableUserWorkload: true)

     oc -n openshift-monitoring get configmap cluster-monitoring-config  \
       -o json | jq -r '.data."config.yaml"'
    
  2. If not, enable User Workload Monitoring by doing one of the following:

    If data.config.yaml is not {}, edit the ConfigMap and add the enableUserWorkload: true line manually.

     oc -n openshift-monitoring edit configmap cluster-monitoring-config
    

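    Otherwise, if it is {}, you can safely run the following command (the patch replaces the entire config.yaml value, which is why it is only safe when the config is empty).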

     oc patch configmap cluster-monitoring-config -n openshift-monitoring \
        -p='{"data":{"config.yaml": "enableUserWorkload: true\n"}}'
    
  3. Check that the user workload monitoring pods are starting up

     oc -n openshift-user-workload-monitoring get pods
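
The decision in the steps above (already enabled, safe to patch, or edit by hand) can be sketched as a small filter over the config.yaml text. The uwm_action name is hypothetical, and treating an empty or null value the same as {} is an assumption on my part.

```shell
# Hypothetical helper: read the config.yaml text on stdin and print one of:
#   enabled  enableUserWorkload: true is already present
#   patch    the config is empty ({}, blank or null), so the oc patch is safe
#   edit     other settings exist, so add the line by hand with oc edit
uwm_action() {
  config="$(cat)"
  # Remove all whitespace so "{}", "{ }" and blank input compare easily.
  trimmed="$(printf '%s' "$config" | tr -d '[:space:]')"
  if printf '%s\n' "$config" |
     grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'; then
    echo enabled
  elif [ -z "$trimmed" ] || [ "$trimmed" = "{}" ] || [ "$trimmed" = "null" ]; then
    echo patch
  else
    echo edit
  fi
}

# Usage (requires a logged-in oc session):
#   oc -n openshift-monitoring get configmap cluster-monitoring-config \
#     -o json | jq -r '.data."config.yaml"' | uwm_action
```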
    

Deploy Thanos Store Gateway

  1. Deploy the Thanos store gateway

     oc apply -n thanos-receiver -f thanos-store.yaml
    
  2. Deploy Thanos Receiver

    Note: the receiver should be secured with OIDC / bearer tokens, which this guide does not yet do.

     oc -n thanos-receiver apply -f thanos-receive.yaml
    
  3. Append remoteWrite settings to the cluster-monitoring config to forward cluster metrics to Thanos.

     oc -n openshift-monitoring edit configmaps cluster-monitoring-config
    
       data:
         config.yaml: |
           ...
           prometheusK8s:
           ...
             remoteWrite:
               - url: "http://thanos-receive.thanos-receiver.svc.cluster.local:9091/api/v1/receive"
    
  4. Append remoteWrite settings to the user-workload-monitoring config to forward user workload metrics to Thanos.

    Check if the User Workload Config Map exists:

     oc -n openshift-user-workload-monitoring get \
       configmaps user-workload-monitoring-config
    

    If the ConfigMap doesn't exist, run:

     oc apply -f user-workload-monitoring-config.yaml
    

    Otherwise update it with the following:

     oc -n openshift-user-workload-monitoring edit \
       configmaps user-workload-monitoring-config
    
       data:
         config.yaml: |
           ...
           prometheus:
           ...
             remoteWrite:
               - url: "http://thanos-receive.thanos-receiver.svc.cluster.local:9091/api/v1/receive"
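
Both remoteWrite entries point at the same in-cluster Service address, which follows the pattern <service>.<namespace>.svc.cluster.local. As a sketch of how the pieces fit together, the helpers below build that URL and render a minimal user workload ConfigMap around it. The function names are hypothetical, and the rendered manifest is an illustration rather than a copy of the repository's user-workload-monitoring-config.yaml.

```shell
# Build the receive URL from a Service name, namespace and port.
receive_url() {
  printf 'http://%s.%s.svc.cluster.local:%s/api/v1/receive\n' "$1" "$2" "$3"
}

# Hypothetical helper: print a minimal user-workload-monitoring-config
# ConfigMap whose only setting is a single remoteWrite target.
render_uwm_config() {
  url="$1"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      remoteWrite:
        - url: "${url}"
EOF
}

# Usage: print the manifest for review before applying anything.
#   render_uwm_config "$(receive_url thanos-receive thanos-receiver 9091)"
```

Note that the cluster config nests remoteWrite under prometheusK8s, while the user workload config nests it under prometheus.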
    

Deploy Thanos Querier

  1. Deploy the Thanos querier

     oc apply -n thanos-receiver -f thanos-querier.yaml
    

Deploy Grafana

  1. Create the Grafana operator in the thanos-receiver namespace

     oc apply -n thanos-receiver -f thanos-grafana-operator.yaml
    
  2. Create a Grafana instance and a datasource for Thanos

    Change the password to something other than the default before applying.

     oc -n thanos-receiver apply -f thanos-grafana.yaml
    
  3. Load the cluster metrics dashboards

    Note: these were generated by the generate-dashboards.sh script.

     oc -n thanos-receiver apply -f dashboards.yaml
    
  4. Get the Route URL for Grafana (remember it's HTTPS) and log in with username root and the password you set (or the default, secret).

     oc -n thanos-receiver get route grafana-route
    
  5. Once logged in, go to Dashboards -> Manage and expand the thanos-receiver group; you should see the cluster metrics dashboards. Click the USE Method / Cluster dashboard and you should see metrics. \o/

[Screenshot: Grafana with federated cluster metrics]
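
If you would rather print the Grafana address than read it from the route table, a small sketch along these lines may help. The grafana_url helper is a hypothetical name; it only prefixes the route hostname with https://.

```shell
# Hypothetical helper: read a route hostname on stdin and print it as an
# HTTPS URL. Fails if no hostname was provided.
grafana_url() {
  host="$(cat)"
  [ -n "$host" ] || { echo "no route host found" >&2; return 1; }
  printf 'https://%s\n' "$host"
}

# Usage (requires a logged-in oc session):
#   oc -n thanos-receiver get route grafana-route \
#     -o jsonpath='{.spec.host}' | grafana_url
```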