Federating System and User metrics to Azure Files in Azure Red Hat OpenShift
Authors:
Paul Czarkowski,
Kumudu Herath
Last Editor:
Dustin Scott
Published Date:
4 June 2021
Modified Date: 25 May 2023
By default, Azure Red Hat OpenShift (ARO) stores metrics on ephemeral volumes, and it is advised that users do not change this setting. However, it is not unreasonable to expect that metrics should be persisted for a set amount of time.
This guide shows how to set up Thanos to federate both System and User Workload Metrics to a Thanos gateway that stores the metrics in Azure Files and makes them available via a Grafana instance (managed by the Grafana Operator).
ToDo - Add Authorization in front of Thanos APIs
Prerequisites
An ARO cluster
Set some environment variables to use throughout, adjusting them to suit your environment:

> Note: `AZR_STORAGE_ACCOUNT_NAME` must be unique

```bash
export AZR_RESOURCE_LOCATION=eastus
export AZR_RESOURCE_GROUP=openshift
export AZR_STORAGE_ACCOUNT_NAME=arofederatedmetrics
export CLUSTER_NAME=openshift
export NAMESPACE=aro-thanos-af
```
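Before creating anything, it can save a failed `az` call to check locally that the chosen storage account name meets Azure's naming rules. The length and charset constraints below are Azure's documented rules; the check itself is just a convenience sketch:

```shell
# Sanity-check the chosen storage account name before any az calls:
# Azure requires 3-24 characters, lowercase letters and digits only,
# and the name must be globally unique (uniqueness can't be checked locally).
: "${AZR_STORAGE_ACCOUNT_NAME:=arofederatedmetrics}"
if printf '%s' "$AZR_STORAGE_ACCOUNT_NAME" | grep -Eq '^[a-z0-9]{3,24}$'; then
  echo "storage account name looks valid"
else
  echo "storage account name is invalid: $AZR_STORAGE_ACCOUNT_NAME" >&2
fi
```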
Azure Preparation
Create an Azure storage account
Modify the arguments to suit your environment:

```bash
az storage account create \
  --name $AZR_STORAGE_ACCOUNT_NAME \
  --resource-group $AZR_RESOURCE_GROUP \
  --location $AZR_RESOURCE_LOCATION \
  --sku Standard_RAGRS \
  --kind StorageV2
```
Get the account key and store it in an environment variable (it will be passed to the Helm chart later):

```bash
AZR_STORAGE_KEY=$(az storage account keys list -g $AZR_RESOURCE_GROUP \
  -n $AZR_STORAGE_ACCOUNT_NAME --query "[0].value" -o tsv)
```
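For reference, these values end up in a Thanos object storage configuration of roughly this shape (the field names follow Thanos's `AZURE` objstore schema; the actual secret is rendered and managed by the chart, so this fragment is illustrative only):

```yaml
type: AZURE
config:
  storage_account: "arofederatedmetrics"    # $AZR_STORAGE_ACCOUNT_NAME
  storage_account_key: "<AZR_STORAGE_KEY>"  # the key fetched above
  container: "openshift"                    # $CLUSTER_NAME
```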
Create a namespace to use
```bash
oc new-project $NAMESPACE
```
Add the MOBB chart repository to Helm:

```bash
helm repo add mobb https://rh-mobb.github.io/helm-charts/
```
Update your repositories
```bash
helm repo update
```
Use the `mobb/operatorhub` chart to deploy the Grafana operator:

```bash
helm upgrade -n $NAMESPACE $NAMESPACE-operators \
  mobb/operatorhub --version 0.1.1 --install \
  --values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/aro-thanos-af/files/grafana-operator.yaml
```
Use the `mobb/operatorhub` chart to deploy the resource-locker operator:

> Note: Skip this if you already have the resource-locker operator installed, or if you do not plan to use User Workload Metrics.

```bash
helm upgrade -n resource-locker-operator resource-locker-operator \
  mobb/operatorhub --version 0.1.1 --create-namespace --install \
  --values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/aro-thanos-af/files/resourcelocker-operator.yaml
```
Deploy ARO Thanos Azure Files Helm Chart (mobb/aro-thanos-af)
> Note: `enableUserWorkloadMetrics=true` will overwrite the configs for cluster and user-workload metrics. Remove it from the helm command below if you already have custom settings; the Addendum at the end of this doc explains the changes you'll need to make instead.

```bash
helm upgrade -n $NAMESPACE aro-thanos-af \
  --install mobb/aro-thanos-af --version 0.2.0 \
  --set "aro.storageAccount=$AZR_STORAGE_ACCOUNT_NAME" \
  --set "aro.storageAccountKey=$AZR_STORAGE_KEY" \
  --set "aro.storageContainer=$CLUSTER_NAME" \
  --set "enableUserWorkloadMetrics=true"
```
Validate Grafana is installed and seeing metrics from Azure Files
Get the Route URL for Grafana (remember it's https) and log in using the username `root` and the password you updated to (or the default of `secret`):

```bash
oc -n $NAMESPACE get route grafana-route
```
Once logged in, go to Dashboards -> Manage and expand the thanos-receiver group; you should see the cluster metrics dashboards. Click on the Use Method / Cluster dashboard and you should see metrics. \o/
Note: If it complains about a missing datasource run the following:
```bash
oc annotate -n $NAMESPACE grafanadatasource aro-thanos-af-prometheus "retry=1"
```
Cleanup
Uninstall the `aro-thanos-af` chart:

```bash
helm delete -n $NAMESPACE aro-thanos-af
```
Uninstall the `$NAMESPACE-operators` chart:

```bash
helm delete -n $NAMESPACE $NAMESPACE-operators
```
Delete the `aro-thanos-af` namespace:

```bash
oc delete namespace $NAMESPACE
```
Delete the storage account
```bash
az storage account delete \
  --name $AZR_STORAGE_ACCOUNT_NAME \
  --resource-group $AZR_RESOURCE_GROUP
```
Addendum
Enabling User Workload Monitoring
See the docs for more in-depth details.
Check the cluster-monitoring-config ConfigMap object
```bash
oc -n openshift-monitoring get configmap cluster-monitoring-config -o yaml
```
Enable User Workload Monitoring by doing one of the following
If `data.config.yaml` is not `{}`, you should edit it and add the `enableUserWorkload: true` line manually:

```bash
oc -n openshift-monitoring edit configmap cluster-monitoring-config
```
Otherwise, if it is `{}`, you can safely run the following command:

```bash
oc patch configmap cluster-monitoring-config -n openshift-monitoring \
  -p='{"data":{"config.yaml": "enableUserWorkload: true\n"}}'
```
Check that the User workload monitoring is starting up
```bash
oc -n openshift-user-workload-monitoring get pods
```
Append remoteWrite settings to the user-workload-monitoring config to forward user workload metrics to Thanos.
Check if the User Workload Config Map exists:
```bash
oc -n openshift-user-workload-monitoring get \
  configmaps user-workload-monitoring-config
```
If the config doesn't exist, run:

```bash
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      remoteWrite:
      - url: "http://thanos-receive.$NAMESPACE.svc.cluster.local:9091/api/v1/receive"
EOF
```
**Otherwise update it with the following:**
```bash
oc -n openshift-user-workload-monitoring edit \
configmaps user-workload-monitoring-config
```
```yaml
data:
  config.yaml: |
    ...
    prometheus:
      ...
      remoteWrite:
      - url: "http://thanos-receive.aro-thanos-af.svc.cluster.local:9091/api/v1/receive"
```
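The remoteWrite URL must point at the `thanos-receive` Service in whatever namespace you deployed the chart into; with this guide's default of `aro-thanos-af` it expands as shown below (a quick local sketch, assuming the guide's default `NAMESPACE`):

```shell
# The thanos-receive Service lives in the namespace the chart was deployed
# into ($NAMESPACE, "aro-thanos-af" by default in this guide), so the
# remoteWrite URL must reference that namespace.
: "${NAMESPACE:=aro-thanos-af}"
REMOTE_WRITE_URL="http://thanos-receive.${NAMESPACE}.svc.cluster.local:9091/api/v1/receive"
echo "$REMOTE_WRITE_URL"
```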