Federating System and User Metrics to S3 in Red Hat OpenShift Service on AWS
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.
This guide walks through federating Prometheus metrics to S3 storage using Thanos.
ToDo - Add Authorization in front of Thanos APIs
Prerequisites
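The commands in this guide assume you have the oc, aws, jq, and helm CLIs installed, and that you are logged in to both the cluster (as a user with cluster-admin) and to AWS. A quick way to confirm the tools are available:

oc version
aws --version
jq --version
helm version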
Set up environment
Create environment variables
export CLUSTER_NAME=my-cluster
export S3_BUCKET=my-thanos-bucket
export REGION=us-east-2
export NAMESPACE=federated-metrics
export SA=aws-prometheus-proxy
export SCRATCH_DIR=/tmp/scratch
export OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json | jq -r .spec.serviceAccountIssuer | sed -e "s/^https:\/\///")
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_PAGER=""
rm -rf $SCRATCH_DIR
mkdir -p $SCRATCH_DIR
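Before moving on it is worth sanity-checking the two derived variables; if either prints empty, your oc or aws login is not working:

echo $OIDC_PROVIDER
echo $AWS_ACCOUNT_ID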
Create the namespace
oc new-project $NAMESPACE
AWS Preparation
Create an S3 bucket
aws s3 mb --region $REGION s3://$S3_BUCKET
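To confirm the bucket was created and is reachable with your current credentials, this check should print "bucket exists":

aws s3api head-bucket --bucket $S3_BUCKET --region $REGION && echo "bucket exists"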
Create a Policy for access to S3
cat <<EOF > $SCRATCH_DIR/s3-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::$S3_BUCKET/*",
        "arn:aws:s3:::$S3_BUCKET"
      ]
    }
  ]
}
EOF
Apply the Policy
S3_POLICY=$(aws iam create-policy --policy-name $CLUSTER_NAME-thanos \
  --policy-document file://$SCRATCH_DIR/s3-policy.json \
  --query 'Policy.Arn' --output text)
echo $S3_POLICY
Create a Trust Policy
cat <<EOF > $SCRATCH_DIR/TrustPolicy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": [
            "system:serviceaccount:${NAMESPACE}:${SA}"
          ]
        }
      }
    }
  ]
}
EOF
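The trust policy references the cluster's OIDC provider, which must already be registered in IAM (it is by default on an STS-enabled ROSA cluster). To double-check that it is present:

aws iam list-open-id-connect-providers --output text | grep "$OIDC_PROVIDER"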
Create a Role for Thanos to access S3
S3_ROLE=$(aws iam create-role \
  --role-name "$CLUSTER_NAME-thanos-s3" \
  --assume-role-policy-document file://$SCRATCH_DIR/TrustPolicy.json \
  --query "Role.Arn" --output text)
echo $S3_ROLE
Attach the Policy to the Role
aws iam attach-role-policy \
  --role-name "$CLUSTER_NAME-thanos-s3" \
  --policy-arn $S3_POLICY
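You can verify that the policy is now attached to the role:

aws iam list-attached-role-policies \
  --role-name "$CLUSTER_NAME-thanos-s3"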
Deploy Operators
Add the MOBB chart repository to Helm
helm repo add mobb https://rh-mobb.github.io/helm-charts/
Update your repositories
helm repo update
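If you want to confirm the repository was added correctly, searching it should list the two charts used below:

helm search repo mobb/operatorhub
helm search repo mobb/rosa-thanos-s3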
Use the mobb/operatorhub chart to deploy the needed operators
helm upgrade -n $NAMESPACE custom-metrics-operators \
  mobb/operatorhub --install \
  --values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/rosa-thanos-s3/files/operatorhub.yaml
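The operators are installed through OLM, so it can take a minute or two for them to report Succeeded. A quick check (the exact CSV names and versions depend on what the chart pulls in):

oc -n $NAMESPACE get csv
oc -n $NAMESPACE get pods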
Deploy Thanos Store Gateway
We use Grafana Alloy to scrape the Prometheus metrics and ship them to Thanos, which then stores them in S3. Grafana Alloy currently requires running as a specific user, so we must grant it a SecurityContextConstraint to allow this.
oc adm policy add-scc-to-user anyuid -z rosa-thanos-s3-alloy
Deploy ROSA Thanos S3 Helm Chart
helm upgrade -n $NAMESPACE rosa-thanos-s3 --install mobb/rosa-thanos-s3 \
  --set "aws.roleArn=$S3_ROLE" \
  --set "rosa.clusterName=$CLUSTER_NAME" \
  --set "aws.region=$REGION" \
  --set "aws.bucket=$S3_BUCKET"
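Give the chart a few minutes to settle, then check that the Thanos, Grafana, and Alloy pods are running (pod names will vary with the chart version):

oc -n $NAMESPACE get pods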
Append remoteWrite settings to the user-workload-monitoring config to forward user workload metrics to Thanos.
Check if the User Workload Config Map exists:
oc -n openshift-user-workload-monitoring get \
  configmaps user-workload-monitoring-config
If the config doesn’t exist, run:
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      remoteWrite:
        - url: "http://thanos-receive.${NAMESPACE}.svc.cluster.local:9091/api/v1/receive"
EOF
Otherwise update it with the following:
oc -n openshift-user-workload-monitoring edit \
  configmaps user-workload-monitoring-config
data:
  config.yaml: |
    ...
    prometheus:
      ...
      remoteWrite:
        - url: "http://thanos-receive.federated-metrics.svc.cluster.local:9091/api/v1/receive"
(If you changed $NAMESPACE from federated-metrics, use that value in the URL instead.)
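After saving the config, the user-workload Prometheus should pick up the new remoteWrite target within a few minutes; you can watch its pods restart with:

oc -n openshift-user-workload-monitoring get pods -w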
Check that metrics are flowing by logging into Grafana
Get the Route URL for Grafana (remember it's https) and log in with the username root and the password you updated it to (or the default of secret).
oc -n $NAMESPACE get route rosa-thanos-s3-grafana-cr-route
Once logged in, go to Dashboards->Manage and expand the federated-metrics group; you should see the cluster metrics dashboards. Click on the Use Method / Cluster dashboard and you should see metrics. \o/
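Note that Thanos only uploads completed TSDB blocks to S3 (by default roughly every two hours), so the bucket may stay empty for a while after setup. Once blocks have shipped you should see them with:

aws s3 ls s3://$S3_BUCKET/ --recursive | head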
Cleanup
Delete the Helm Charts
helm delete -n $NAMESPACE rosa-thanos-s3
helm delete -n $NAMESPACE custom-metrics-operators
Delete the namespace
oc delete project $NAMESPACE
Delete the S3 bucket
aws s3 rb --force s3://$S3_BUCKET
Delete the AWS IAM Role and Policy
aws iam detach-role-policy \
  --role-name "$CLUSTER_NAME-thanos-s3" \
  --policy-arn $S3_POLICY
aws iam delete-role --role-name "$CLUSTER_NAME-thanos-s3"
aws iam delete-policy --policy-arn $S3_POLICY