Merged
Changes from 20 commits
Commits
31 commits
8032187
wip: metrics
eoghanlawless Apr 9, 2025
ab50cdc
feat: /v2/metrics
eoghanlawless Apr 10, 2025
fe37e01
wip: add initial metrics
eoghanlawless Apr 11, 2025
79ef054
feat: implement response counter
eoghanlawless Apr 24, 2025
152e70a
merge: main
eoghanlawless Apr 24, 2025
e20ab67
test: fix unit tests
eoghanlawless Apr 24, 2025
48299be
chore: generate api
eoghanlawless Apr 24, 2025
c380aaa
test: update to workflow
eoghanlawless Apr 24, 2025
976c0a7
version: 2.1.0
eoghanlawless Apr 24, 2025
c98eb07
chore: update workflow
eoghanlawless Apr 24, 2025
ea33af8
merge: main
eoghanlawless Apr 28, 2025
fd738d9
revert: local testing change
eoghanlawless Apr 28, 2025
b66ded5
feat: implement metrics service
eoghanlawless Apr 28, 2025
21a532d
feat: add cluster-manager monitor
eoghanlawless Apr 29, 2025
29311d3
merge: main
eoghanlawless Apr 29, 2025
bd56bec
version: 2.1.1
eoghanlawless Apr 29, 2025
23e5f2a
chore: remove test changes
eoghanlawless Apr 29, 2025
d96b803
version: api 2.1.0
eoghanlawless Apr 29, 2025
5aabbf1
version: 2.1.0
eoghanlawless Apr 29, 2025
38e324a
chore: disable metrics by default
eoghanlawless Apr 29, 2025
e47e433
chore: /v2/metrics -> /metrics
eoghanlawless Apr 29, 2025
c99c801
chore: remove verbose debug log
eoghanlawless Apr 29, 2025
a705c2e
chore: address comments
eoghanlawless Apr 29, 2025
40e5ab5
Merge branch 'main' into metrics
jokuniew Apr 30, 2025
25dad75
chore: address comments
eoghanlawless Apr 30, 2025
5743c20
chore: remove unused var
eoghanlawless Apr 30, 2025
d4253e5
test: add metrics package unit tests
eoghanlawless Apr 30, 2025
bba55cf
fix: remove duplicate port
eoghanlawless Apr 30, 2025
12dcd67
fix: servicemonitor ports
eoghanlawless Apr 30, 2025
cf1c26b
chore: specify rest port protocol
eoghanlawless May 1, 2025
309d629
chore: specify target port and protocol
eoghanlawless May 1, 2025
3 changes: 3 additions & 0 deletions .github/workflows/validate-openapi.yml
@@ -31,6 +31,9 @@ jobs:
  run: |
  if [[ `git status --porcelain` ]]; then
  echo "### Error: Changes detected after running make generate-api"
+ echo "### git status --porcelain"
+ git status --porcelain
+ echo "### git diff"
  git diff
  echo "### Error: Changes detected after running make generate-api" >> $GITHUB_STEP_SUMMARY
  exit 1
14 changes: 7 additions & 7 deletions Makefile
@@ -290,8 +290,8 @@ helm-clean: ## Clean helm chart build annotations.

  .PHONY: helm-test
  helm-test: ## Template the charts.
- for d in $(HELM_DIRS); do \
- helm template intel $$d; \
+ @for d in $(HELM_DIRS); do \
+ helm --debug template --namespace orch-cluster intel $$d; \
  done

  .PHONY: helm-build
@@ -504,9 +504,9 @@ redeploy: docker-build docker-load ## Redeploy the pod with the latest codes.
  generate-api: check-oapi-codegen-version ## Generate Go client, server, client and types from OpenAPI spec with oapi-codegen
  @echo "Generating..."
  oapi-codegen -generate spec -o pkg/api/spec.gen.go -package api api/openapi/openapi.yaml
- oapi-codegen -generate client -o pkg/api/client.gen.go -package api api/openapi/openapi.yaml
- oapi-codegen -generate types -o pkg/api/types.gen.go -package api api/openapi/openapi.yaml
- oapi-codegen -generate std-http,strict-server -o pkg/api/server.gen.go -package api api/openapi/openapi.yaml
+ oapi-codegen -generate client -o pkg/api/client.gen.go -exclude-tags metrics -package api api/openapi/openapi.yaml
+ oapi-codegen -generate types -o pkg/api/types.gen.go -exclude-tags metrics -package api api/openapi/openapi.yaml
+ oapi-codegen -generate std-http,strict-server -exclude-tags metrics -o pkg/api/server.gen.go -package api api/openapi/openapi.yaml

  .PHONY: check-oapi-codegen-version
  check-oapi-codegen-version: ## Check oapi-codegen version
@@ -538,8 +538,8 @@ dev-image: ## Build dev image and push to sandbox
  -f deployment/images/Dockerfile.cluster-manager
  ${DOCKER_ENV} docker push ${DOCKER_DEV_IMG}

- .PHONY: dev-helm # Build dev helm chart and push to sandbox
- dev-helm: ## Build dev helm chart and push to sandbox
+ .PHONY: dev-chart # Build dev helm chart and push to sandbox
+ dev-chart: ## Build dev helm chart and push to sandbox
  @if test -z $(DEV_TAG); \
  then echo "Please specify dev tag, make dev DEV_TAG=<dev-tag> " && exit 1; \
  fi
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
- 2.1.0-dev
+ 2.1.0
55 changes: 37 additions & 18 deletions api/openapi/openapi.yaml
@@ -6,7 +6,7 @@ openapi: 3.0.3
  info:
  title: Cluster Manager 2.0
  description: This document defines the schema for the Cluster Manager 2.0 REST API.
- version: 2.0.0
+ version: 2.1.0

  security:
  - HTTP: []
@@ -143,7 +143,6 @@ paths:
  "500":
  $ref: '#/components/responses/500-InternalServerError'

-
  /v2/clusters/{name}:
  parameters:
  - $ref: '#/components/parameters/ActiveProjectIdHeader'
@@ -186,6 +185,7 @@ paths:
  $ref: '#/components/responses/404-NotFound'
  "500":
  $ref: '#/components/responses/500-InternalServerError'
+
  /v2/clusters/{nodeId}/clusterdetail:
  parameters:
  - $ref: '#/components/parameters/ActiveProjectIdHeader'
@@ -211,6 +211,7 @@ paths:
  $ref: '#/components/responses/400-BadRequest'
  "404":
  $ref: '#/components/responses/404-NotFound'
+
  /v2/clusters/{name}/nodes:
  parameters:
  - $ref: '#/components/parameters/ActiveProjectIdHeader'
@@ -387,21 +388,6 @@ paths:
  $ref: '#/components/responses/404-NotFound'
  "500":
  $ref: '#/components/responses/500-InternalServerError'
- /v2/healthz:
- get:
- description: Gets the Cluster Manager REST API healthz status.
- security: [] # skips authentication
- tags:
- - Health Check
- responses:
- "200":
- description: OK
- content:
- application/json:
- schema:
- type: string
- "500":
- $ref: '#/components/responses/500-InternalServerError'

  /v2/templates:
  parameters:
@@ -478,7 +464,6 @@ paths:
  $ref: '#/components/responses/404-NotFound'
  "500":
  $ref: '#/components/responses/500-InternalServerError'
-
  post:
  description: Import templates
  tags:
@@ -553,6 +538,7 @@ paths:
  $ref: '#/components/responses/404-NotFound'
  "500":
  $ref: '#/components/responses/500-InternalServerError'
+
  /v2/templates/{name}/versions:
  parameters:
  - $ref: '#/components/parameters/ActiveProjectIdHeader'
@@ -583,6 +569,7 @@ paths:
  $ref: '#/components/responses/404-NotFound'
  "500":
  $ref: '#/components/responses/500-InternalServerError'
+
  /v2/templates/{name}/default:
  parameters:
  - $ref: '#/components/parameters/ActiveProjectIdHeader'
@@ -615,6 +602,38 @@ paths:
  "500":
  $ref: '#/components/responses/500-InternalServerError'

+ /v2/healthz:
+ get:
+ description: Gets the Cluster Manager REST API healthz status.
+ security: [] # skips authentication
+ tags:
+ - Health Check
+ responses:
+ "200":
+ description: OK
+ content:
+ application/json:
+ schema:
+ type: string
+ "500":
+ $ref: '#/components/responses/500-InternalServerError'
+
+ /v2/metrics:
+ get:
+ description: Gets the Cluster Manager REST API prometheus metrics.
+ security: [] # skips authentication
+ tags:
+ - metrics
+ responses:
+ "200":
+ description: OK
+ content:
+ application/json:
+ schema:
+ type: string
+ "500":
+ $ref: '#/components/responses/500-InternalServerError'
+
  components:
  securitySchemes:
  HTTP:
4 changes: 2 additions & 2 deletions deployment/charts/cluster-manager/Chart.yaml
@@ -16,6 +16,6 @@ type: application
  # This is the chart version. This version number should be incremented each time you make changes
  # to the chart and its templates, including the app version.
  # Versions are expected to follow Semantic Versioning (https://semver.org/)
- version: 2.1.0-dev
- appVersion: 2.1.0-dev
+ version: 2.1.0
+ appVersion: 2.1.0
  annotations: {}
18 changes: 0 additions & 18 deletions deployment/charts/cluster-manager/templates/_helpers.tpl
@@ -63,21 +63,3 @@ Create the name of the service account to use
  {{- default "default" .Values.serviceAccount.name }}
  {{- end }}
  {{- end }}
-
- {{/*
- Metrics service labels
- */}}
- {{- define "templateController.metricsServiceLabels" -}}
- {{- with .Values.templateController.metrics.service.labels }}
- {{ toYaml . }}
- {{- end }}
- {{- end }}
-
- {{/*
- Service monitor labels
- */}}
- {{- define "templateController.serviceMonitorLabels" -}}
- {{- with .Values.templateController.metrics.serviceMonitor.labels }}
- {{- toYaml . }}
- {{- end }}
- {{- end }}
Original file line number Diff line number Diff line change
@@ -46,11 +46,20 @@ spec:
  {{- if .Values.clusterManager.args.inventory }}
  - '-inventory-endpoint={{ .Values.clusterManager.args.inventory }}'
  {{- end }}
+ {{- if .Values.metrics.enabled }}
+ - --metrics-port=:{{ .Values.metrics.service.port }}
+ {{- end }}
  {{- range $key, $value := .Values.clusterManager.extraArgs }}
  - -{{ $key }}={{ $value }}
  {{- end }}
  ports:
- - containerPort: {{ .Values.clusterManager.service.rest.port }}
+ - name: rest
+ containerPort: {{ .Values.clusterManager.service.rest.port }}
+ {{if .Values.metrics.enabled }}
+ - name: metrics
+ containerPort: {{ .Values.metrics.service.port }}
+ protocol: TCP
+ {{- end }}
  readinessProbe:
  httpGet:
  path: {{ .Values.clusterManager.readinessProbe.httpGet.path }}
Original file line number Diff line number Diff line change
@@ -41,8 +41,8 @@ spec:
  - --leader-elect
  - --health-probe-bind-address=:8081
  - --webhook-cert-path=/tmp/k8s-webhook-server/serving-certs
- {{- if .Values.templateController.metrics.service.enabled }}
- - --metrics-bind-address=:{{ .Values.templateController.metrics.service.port }}
+ {{- if .Values.metrics.enabled }}
+ - --metrics-bind-address=:{{ .Values.metrics.service.port }}
  - --metrics-secure=false
  {{- end }}
  {{- with .Values.templateController.extraArgs }}
@@ -63,8 +63,8 @@ spec:
  {{- toYaml . | nindent 8 }}
  {{- end }}
  ports:
- {{if .Values.templateController.metrics.service.enabled }}
- - containerPort: {{ .Values.templateController.metrics.service.port }}
+ {{if .Values.metrics.enabled }}
+ - containerPort: {{ .Values.metrics.service.port }}
  name: metrics
  protocol: TCP
  {{- end }}
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# SPDX-FileCopyrightText: (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

{{ if .Values.metrics.enabled -}}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: cluster-manager-metrics
namespace: {{ .Release.Namespace }}
labels:
{{- toYaml .Values.metrics.serviceMonitor.labels | indent 4 }}
spec:
endpoints:
- path: /metrics
port: metrics
scheme: http
- path: /v2/metrics
port: metrics
scheme: http
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchExpressions:
- key: prometheus.io/service-monitor
operator: NotIn
values:
- "false"
- key: app
operator: In
values:
- {{ .Values.metrics.service.labels.clusterManager.app | quote }}
- {{ .Values.metrics.service.labels.templateController.app | quote }}
---
apiVersion: v1
kind: Service
metadata:
name: cluster-metrics
namespace: {{ .Release.Namespace }}
labels:
{{- toYaml .Values.metrics.service.labels.clusterManager | nindent 4 }}
spec:
ports:
- name: metrics
protocol: TCP
port: {{ .Values.metrics.service.port }}
targetPort: {{ .Values.metrics.service.port }}
selector:
app: "{{.Chart.Name}}-cm"
---
apiVersion: v1
kind: Service
metadata:
name: templates-metrics
namespace: {{ .Release.Namespace }}
labels:
{{- toYaml .Values.metrics.service.labels.templateController | nindent 4 }}
spec:
ports:
- name: metrics
protocol: TCP
port: {{ .Values.metrics.service.port }}
targetPort: {{ .Values.metrics.service.port }}
selector:
app: "{{.Chart.Name}}-controller"
{{- end -}}
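The ServiceMonitor selector in the template above combines two `matchExpressions`, which Kubernetes evaluates as a logical AND. The following Go sketch shows that evaluation under stated assumptions: the `Requirement` type and `Matches` function are hypothetical helpers written for illustration (not Kubernetes API types), and the `app` label values are placeholders, since the chart's actual `values.yaml` is not shown in this diff.

```go
package main

import "fmt"

// Requirement mirrors one matchExpressions entry: key, operator, values.
// Hypothetical type for illustration; only "In" and "NotIn" are handled.
type Requirement struct {
	Key      string
	Operator string
	Values   []string
}

// contains reports whether v is one of values.
func contains(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

// Matches reports whether labels satisfy every requirement (logical AND).
// "In" needs the key to be present with a listed value; "NotIn" also
// matches when the key is absent, as in Kubernetes label selectors.
func Matches(labels map[string]string, reqs []Requirement) bool {
	for _, r := range reqs {
		value, present := labels[r.Key]
		switch r.Operator {
		case "In":
			if !present || !contains(r.Values, value) {
				return false
			}
		case "NotIn":
			if present && contains(r.Values, value) {
				return false
			}
		default:
			return false // operator not covered by this sketch
		}
	}
	return true
}

func main() {
	// Placeholder app label values; the real ones come from
	// .Values.metrics.service.labels in the chart.
	reqs := []Requirement{
		{Key: "prometheus.io/service-monitor", Operator: "NotIn", Values: []string{"false"}},
		{Key: "app", Operator: "In", Values: []string{"example-cm", "example-controller"}},
	}

	services := map[string]map[string]string{
		"cluster-metrics":   {"app": "example-cm"},
		"templates-metrics": {"app": "example-controller"},
		"opted-out":         {"app": "example-cm", "prometheus.io/service-monitor": "false"},
		"unrelated":         {"app": "something-else"},
	}
	for _, name := range []string{"cluster-metrics", "templates-metrics", "opted-out", "unrelated"} {
		fmt.Printf("%s selected: %v\n", name, Matches(services[name], reqs))
	}
}
```

Under these assumptions, a Service is scraped only when its `app` label is one of the two configured values and it has not opted out by setting `prometheus.io/service-monitor: "false"`.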

This file was deleted.

13 changes: 7 additions & 6 deletions deployment/charts/cluster-manager/templates/service.yaml
@@ -4,34 +4,35 @@
apiVersion: v1
kind: Service
metadata:
name: {{template "cluster-manager.fullname" .}}
name: {{ include "cluster-manager.fullname" . }}
namespace: {{.Release.Namespace}}
labels:
app: "{{.Chart.Name}}-cm"
spec:
selector:
app: "{{.Chart.Name}}-cm"
type: {{.Values.clusterManager.service.type}}
type: {{ .Values.clusterManager.service.type }}
ports:
- name: "rest"
port: {{.Values.clusterManager.service.rest.port}}
- name: rest
port: {{ .Values.clusterManager.service.rest.port }}

{{- if .Values.openpolicyagent.enabled -}}
{{- if .Values.service.opa.enabled }}
---
apiVersion: v1

Check notice

Code scanning / Trivy

limit range usage Low

Artifact: deployment/charts/cluster-manager/templates/service.yaml
Type: helm
Vulnerability KSV039
Severity: LOW
Message: limit range policy with a default request and limit, min and max request, for each container should be configure
Link: KSV039

Check notice

Code scanning / Trivy

resource quota usage Low

Artifact: deployment/charts/cluster-manager/templates/service.yaml
Type: helm
Vulnerability KSV040
Severity: LOW
Message: resource quota policy with hard memory and cpu quota per namespace should be configure
Link: KSV040
kind: Service
metadata:
name: {{ include "cluster-manager.fullname" . }}-opa
namespace: {{.Release.Namespace}}
labels:
{{- include "cluster-manager.labels" . | nindent 4 }}
spec:
type: {{ .Values.service.opa.type }}
ports:
- port: {{ .Values.service.opa.port }}
targetPort: opa
targetPort: {{ .Values.service.opa.port }}
protocol: TCP
name: http-opa
name: opa
selector:
{{- include "cluster-manager.selectorLabels" . | nindent 4 }}
{{- end}}

This file was deleted.
