Commit d8e5507

author
Gabriel Saratura
committed
ADR for billing system
1 parent 6afd93d commit d8e5507

2 files changed

+248
-1
lines changed

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
1+
= ADR 0033 - Event-Based Billing
2+
:adr_author: Gabriel Saratura
3+
:adr_owner: Schedar
4+
:adr_reviewers:
5+
:adr_date: 2025-06-02
6+
:adr_upd_date: 2025-06-02
7+
:adr_status: draft
8+
:adr_tags: billing,odoo
9+
10+
include::partial$adr-meta.adoc[]
11+
12+
[NOTE]
13+
.Summary
14+
====
15+
To send billing events to Odoo (https://docs.central.vshn.ch/event-billing.html[Event Based Billing^]), we will use the existing AppCat controller to manage the events and Kubernetes CRDs to persist them.
16+
====
17+
18+
== Requirements
19+
20+
. Reuse existing tools and/or components of AppCat to build a resilient event-based billing solution.
21+
. Enable resending data to Odoo (historical record retention is required).
22+
. Persist the state of each billing request.
23+
. Do not block service creation, deletion, or update due to errors in the billing system.
24+
. [[odoo-sync-state]] Provide a way to verify sync state between Odoo and the Kubernetes cluster.
25+
. Do not lose any event (created, deleted, scaled), as this has a direct financial impact.
26+
27+
28+
== Solution Options
29+
30+
After careful evaluation, the two most promising solutions for implementing event-based billing are:
31+
32+
. <<AppCat Controller>>
33+
. <<Runtime Library via Crossplane Composition Functions>>
34+
35+
=== AppCat Controller
36+
37+
Transitioning from metered to event-based billing requires leveraging Kubernetes controllers more extensively.
38+
Our existing AppCat controller already handles event forwarding and webhooks, making it a natural candidate for integrating billing logic.
39+
40+
*Pros:*
41+
42+
* Full control over the lifecycle of VSHN custom resource services.
43+
* Customizable retry logic.
44+
* Clear separation of concerns between billing and service management.
45+
* Flexible support for different persistence backends.
46+
47+
*Cons:*
48+
49+
* More difficult to block or revert service operations when billing events fail to post to Odoo.
50+
51+
=== Runtime Library via Crossplane Composition Functions
52+
53+
We can embed billing logic into our existing runtime library for Crossplane Composition Functions, thereby coupling service lifecycle events directly with billing logic.
54+
55+
*Pros:*
56+
57+
* Reconciliation happens directly during create, update, or delete operations.
58+
* Greater control over services when billing fails.
59+
* Billing logic is treated as a first-class part of provisioning/deprovisioning.
60+
61+
*Cons:*
62+
63+
* Persistence integration becomes more complex.
64+
* No clear separation between billing and service logic.
65+
* Unclear where billing actually occurs, which can reduce maintainability.
66+
* We cannot react to deletion events, because Crossplane does not propagate them to the functions.
67+
68+
== Persistence Options
69+
70+
Based on past experience, we anticipate the need to resend older data to Odoo because of issues in either AppCat or Odoo.
71+
A lightweight, reliable mechanism to store and replay billing data is essential.
72+
73+
Required capabilities:
74+
75+
* Filtering and querying historical events.
76+
* Manual replay.
77+
* Partial delivery of historical events.
78+
* Operational simplicity.
79+
80+
=== SQLite
81+
82+
SQLite is a simple, embedded SQL database engine suitable for local persistence needs.
83+
84+
*Pros:*
85+
86+
* Minimal setup; no external infrastructure required.
87+
* Fast for local and sequential read/write operations.
88+
* Full SQL support.
89+
* ACID-compliant (supports WAL mode).
90+
* Self-contained `.db` file that's easy to handle and back up.
91+
* Supports pagination and filtering by retry state or timestamp.
92+
93+
*Cons:*
94+
95+
* Not suitable for concurrent writes across multiple pods.
96+
* Requires manual effort for backups, failover, and compaction.
97+
* Not distributed or highly available.
98+
* Lacks integration with Kubernetes tools like `kubectl`.
99+
* Not inherently event-driven.
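
To make the "pagination and filtering by retry state or timestamp" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table name `billing_events`, its columns, and the helper functions are assumptions made for illustration only; they are not part of AppCat or of this ADR's decision.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) event table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS billing_events (
               id INTEGER PRIMARY KEY,
               instance_id TEXT NOT NULL,
               type TEXT NOT NULL,
               timestamp TEXT NOT NULL,
               state TEXT NOT NULL
           )"""
    )
    return conn


def record_event(conn, instance_id, event_type, timestamp, state="pending"):
    """Insert one event; the `with` block commits the transaction on success."""
    with conn:
        conn.execute(
            "INSERT INTO billing_events (instance_id, type, timestamp, state) "
            "VALUES (?, ?, ?, ?)",
            (instance_id, event_type, timestamp, state),
        )


def list_events(conn, state, since, limit=50, offset=0):
    """Return events in a given state since a timestamp, newest first, paginated.

    Timestamps are stored as UTC strings in one fixed format, so comparing
    them as text matches chronological order.
    """
    cur = conn.execute(
        "SELECT instance_id, type, timestamp, state FROM billing_events "
        "WHERE state = ? AND timestamp >= ? "
        "ORDER BY timestamp DESC LIMIT ? OFFSET ?",
        (state, since, limit, offset),
    )
    return cur.fetchall()
```

A query of this kind is what the CRD option cannot express natively and would have to replace with client-side filtering.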
100+
101+
=== Custom Kubernetes CRDs
102+
103+
Custom Resource Definitions (CRDs) can be used to model billing events as native Kubernetes objects.
104+
105+
*Pros:*
106+
107+
* Native integration with Kubernetes and observable via `kubectl`.
108+
* Supports event-driven architectures through controllers.
109+
* State tracking via `.status` fields.
110+
* Reusable by other tools/controllers within the cluster.
111+
* Scales horizontally (no single-writer limitation).
112+
113+
*Cons:*
114+
115+
* Excessive CR volume may cause etcd bloat, impacting cluster performance.
116+
* Increased API server traffic.
117+
* Requires boilerplate for CRD definitions and status handling.
118+
* No native support for complex queries (unlike SQL).
119+
* Manual schema migration is necessary.
120+
* No built-in audit trail beyond resource versioning.
121+
122+
== Decision
123+
124+
=== Use Controller + Custom Kubernetes CRD
125+
126+
The recommendation is to extend the existing AppCat controller to manage event-based billing and to use Kubernetes CRDs as the persistence mechanism.
127+
128+
*Justification:*
129+
130+
* A controller is the natural place for billing, as it sits adjacent to service lifecycle management without coupling to it.
131+
* CRDs integrate well into our Kubernetes-native toolset and align with GitOps principles.
132+
* Data inspection and interaction via `kubectl` is simple and consistent with existing workflows.
133+
* While CRs are harder to query than SQL databases, we can mitigate this by providing predefined `kubectl` query templates for common tasks.
134+
* Kubernetes retry mechanisms can be leveraged for automatic re-delivery of failed events.
135+
* By using `patch` operations on CRs, we can flag specific events for manual resending to Odoo.
136+
* With careful CRD schema design (for example, one CR per service instance instead of one per event), we can avoid overwhelming etcd.
137+
* If detailed auditing is needed, it can be delegated to an external logging or database system.
138+
139+
Combining the controller with CRDs gives us control over retries, observability through `kubectl`, and operational flexibility for event-based billing with minimal compromise.
140+
141+
=== Billing Custom Resource (CR)
142+
143+
Each Billing Custom Resource (CR) describes a single service instance and its full lifecycle, from creation to deletion.
144+
145+
It consists of two main sections:
146+
147+
1. **Static data** - Defined under `.spec.odoo`. These values remain constant throughout the service's lifecycle.
148+
2. **Dynamic data** - Defined under `.status.events`. This section evolves over time, reflecting lifecycle changes such as scaling actions or SLA updates.
149+
150+
All lifecycle events (e.g., creation, scaling, deletion) are recorded within the same resource, enabling full event history reconstruction.
151+
This also allows operations such as **resending** events via annotations.
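
As a minimal sketch of how such an annotation-based resend could be triggered, the following Python helper builds a JSON merge patch for the `appcat.vshn.io/resend` annotation shown in the Billing CR example and assembles a matching `kubectl patch` command line. The helper names and the `billingservice` resource name passed to `kubectl` are assumptions; only the annotation key, its values, and the `syn-appcat` namespace come from this ADR.

```python
import json

RESEND_ANNOTATION = "appcat.vshn.io/resend"   # annotation key from the example CR
RESEND_MODES = {"all", "not-sent", "failed"}  # values listed in the example CR


def build_resend_patch(mode: str) -> dict:
    """Return a JSON merge patch that sets the resend annotation."""
    if mode not in RESEND_MODES:
        raise ValueError(f"unknown resend mode: {mode!r}")
    return {"metadata": {"annotations": {RESEND_ANNOTATION: mode}}}


def kubectl_patch_args(name: str, mode: str, namespace: str = "syn-appcat") -> list[str]:
    """Assemble the command line; the resource name 'billingservice' is assumed."""
    return [
        "kubectl", "patch", "billingservice", name,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", json.dumps(build_resend_patch(mode)),
    ]
```

For example, `kubectl_patch_args("my-instance", "failed")` yields an argument list that could be passed to `subprocess.run` to flag only failed events of one instance.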
152+
153+
The `.status.events` array must be sorted by `timestamp` in **descending** order, with the most recent event listed first.
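
The ordering rule can be sketched as follows. This is an illustration in Python, not the controller's implementation; `parse_ts` and `add_event` are hypothetical helpers, and the timestamp format is taken from the example CR.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as '2025-06-20T13:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def add_event(events: list[dict], event: dict) -> list[dict]:
    """Return a new list containing `event`, sorted newest first."""
    return sorted(
        events + [event],
        key=lambda e: parse_ts(e["timestamp"]),
        reverse=True,
    )
```

Sorting on every insert keeps the invariant even when an event arrives late with an older timestamp.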
154+
155+
Events that fail to send are retried automatically, using an **exponential backoff retry mechanism**.
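
This ADR does not fix the backoff parameters. As a sketch under assumed values (a 5-second base delay, doubling per attempt, capped at one hour), the delay before each retry could be computed like this:

```python
def backoff_delay(attempt: int, base_seconds: float = 5.0, max_seconds: float = 3600.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped.

    The base and cap defaults are illustrative assumptions, not decided values.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(base_seconds * (2 ** attempt), max_seconds)
```

The cap keeps a long Odoo outage from pushing the next retry arbitrarily far into the future.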
156+
157+
All CRs will be created within a single, dedicated namespace.
158+
159+
This design provides better isolation and aligns with the Kubernetes and Crossplane direction of deprecating cluster-scoped resources.
160+
Scoping CRs to a namespace offers several advantages:
161+
162+
* Enables referencing other namespaced resources like ConfigMaps, if required in the future.
163+
* Simplifies access control and resource lifecycle management.
164+
* Keeps CRs co-located with their controller, which runs in the same namespace.
165+
166+
Centralizing CRs in one namespace enhances organization, improves security, and promotes operational simplicity.
167+
168+
A resource is considered `Synced` only when **all** `.status.events[].state` values are `sent`.
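
The `Synced` rule above reduces to a single check over the event states. The sketch below is illustrative; treating a CR without any events as synced is an assumption, since this ADR does not cover that case.

```python
def is_synced(events: list[dict]) -> bool:
    """True only if every event has state 'sent'.

    An empty list yields True, which is an assumption of this sketch.
    """
    return all(event.get("state") == "sent" for event in events)
```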
169+
170+
There is currently no need to limit the number of stored events, as the expected volume per CR is low and manageable.
171+
172+
==== Billing CR Example
173+
174+
[source,yaml]
----
apiVersion: appcat.vshn.io/v1
kind: BillingService
metadata:
  annotations:
    appcat.vshn.io/resend: "all|not-sent|failed" #<1>
  name: <instance-xrd> #<2>
  namespace: syn-appcat #<3>
  finalizers:
    - delete-protection #<4>
spec:
  keepAfterDeletion: 365 #<5>
  odoo: #<6>
    instanceID: "a"
    salesOrderID: "SO0042"
    itemDescription: "Human readable description"
    itemGroupDescription: "My Item Group"
    unitID: "vshn_event_billing.uom_instance_hour"
status:
  events: #<7>
    - type: "deleted"
      productId: "Y"
      size: "3"
      timestamp: "2025-06-20T13:00:00Z"
      state: "sent|pending|failed" #<8>
    - type: "scaled"
      productId: "Y"
      size: "3"
      timestamp: "2025-05-20T13:00:00Z"
      state: "sent|pending|failed"
    - type: "scaled"
      productId: "Y"
      size: "2"
      timestamp: "2025-04-20T13:00:00Z"
      state: "sent|pending|failed"
    - type: "created"
      productId: "X"
      size: "1"
      timestamp: "2025-03-20T13:00:00Z"
      state: "sent|pending|failed"
  conditions:
    - lastTransitionTime: "2024-05-25T15:35:02Z"
      reason: ReconcileSuccess
      status: "True"
      type: Synced
    - lastTransitionTime: "2023-05-25T18:45:38Z"
      reason: Available
      status: "True"
      type: Ready
----
225+
226+
<1> An on-demand trigger to resend events from the `status.events` list based on their `state`.
227+
<2> Unique name of the composite, which serves as the identifier for the Billing CR.
228+
<3> All Billing CRs reside in `syn-appcat`, the framework's management namespace.
229+
<4> A finalizer set by the controller to protect the CR from accidental deletion.
230+
<5> The number of days to keep the CR after the service is removed; once this period has passed, the CR is deleted.
231+
<6> The `spec.odoo` section contains static metadata, consistent across all events.
232+
<7> The `status.events` array holds dynamic billing event fields, typically following lifecycle changes.
233+
<8> The `state` field tracks event delivery status to Odoo: `sent`, `pending`, or `failed`.
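
To illustrate callouts <1> and <8> together, the sketch below selects which events a resend would cover for each annotation value. The function name is hypothetical, and reading `not-sent` as "every event whose state is not `sent`" (that is, both `pending` and `failed`) is an assumption of this sketch.

```python
def events_to_resend(events: list[dict], mode: str) -> list[dict]:
    """Select events for resending based on the resend annotation value."""
    if mode == "all":
        return list(events)
    if mode == "not-sent":
        # Assumed meaning: anything that Odoo has not confirmed yet.
        return [e for e in events if e["state"] != "sent"]
    if mode == "failed":
        return [e for e in events if e["state"] == "failed"]
    raise ValueError(f"unknown resend mode: {mode!r}")
```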
234+
235+
[NOTE]
236+
====
237+
For a complete reference of all fields in this CR, see the https://docs.central.vshn.ch/event-billing-ingestion.html[Odoo documentation].
238+
====
239+
240+
[NOTE]
241+
.xref:odoo-sync-state[Odoo Sync State]
242+
====
243+
Odoo currently provides REST API endpoints that can be used to check sync status between Billing CRs and Odoo.
244+
245+
Verifying the sync state through these endpoints will be addressed in a future iteration of the AppCat Billing System.
246+
====

docs/modules/ROOT/partials/nav-adrs.adoc

Lines changed: 2 additions & 1 deletion
@@ -29,4 +29,5 @@
 ** xref:adr/0029-converged-service-provisioning-implementation.adoc[]
 ** xref:adr/0030-function-revisions.adoc[]
 ** xref:adr/0031-naming-scheme-for-servala-cluster-names-and-urls.adoc[]
-** xref:adr/0032-ci-pipeline.adoc[]
+** xref:adr/0032-ci-pipeline.adoc[]
+** xref:adr/0033-event-based-billing-oddo.adoc[]
