|
| 1 | += ADR 0033 - Event-Based Billing |
| 2 | +:adr_author: Gabriel Saratura |
| 3 | +:adr_owner: Schedar |
| 4 | +:adr_reviewers: |
| 5 | +:adr_date: 2025-06-02 |
| 6 | +:adr_upd_date: 2025-06-02 |
| 7 | +:adr_status: draft |
| 8 | +:adr_tags: billing,odoo |
| 9 | + |
| 10 | +include::partial$adr-meta.adoc[] |
| 11 | + |
| 12 | +[NOTE] |
| 13 | +.Summary |
| 14 | +==== |
| 15 | +For sending billing events to Odoo (https://docs.central.vshn.ch/event-billing.html[Event Based Billing^]), we will use the existing AppCat controller to manage billing events and Kubernetes CRDs as the persistence mechanism. |
| 16 | +==== |
| 17 | + |
| 18 | +== Requirements |
| 19 | + |
| 20 | +. Reuse existing tools and/or components of AppCat to build a resilient event-based billing solution. |
| 21 | +. Enable resending data to Odoo (historical record retention is required). |
| 22 | +. Persist the state of each billing request. |
| 23 | +. Do not block service creation, deletion, or update due to errors in the billing system. |
| 24 | +. [[odoo-sync-state]] Provide a way to verify sync state between Odoo and the Kubernetes cluster. |
| 25 | +. Do not lose any event (created, deleted, scaled), as this has a direct financial impact. |
| 26 | + |
| 27 | + |
| 28 | +== Solution Options |
| 29 | + |
| 30 | +After careful evaluation, the two most promising solutions for implementing event-based billing are: |
| 31 | + |
| 32 | +. <<AppCat Controller>> |
| 33 | +. <<Runtime Library via Crossplane Composition Functions>> |
| 34 | + |
| 35 | +=== AppCat Controller |
| 36 | + |
| 37 | +Transitioning from metered to event-based billing requires leveraging Kubernetes controllers more extensively. |
| 38 | +Our existing AppCat controller already handles event forwarding and webhooks, making it a natural candidate for integrating billing logic. |
| 39 | + |
| 40 | +*Pros:* |
| 41 | + |
| 42 | +* Full control over the lifecycle of VSHN custom resource services. |
| 43 | +* Customizable retry logic. |
| 44 | +* Clear separation of concerns between billing and service management. |
| 45 | +* Flexible support for different persistence backends. |
| 46 | + |
| 47 | +*Cons:* |
| 48 | + |
| 49 | +* More difficult to block or revert service operations when billing events fail to post to Odoo. |
| 50 | + |
| 51 | +=== Runtime Library via Crossplane Composition Functions |
| 52 | + |
| 53 | +We can embed billing logic into our existing runtime library for Crossplane Composition Functions, thereby coupling service lifecycle events directly with billing logic. |
| 54 | + |
| 55 | +*Pros:* |
| 56 | + |
| 57 | +* Reconciliation happens directly during create, update, or delete operations. |
| 58 | +* Greater control over services when billing fails. |
| 59 | +* Billing logic is treated as a first-class part of provisioning/deprovisioning. |
| 60 | + |
| 61 | +*Cons:* |
| 62 | + |
| 63 | +* Persistence integration becomes more complex. |
| 64 | +* No clear separation between billing and service logic. |
| 65 | +* Unclear where billing actually occurs, which can reduce maintainability. |
| 66 | +* We can't react to deletion events, because Crossplane doesn't propagate them to the functions. |
| 67 | + |
| 68 | +== Persistence Options |
| 69 | + |
| 70 | +Based on past experience, we anticipate the need to resend older data to Odoo due to potential issues either in AppCat or Odoo. |
| 71 | +A lightweight, reliable mechanism to store and replay billing data is essential. |
| 72 | + |
| 73 | +Required capabilities: |
| 74 | + |
| 75 | +* Filtering and querying historical events. |
| 76 | +* Manual replay. |
| 77 | +* Partial delivery of historical events. |
| 78 | +* Operational simplicity. |
| 79 | + |
| 80 | +=== SQLite |
| 81 | + |
| 82 | +SQLite is a simple, embedded SQL database engine suitable for local persistence needs. |
| 83 | + |
| 84 | +*Pros:* |
| 85 | + |
| 86 | +* Minimal setup; no external infrastructure required. |
| 87 | +* Fast for local and sequential read/write operations. |
| 88 | +* Full SQL support. |
| 89 | +* ACID-compliant (supports WAL mode). |
| 90 | +* Self-contained `.db` file that's easy to handle and back up. |
| 91 | +* Supports pagination and filtering by retry state or timestamp. |
| 92 | + |
| 93 | +*Cons:* |
| 94 | + |
| 95 | +* Not suitable for concurrent writes across multiple pods. |
| 96 | +* Requires manual effort for backups, failover, and compaction. |
| 97 | +* Not distributed or highly available. |
| 98 | +* Lacks integration with Kubernetes tools like `kubectl`. |
| 99 | +* Not inherently event-driven. |
| 100 | + |
| 101 | +=== Custom Kubernetes CRDs |
| 102 | + |
| 103 | +Custom Resource Definitions (CRDs) can be used to model billing events as native Kubernetes objects. |
| 104 | + |
| 105 | +*Pros:* |
| 106 | + |
| 107 | +* Native integration with Kubernetes and observable via `kubectl`. |
| 108 | +* Supports event-driven architectures through controllers. |
| 109 | +* State tracking via `.status` fields. |
| 110 | +* Reusable by other tools/controllers within the cluster. |
| 111 | +* Scales horizontally (no single-writer limitation). |
| 112 | + |
| 113 | +*Cons:* |
| 114 | + |
| 115 | +* Excessive CR volume may cause etcd bloat, impacting cluster performance. |
| 116 | +* Increased API server traffic. |
| 117 | +* Requires boilerplate for CRD definitions and status handling. |
| 118 | +* No native support for complex queries (unlike SQL). |
| 119 | +* Manual schema migration is necessary. |
| 120 | +* No built-in audit trail beyond resource versioning. |
| 121 | + |
| 122 | +== Decision |
| 123 | + |
| 124 | +=== Use Controller + Custom Kubernetes CRD |
| 125 | + |
| 126 | +The recommendation is to extend the existing AppCat controller to manage event-based billing and to use Kubernetes CRDs as the persistence mechanism. |
| 127 | + |
| 128 | +*Justification:* |
| 129 | + |
| 130 | +* A controller is the natural place for billing, as it sits adjacent to service lifecycle management without coupling to it. |
| 131 | +* CRDs integrate well into our Kubernetes-native toolset and align with GitOps principles. |
| 132 | +* Data inspection and interaction via `kubectl` is simple and consistent with existing workflows. |
| 133 | +* While CRs are harder to query than SQL databases, we can mitigate this by providing predefined `kubectl` query templates for common tasks. |
| 134 | +* Kubernetes retry mechanisms can be leveraged for automatic re-delivery of failed events. |
| 135 | +* By using `patch` operations on CRs, we can flag specific events for manual resending to Odoo. |
| 136 | +* With careful CRD schema design (for example, using one CR per service instead of one per event), we can avoid overwhelming etcd. |
| 137 | +* If detailed auditing is needed, it can be delegated to an external logging or database system. |
| 138 | + |
| 139 | +This hybrid approach gives us robust control, observability, and operational flexibility for event-based billing with minimal compromise. |
| 140 | + |
| 141 | +=== Billing Custom Resource (CR) |
| 142 | + |
| 143 | +Each Billing Custom Resource (CR) describes a single service instance and its full lifecycle - from creation to deletion. |
| 144 | + |
| 145 | +It consists of two main sections: |
| 146 | + |
| 147 | +1. **Static data** - Defined under `.spec.odoo`. These values remain constant throughout the service's lifecycle. |
| 148 | +2. **Dynamic data** - Defined under `.status.events`. This section evolves over time, reflecting lifecycle changes such as scaling actions or SLA updates. |
| 149 | + |
| 150 | +All lifecycle events (e.g., creation, scaling, deletion) are recorded within the same resource, enabling full event history reconstruction. |
| 151 | +This also allows operations such as **resending** events via annotations. |
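To illustrate annotation-driven resending, here is a minimal Python sketch that picks the events to re-queue for a given value of the `appcat.vshn.io/resend` annotation shown in the example further down. The helper name `select_for_resend` is hypothetical, and reading `not-sent` as "every event whose state is not `sent`" is an assumption; the ADR does not define the exact semantics.

```python
def select_for_resend(events, mode):
    """Return the events to re-queue for a resend annotation value (hypothetical helper)."""
    if mode == "all":
        return list(events)
    if mode == "not-sent":
        # Assumption: "not-sent" covers both pending and failed events.
        return [e for e in events if e["state"] != "sent"]
    if mode == "failed":
        return [e for e in events if e["state"] == "failed"]
    raise ValueError(f"unknown resend mode: {mode!r}")
```

Rejecting unknown values keeps a typo in the annotation from silently resending nothing.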
| 152 | + |
| 153 | +The `.status.events` array must be ordered in **descending** order by `timestamp`, with the most recent event listed first. |
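The ordering rule can be sketched as follows. This is only an illustration in Python, not the controller's code; `parse_ts` and `add_event` are hypothetical names, and the sketch assumes timestamps use the RFC 3339 form with a trailing `Z`, as in the example CR.

```python
from datetime import datetime


def parse_ts(ts):
    # Assumes RFC 3339 timestamps ending in "Z", e.g. "2025-06-20T13:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def add_event(events, new_event):
    """Return a new list containing new_event, sorted newest first."""
    return sorted(
        events + [new_event],
        key=lambda e: parse_ts(e["timestamp"]),
        reverse=True,
    )
```

Sorting on every insert is simple and cheap enough for the low per-CR event volume expected here.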
| 154 | + |
| 155 | +Events are also resent automatically, using an **exponential backoff retry mechanism**. |
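As a sketch of what exponential backoff means here, the delay doubles with each failed attempt up to a ceiling. The function name `backoff_delay` and the base and cap values are assumptions chosen for illustration; the ADR does not specify them.

```python
def backoff_delay(attempt, base=5.0, cap=3600.0):
    """Seconds to wait before retry number `attempt` (0-based).

    base and cap are illustrative values, not taken from the ADR.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very large attempt counts cannot overflow.
    return min(cap, base * 2 ** min(attempt, 32))
```

With these defaults the delays run 5 s, 10 s, 20 s, and so on, until they level off at one hour.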
| 156 | + |
| 157 | +All CRs will be created within a single, dedicated namespace. |
| 158 | + |
| 159 | +This design provides better isolation and aligns with Crossplane's direction of moving from cluster-scoped to namespaced resources. |
| 160 | +Scoping CRs to a namespace offers several advantages: |
| 161 | + |
| 162 | +* Enables referencing other namespaced resources like ConfigMaps, if required in the future. |
| 163 | +* Simplifies access control and resource lifecycle management. |
| 164 | +* Keeps CRs co-located with their controller, which also runs in the same namespace. |
| 165 | + |
| 166 | +Centralizing CRs in one namespace enhances organization, improves security, and promotes operational simplicity. |
| 167 | + |
| 168 | +A resource is considered `Synced` only when **all** `.status.events[].state` values are `sent`. |
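The `Synced` rule can be expressed in a few lines. This Python sketch uses a hypothetical helper name, `is_synced`, and operates on the `.status.events` list as plain dictionaries.

```python
def is_synced(events):
    """True only when every event has been delivered to Odoo."""
    # Note: an empty event list counts as synced in this sketch;
    # the ADR does not say how that case should be treated.
    return all(e["state"] == "sent" for e in events)
```

A single `pending` or `failed` event is therefore enough to mark the whole resource as not synced.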
| 169 | + |
| 170 | +There is currently no need to limit the number of stored events, as the expected volume per CR is low and manageable. |
| 171 | + |
| 172 | +==== Billing CR Example |
| 173 | + |
| 174 | +[source,yaml] |
| 175 | +---- |
| 176 | +apiVersion: appcat.vshn.io/v1 |
| 177 | +kind: BillingService |
| 178 | +metadata: |
| 179 | + annotations: |
| 180 | + appcat.vshn.io/resend: "all|not-sent|failed" #<1> |
| 181 | + name: <instance-xrd> #<2> |
| 182 | + namespace: syn-appcat #<3> |
| 183 | + finalizers: |
| 184 | + - delete-protection #<4> |
| 185 | +spec: |
| 186 | + keepAfterDeletion: 365 #<5> |
| 187 | + odoo: #<6> |
| 188 | + instanceID: "a" |
| 189 | + salesOrderID: "SO0042" |
| 190 | + itemDescription: "Human readable description" |
| 191 | + itemGroupDescription: "My Item Group" |
| 192 | + unitID: "vshn_event_billing.uom_instance_hour" |
| 193 | +status: |
| 194 | + events: #<7> |
| 195 | + - type: "deleted" |
| 196 | + productId: "Y" |
| 197 | + size: "3" |
| 198 | + timestamp: "2025-06-20T13:00:00Z" |
| 199 | + state: "sent|pending|failed" #<8> |
| 200 | + - type: "scaled" |
| 201 | + productId: "Y" |
| 202 | + size: "3" |
| 203 | + timestamp: "2025-05-20T13:00:00Z" |
| 204 | + state: "sent|pending|failed" |
| 205 | + - type: "scaled" |
| 206 | + productId: "Y" |
| 207 | + size: "2" |
| 208 | + timestamp: "2025-04-20T13:00:00Z" |
| 209 | + state: "sent|pending|failed" |
| 210 | + - type: "created" |
| 211 | + productId: "X" |
| 212 | + size: "1" |
| 213 | + timestamp: "2025-03-20T13:00:00Z" |
| 214 | + state: "sent|pending|failed" |
| 215 | + conditions: |
| 216 | + - lastTransitionTime: "2024-05-25T15:35:02Z" |
| 217 | + reason: ReconcileSuccess |
| 218 | + status: "True" |
| 219 | + type: Synced |
| 220 | + - lastTransitionTime: "2023-05-25T18:45:38Z" |
| 221 | + reason: Available |
| 222 | + status: "True" |
| 223 | + type: Ready |
| 224 | +---- |
| 225 | + |
| 226 | +<1> An on-demand trigger to resend events from the `status.events` list based on their `state`. |
| 227 | +<2> Unique name of the composite - serves as the identifier for the Billing CR. |
| 228 | +<3> All Billing CRs reside in `syn-appcat`, the framework's management namespace. |
| 229 | +<4> A finalizer from the controller to protect from accidental deletion. |
| 230 | +<5> The number of days to keep the CR after the service is removed; once this period has passed, the CR is deleted. |
| 231 | +<6> The `spec.odoo` section contains static metadata, consistent across all events. |
| 232 | +<7> The `status.events` array holds dynamic billing event fields, typically following lifecycle changes. |
| 233 | +<8> The `state` field tracks event delivery status to Odoo: `sent`, `pending`, or `failed`. |
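The retention check behind `keepAfterDeletion` might look like the following Python sketch. It assumes the retention period is counted from the timestamp of the `deleted` event, which the ADR does not state explicitly; the helper name `is_expired` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired(deleted_at, keep_after_deletion_days, now):
    """True once the retention period after service deletion has elapsed."""
    # Assumption: the period starts at the "deleted" event's timestamp.
    return now >= deleted_at + timedelta(days=keep_after_deletion_days)


# Values taken from the example CR above.
deleted_at = datetime(2025, 6, 20, 13, 0, tzinfo=timezone.utc)
print(is_expired(deleted_at, 365, datetime(2026, 1, 1, tzinfo=timezone.utc)))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.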
| 234 | + |
| 235 | +[NOTE] |
| 236 | +==== |
| 237 | +For a complete reference of all fields in this CR, see the https://docs.central.vshn.ch/event-billing-ingestion.html[Odoo documentation]. |
| 238 | +==== |
| 239 | + |
| 240 | +[NOTE] |
| 241 | +.xref:odoo-sync-state[Odoo Sync State] |
| 242 | +==== |
| 243 | +Odoo currently provides REST API endpoints that can be used to check sync status between Billing CRs and Odoo. |
| 244 | + |
| 245 | +This will be addressed in a future iteration of the AppCat Billing System. |
| 246 | +==== |