Backend service to ingest NEM12 CSV meter readings at scale.
What is the rationale for the technologies you have decided to use?
- Kotlin + Spring Boot: I chose Kotlin for its concise syntax and modern features, and Spring Boot for its wide adoption in building REST APIs. It is also the stack I used on my last project, so it is the one I am most comfortable with.
- Flyway: To manage database schema changes in a reliable and version-controlled way.
- Docker Compose: To spin things up locally without fuss.
- JPA (Hibernate): For seamless ORM support and easier data persistence.
- JUnit 5 + Mockito: For unit and integration testing.
What would you have done differently if you had more time?
- Add more robust validation and schema-based parsing.
- Add a UI/status endpoint for feedback.
- Consider using a queue like Kafka to handle large files. Parsed rows could be pushed to Kafka instead of being held in memory; the queue acts as a buffer, and ingestion happens asynchronously, separate from the parsing logic. This would improve API response time (a rough sketch of this idea follows the list below).
- Consider automating the file upload process with Airflow jobs (or something similar): an Airflow DAG, when triggered, could call the upload endpoint on a schedule (for example, the next due date for readings).
- Add Datadog for monitoring and alerting.
- Add API documentation with Swagger.
- I did not add security; I would definitely add that in a production-grade setup.
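
As a rough illustration of the Kafka idea above: the sketch below assumes spring-kafka's `KafkaTemplate` is available and uses an illustrative `meter-readings` topic and `MeterReadingRow` type that are not part of the current codebase. Parsed rows are published to the topic, and a separate consumer would persist them asynchronously so the upload endpoint can return as soon as parsing finishes.

```kotlin
// Hypothetical sketch: buffering parsed NEM12 rows through Kafka instead of
// holding them all in memory. Assumes spring-kafka is on the classpath and a
// "meter-readings" topic exists; all names are illustrative.
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.stereotype.Service

data class MeterReadingRow(
    val nmi: String,          // National Metering Identifier from the 200 record
    val timestamp: String,    // interval timestamp
    val consumption: Double   // interval value
)

@Service
class ReadingPublisher(
    private val kafkaTemplate: KafkaTemplate<String, MeterReadingRow>
) {
    // Publish each parsed row; a downstream consumer persists them asynchronously.
    fun publish(rows: Sequence<MeterReadingRow>) {
        rows.forEach { row ->
            kafkaTemplate.send("meter-readings", row.nmi, row)
        }
    }
}
```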
What is the rationale for the design choices that you have made?
- Modular design - I have tried to keep the design modular: modules like the parser and validator are independent, which makes them easier to reuse and test.
- Testing - I have tried to keep the test cases as simple to read as possible so that they can also serve as developer documentation.
- Validation & Error Reporting - Structural issues with the file cause full rejection; otherwise only the offending rows are rejected, and a detailed error report is returned to make retries easy (see the first sketch after this list).
- Batch upsert - To avoid overwhelming the DB and reduce memory pressure, I added chunked saveAll logic that inserts records in batches (see the second sketch after this list).
- Re-uploading - Rows that already exist are not re-inserted when the same file is uploaded again; the same sketch shows one way to handle this.
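
To make the full vs. partial rejection idea concrete, here is a minimal sketch of a possible result shape; the type and field names are illustrative assumptions, not the actual API of this service.

```kotlin
// Illustrative sketch of the partial-rejection idea: structural failures reject
// the whole file, while row-level failures are collected into an error report
// so the caller can fix and retry only the affected rows.
data class RowError(
    val lineNumber: Int,   // line in the uploaded CSV
    val reason: String     // e.g. "invalid interval value"
)

sealed interface IngestionResult {
    // File-level problem (e.g. missing 100/900 records): nothing is persisted.
    data class Rejected(val reason: String) : IngestionResult

    // Some rows persisted, some skipped; errors are returned for retry.
    data class PartiallyAccepted(
        val persisted: Int,
        val errors: List<RowError>
    ) : IngestionResult
}
```

And a minimal sketch of the chunked saveAll plus re-upload handling, assuming a Spring Data JPA repository over a hypothetical MeterReading entity with a unique (nmi, timestamp) pair. The chunk size, names, and the derived exists query are assumptions; a database-level upsert (e.g. ON CONFLICT DO NOTHING) would likely be more efficient in practice.

```kotlin
// Minimal sketch of chunked persistence with duplicate-skipping on re-upload.
// Assumes Spring Boot 3 (jakarta.persistence) and Spring Data JPA; names are illustrative.
import jakarta.persistence.Entity
import jakarta.persistence.GeneratedValue
import jakarta.persistence.Id
import jakarta.persistence.Table
import jakarta.persistence.UniqueConstraint
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional
import java.time.LocalDateTime

@Entity
@Table(uniqueConstraints = [UniqueConstraint(columnNames = ["nmi", "timestamp"])])
class MeterReading(
    @Id @GeneratedValue var id: Long? = null,
    var nmi: String = "",
    var timestamp: LocalDateTime = LocalDateTime.MIN,
    var consumption: Double = 0.0
)

interface MeterReadingRepository : JpaRepository<MeterReading, Long> {
    // Derived query used to skip rows inserted by a previous upload.
    fun existsByNmiAndTimestamp(nmi: String, timestamp: LocalDateTime): Boolean
}

@Service
class MeterReadingWriter(private val repository: MeterReadingRepository) {

    @Transactional
    fun persist(rows: List<MeterReading>, chunkSize: Int = 1000) {
        rows.chunked(chunkSize).forEach { chunk ->
            // Drop rows that already exist so re-uploading the same file is idempotent.
            val newRows = chunk.filterNot {
                repository.existsByNmiAndTimestamp(it.nmi, it.timestamp)
            }
            // saveAll in chunks keeps memory use and transaction size bounded.
            repository.saveAll(newRows)
        }
    }
}
```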
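The per-row exists check above is the simplest way to express the idea; in a real setup the unique constraint alone, combined with a conflict-ignoring insert, would avoid the extra round trips per chunk.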