NOTICE
This decision has been migrated to the Technical Council's Decision Log as part of a consolidation effort. See: DR-000008 - Data export by using Spring Batch (aka Export Manager)
Terms
Data export - a general solution that should be applied to all new features intended to export data from Folio modules to a destination (file, database, etc.).
A potential application of this solution isn't limited to the SRS/Inventory Data export feature (ui-data-export and mod-data-export modules). It should be a system-wide solution leveraged for all data export business cases.
Business goals
The new Data export approach was designed for the following features:
- Export orders in bulk via delimited file format: https://issues.folio.org/browse/UXPROD-2318
- Build a report with rollover errors and store in CSV file: https://issues.folio.org/browse/MODFISTO-173
- Circulation log export to CSV: https://issues.folio.org/browse/UXPROD-2691
- Cornell Library's go-live requirements to transfer fees/fines to the Cornell bursar system: https://issues.folio.org/browse/UXPROD-2862
- The ability to import/export fund updates via CSV file in order to bulk edit funds: https://issues.folio.org/browse/UXPROD-199
There is no intent to replace any existing data export solution for now. If there is later a requirement to significantly extend an existing Data export solution, the new approach should be applied.
Architecture design
To eliminate the limitations of the existing mod-data-export module (see Data export by using mod-data-export) and speed up development, two new modules should be implemented following the Spring way approach (see Pic. 1):
- mod-data-export-spring
- mod-data-export-worker
Pic. 1.
mod-data-export-spring
The module is responsible for:
- Receiving data export Job requests via REST API
- Storing Job requests in a database and providing Job search capabilities
- Sending Job commands to mod-data-export-worker module(s) via Kafka
- Receiving Job status updates from mod-data-export-worker via Kafka and updating the Job status in the database
- Providing file download link, once data export is finished
The module should be based on the Spring way approach. Sample Spring way module: mod-spring-template.
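As a rough illustration of the responsibilities listed above, the following Java sketch models the Job lifecycle in memory: a `HashMap` stands in for the PostgreSQL job table and a `List` stands in for the command topic. The class names, the export type string and the status values (`SCHEDULED`, `IN_PROGRESS`, `SUCCESSFUL`, `FAILED`) are hypothetical and are not taken from the actual mod-data-export-spring data model.

```java
import java.util.*;

// Hypothetical status values; the real data model (Pic. 2) may differ.
enum JobStatus {
    SCHEDULED, IN_PROGRESS, SUCCESSFUL, FAILED;

    boolean isTerminal() { return this == SUCCESSFUL || this == FAILED; }
}

// Hypothetical Job record kept by mod-data-export-spring.
final class ExportJob {
    final UUID id;
    final String type;
    JobStatus status = JobStatus.SCHEDULED;
    String fileUrl; // download link, set once the export has finished

    ExportJob(UUID id, String type) { this.id = id; this.type = type; }
}

final class JobService {
    // Stands in for the PostgreSQL job table.
    private final Map<UUID, ExportJob> repository = new HashMap<>();
    // Stands in for the <tenant>.data-export.job.command topic.
    final List<UUID> commandTopic = new ArrayList<>();

    // REST request: store the Job, then send a start-job command to the workers.
    UUID submit(String type) {
        ExportJob job = new ExportJob(UUID.randomUUID(), type);
        repository.put(job.id, job);
        commandTopic.add(job.id);
        return job.id;
    }

    // Called for each message read from the <tenant>.data-export.job.update topic.
    void onStatusUpdate(UUID jobId, JobStatus status, String fileUrl) {
        ExportJob job = repository.get(jobId);
        if (job == null || job.status.isTerminal()) {
            return; // unknown Job or a duplicate/late update: ignore it
        }
        job.status = status;
        if (status == JobStatus.SUCCESSFUL) {
            job.fileUrl = fileUrl;
        }
    }

    // The download link is only available once the export has finished.
    Optional<String> downloadLink(UUID jobId) {
        ExportJob job = repository.get(jobId);
        return job == null ? Optional.empty() : Optional.ofNullable(job.fileUrl);
    }

    JobStatus status(UUID jobId) { return repository.get(jobId).status; }
}
```

Ignoring updates for Jobs that are already in a terminal state is one possible way to cope with duplicate or late status messages; the real module may handle this differently.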
Technologies that should be leveraged to implement the module:
- Spring framework
- Spring Boot
- Spring Kafka (to simplify work with Kafka)
- folio-spring-base (for integration with Okapi)
- HikariCP (connection pool)
- Liquibase (for database schema migration)
- PostgreSQL as database
- Lombok
Data model
Pic. 2.
mod-data-export-worker
The module retrieves data from other Folio modules via their REST APIs and adds it to CSV file parts. Once all required data is retrieved, the worker uploads the file parts to the Folio Object storage (AWS S3, MinIO, etc.) via MinIO and assembles the target CSV file from the parts there. The target files can then be moved to other file storage such as AWS S3, MinIO, FTP/SFTP, etc.
Once the file is uploaded, the module generates a download URL and sends it back to mod-data-export-spring via Kafka.
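The part-based upload described above can be sketched in plain Java. This is a minimal, in-memory sketch under stated assumptions: a `SortedMap` of numbered parts stands in for the part objects in object storage, and assembling the target file is modelled as concatenating the parts in part-number order (in S3 this roughly corresponds to a multipart upload). Field quoting follows RFC 4180. `CsvParts` and its methods are hypothetical names.

```java
import java.util.*;
import java.util.stream.Collectors;

final class CsvParts {
    // Quote a field (RFC 4180) if it contains a comma, a double quote, CR or LF.
    static String escape(String field) {
        if (field == null) return "";
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // One CSV record, terminated by CRLF.
    static String toLine(List<String> row) {
        return row.stream().map(CsvParts::escape).collect(Collectors.joining(",")) + "\r\n";
    }

    // Each chunk of rows becomes one numbered part (stands in for an object in S3/MinIO).
    static SortedMap<Integer, String> writeParts(List<List<List<String>>> chunks) {
        SortedMap<Integer, String> parts = new TreeMap<>();
        for (int i = 0; i < chunks.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (List<String> row : chunks.get(i)) {
                sb.append(toLine(row));
            }
            parts.put(i + 1, sb.toString());
        }
        return parts;
    }

    // Assemble the target file: header first, then the parts in part-number order.
    static String assemble(String header, SortedMap<Integer, String> parts) {
        StringBuilder target = new StringBuilder(header);
        parts.values().forEach(target::append);
        return target.toString();
    }
}
```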
The worker should be able to export data from the following modules:
- mod-audit
- mod-orders
- mod-finance
The worker should be extensible to simplify:
- The implementation of data export from other modules
- The introduction of new export file types
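One possible way to keep the worker extensible along both axes is a registry of exporters (one per source module or export type) and formatters (one per file type). The sketch below assumes hypothetical `Exporter`, `FileFormatter` and `ExporterRegistry` types; it is not the actual mod-data-export-worker design.

```java
import java.util.*;

// Hypothetical extension point: fetches rows for one export type from its source module.
@FunctionalInterface
interface Exporter {
    List<List<String>> fetchRows(Map<String, String> params);
}

// Hypothetical extension point: renders rows into one output file type.
@FunctionalInterface
interface FileFormatter {
    String format(List<List<String>> rows);
}

final class ExporterRegistry {
    private final Map<String, Exporter> exporters = new HashMap<>();
    private final Map<String, FileFormatter> formatters = new HashMap<>();

    // Supporting a new source module only requires registering a new Exporter.
    void registerExporter(String exportType, Exporter exporter) {
        exporters.put(exportType, exporter);
    }

    // Supporting a new file type only requires registering a new FileFormatter.
    void registerFormatter(String fileType, FileFormatter formatter) {
        formatters.put(fileType, formatter);
    }

    String run(String exportType, String fileType, Map<String, String> params) {
        Exporter exporter = exporters.get(exportType);
        FileFormatter formatter = formatters.get(fileType);
        if (exporter == null || formatter == null) {
            throw new IllegalArgumentException("Unsupported export: " + exportType + "/" + fileType);
        }
        return formatter.format(exporter.fetchRows(params));
    }
}
```

In a Spring application such a registry would more likely be populated from injected beans than by explicit registration calls.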
Technologies that should be leveraged to implement the module:
- Spring framework
- Spring Batch
- Spring Boot
- Spring Kafka (to simplify work with Kafka)
- folio-spring-base (for integration with Okapi)
- HSQLDB as in-memory database for Spring Batch
- Lombok
mod-data-export-worker is based on the Spring Batch framework, which has the following advantages:
- Good separation of concerns. It has the concepts of:
  - Jobs
  - Steps
  - Tasks
  - Task partitions
  - Readers
  - Writers
  - etc.
  This makes it easy to implement batch export workers and extend them later.
- Chunk-based processing. Spring Batch is designed to process batch jobs chunk by chunk: each chunk is extracted from a data source, transformed (if required), and loaded to a destination file/storage.
- Parallel Job execution by using multiple threads. A Task can be split into smaller parts by a Partitioner, and each part can then be processed by a separate thread.
- Ability to save the job state in a database and continue execution from the point at which the job was interrupted.
- Many different hooks and custom steps, which make it easy to create a new job or extend an existing one.
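The following plain-Java sketch illustrates partitioned, chunk-oriented processing as described in the list above, without Spring Batch: an id range is split into partitions, each partition runs on its own thread, and within a partition items are read and transformed one by one but written one chunk at a time. In Spring Batch itself these roles are played by a `Partitioner` (whose `partition(int gridSize)` method returns a map of `ExecutionContext`s) and a chunk-oriented step with a commit interval; the class and method names below are hypothetical.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.IntFunction;

final class PartitionedChunkJob {
    // A partition is a half-open range of item ids [from, to).
    record Partition(int from, int to) {}

    // Split [0, total) into at most gridSize contiguous partitions (gridSize > 0).
    static List<Partition> partition(int total, int gridSize) {
        List<Partition> result = new ArrayList<>();
        int size = Math.max(1, (total + gridSize - 1) / gridSize); // ceiling division
        for (int from = 0; from < total; from += size) {
            result.add(new Partition(from, Math.min(from + size, total)));
        }
        return result;
    }

    // Read and transform items one by one, but write them one whole chunk at a time.
    static void processPartition(Partition p, int chunkSize, IntFunction<String> processor,
                                 Queue<List<String>> writer) {
        for (int start = p.from(); start < p.to(); start += chunkSize) {
            List<String> chunk = new ArrayList<>();
            int end = Math.min(start + chunkSize, p.to());
            for (int id = start; id < end; id++) {
                chunk.add(processor.apply(id));
            }
            writer.add(chunk); // one write per chunk, like one commit per chunk
        }
    }

    // Run every partition on its own thread and wait for all of them.
    static Queue<List<String>> run(int total, int gridSize, int chunkSize) {
        Queue<List<String>> written = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Partition p : partition(total, gridSize)) {
                futures.add(pool.submit(() -> processPartition(p, chunkSize, id -> "item-" + id, written)));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows a failure from a partition thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return written;
    }
}
```

For example, `run(10, 3, 2)` splits ids 0..9 into the partitions [0, 4), [4, 8) and [8, 10) and writes five chunks in total; the order of chunks across partitions depends on thread scheduling.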
NOTE: mod-data-export-worker can also use a PostgreSQL database for Spring Batch. According to the investigation done in the scope of MODEXPW-215, using PostgreSQL with Spring Batch requires:
- Setting the READ_COMMITTED isolation level in the Spring Batch job repository configuration.
- Using a unique set of job parameters for each job launch, as Spring Batch does not allow running several instances of a job with the same job parameters.
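The job-parameters restriction above can be modelled in a few lines of plain Java. In this simplified model a job instance is identified by the job name plus its parameters, so a second launch with identical parameters is rejected; adding a per-request unique parameter (here, hypothetically, the export Job id) makes every launch distinct. The model ignores Spring Batch's restart of failed instances. In real Spring Batch code the parameters would typically be built with `JobParametersBuilder`, and the isolation level can be set on the job repository factory bean, for example via `setIsolationLevelForCreate("ISOLATION_READ_COMMITTED")`.

```java
import java.util.*;

// Simplified model: a job instance is identified by job name + parameters.
final class JobLauncherModel {
    private final Set<String> instances = new HashSet<>();

    // Returns false if an instance with the same name and parameters already exists.
    boolean launch(String jobName, SortedMap<String, String> params) {
        String key = jobName + params; // TreeMap.toString() lists entries in key order
        return instances.add(key);
    }

    // Copy the base parameters and add a hypothetical unique "jobId" parameter.
    static SortedMap<String, String> uniqueParams(Map<String, String> base, UUID jobId) {
        SortedMap<String, String> params = new TreeMap<>(base);
        params.put("jobId", jobId.toString());
        return params;
    }
}
```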
Kafka
The following topics should be created in Kafka:
- <tenant>.data-export.job.command
  The topic used to send start-job commands from mod-data-export-spring to mod-data-export-worker instances. Each worker consumes messages from its dedicated topic partition(s), which makes sure that the same Job isn't executed by multiple workers. The worker acknowledges a message only after the Job is finished, so if the worker fails in the middle of job execution, it can retrieve the uncommitted messages from Kafka again and re-process the Job.
- <tenant>.data-export.job.update
  The topic used to send Job status updates from mod-data-export-worker instances to mod-data-export-spring. The latter updates the Job status in the PostgreSQL database according to the received messages.
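The acknowledgment rule for the command topic can be sketched as a single partition with a committed offset. This is a simplified, in-memory model, not the Kafka client API: the offset is committed only after a Job finishes, so a worker that crashes mid-Job reads the same command again after a restart. With Spring Kafka, a similar effect is usually achieved with a manual acknowledgment mode and a call to `Acknowledgment.acknowledge()` after the Job completes. `CommandPartition` and its methods are hypothetical.

```java
import java.util.*;
import java.util.function.Predicate;

// Simplified model of one command-topic partition and its committed offset.
final class CommandPartition {
    private final List<String> log = new ArrayList<>(); // start-job commands (Job ids)
    private int committedOffset = 0;                     // next offset to read after a restart

    static String commandTopic(String tenant) {
        return tenant + ".data-export.job.command";
    }

    void append(String jobId) { log.add(jobId); }

    // A worker (re)starts from the committed offset. runJob returns false to simulate
    // a crash in the middle of the Job; in that case nothing more is committed.
    List<String> consume(Predicate<String> runJob) {
        List<String> finished = new ArrayList<>();
        for (int offset = committedOffset; offset < log.size(); offset++) {
            String jobId = log.get(offset);
            if (!runJob.test(jobId)) {
                return finished;          // crash: the command stays uncommitted
            }
            committedOffset = offset + 1; // commit only after the Job has finished
            finished.add(jobId);
        }
        return finished;
    }

    int committedOffset() { return committedOffset; }
}
```

Because delivery is at-least-once, a Job may be started twice for the same command, so re-processing a Job should be safe to repeat.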
Scalability
The solution can easily be scaled horizontally by increasing the number of mod-data-export-spring and mod-data-export-worker instances.
Both modules are stateless. mod-data-export-spring persists its state in the PostgreSQL database. All Start job requests for mod-data-export-worker are stored in Kafka.
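Horizontal scaling of the workers is bounded by the number of partitions of the command topic: within one consumer group, each partition is consumed by exactly one member at a time, so worker instances beyond the partition count stay idle. The sketch below shows this with a simplified round-robin assignment; Kafka's real assignors (range, round-robin, sticky, cooperative-sticky) differ in detail, and `PartitionAssignment` is a hypothetical name.

```java
import java.util.*;

// Simplified round-robin assignment of topic partitions to the worker instances
// of one consumer group: each partition goes to exactly one worker.
final class PartitionAssignment {
    // workers must be non-empty and contain distinct names.
    static Map<String, List<Integer>> assign(int partitions, List<String> workers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String w : workers) {
            result.put(w, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(workers.get(p % workers.size())).add(p);
        }
        return result;
    }
}
```

For example, with 3 partitions and 4 workers the fourth worker receives no partition, while with 4 partitions and 2 workers each worker handles 2 partitions.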
Security
All modules should leverage Folio module security components and best practices.
Kafka security is out of the scope of this document.
14 Comments
Ann-Marie Breaux
Would this mean moving all types of export (from many different apps) under the current UI for Data Export? If so, has that decision been vetted with all the affected POs and the community? That's a pretty fundamental scope change for the Data Export app, which is currently handling Inventory and SRS data.
Vasily Gancharov
Ann-Marie Breaux, I propose to create a new Data export page, where a user can find all export jobs and results. This page should co-exist with the existing Data export page. For now, the new page will be used only for the features listed in the "Business goals" section of this document. I don't think it makes sense to update all existing Data export solutions to use the new UI. Every data export solution could be updated to use it, once there is a business need and enough capacity for it.
Ann-Marie Breaux
Good to know. Thanks, Vasily Gancharov
Marc Johnson
Vasily Gancharov
I thought that it was agreed at last week's Technical Leads meeting that UI mock ups were going to be created and a conversation had with the users to evaluate how to approach this. Is my recollection incorrect or is this proposal based upon already having had conversations with the relevant POs / SIGs?
Ann-Marie Breaux
In my mind, the data export that the Data Export app does is very different from the kinds that are considered in the scope. In any event, so long as the Data Export app is not messed with, and the POs/SMEs have a chance to weigh in before it is, then I'm not worried about it. Please be aware that there is no signoff from the POs/SMEs to smushing all Export into one interface yet. And for that matter, I'm not sure any workflow that involves leaving the app a user is currently in (orders, requests, etc.) to go to another app and initiate export would be acceptable unless it's a seamless process for the user.
Ann-Marie Breaux
There are also other apps that export to CSV besides the ones you've listed here. For example: Requests, Users (export loan info), Licenses.
Marc Johnson
Vasily Gancharov
FOLIO-2986 defined a new export module specifically for exporting fees / fines to a bursar office. Do you know if folks considered using the new approach defined here for this export?
Aleksei Prokhorov
FOLIO-2986 is for UXPROD-2862, as it is mentioned in FOLIO-2986.
FOLIO-2987 and FOLIO-2988 are for UXPROD-2691, where we will use this data export approach.
Marc Johnson
Aleksei Prokhorov, Vasily Gancharov
I don't understand. If this work is going to use the proposal in this design (centralised export) why is there a need for mod-bursar-export as well?
Vasily Gancharov
Marc Johnson, thank you for the link. They will use the new Data export approach, as we discussed with Aleksei Prokhorov.
Marc Johnson
Vasily Gancharov, Aleksei Prokhorov
Does that mean there no longer needs to be a mod-bursar-export module (and hence no new repository either)?
Vasily Gancharov
The Scout team will work on UXPROD-2691, for which they should leverage the new Data export approach. I will discuss the mod-bursar-export with the people who designed it. Potentially, the Data export approach could be applied to FOLIO-2986 as well.
Marc Johnson
Vasily Gancharov, thanks
Vasily Gancharov
Thank you Marc Johnson