Spike: Distributed transaction use cases and solutions

Requirements: FOLIO-2269

Use Cases

1. Check Out / Check In

The most relevant example from circulation is when the item status is changed as part of circulation processes.

Updating item status during check out and check in

Typically, when an item is checked out, a loan is created within circulation and the item status in inventory is changed to checked out. If one of these operations fails, the other should be undone, which likely means the item status update should be performed first, as it is likely easier to reverse.
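The ordering described above (change the item status first, then create the loan, and reverse the status if the loan fails) can be sketched as follows. This is only an illustration under stated assumptions: `InventoryStub`, `LoanStorageStub`, `StorageError` and the status strings are hypothetical in-memory stand-ins, not the real module APIs.

```python
class StorageError(Exception):
    """Raised when a (hypothetical) storage call fails."""


class InventoryStub:
    """In-memory stand-in for inventory storage (hypothetical)."""

    def __init__(self):
        self.item_status = {}

    def set_status(self, item_id, status):
        self.item_status[item_id] = status


class LoanStorageStub:
    """In-memory stand-in for circulation loan storage (hypothetical)."""

    def __init__(self, fail=False):
        self.loans = []
        self.fail = fail  # set to True to simulate a failing storage module

    def create_loan(self, item_id, patron_id):
        if self.fail:
            raise StorageError("loan could not be created")
        self.loans.append({"itemId": item_id, "userId": patron_id})


def check_out(inventory, loans, item_id, patron_id):
    """Update the item status first, then create the loan.

    If loan creation fails, the status change is reversed
    (the compensating action). Returns True on success.
    """
    previous = inventory.item_status.get(item_id, "Available")
    inventory.set_status(item_id, "Checked out")
    try:
        loans.create_loan(item_id, patron_id)
    except StorageError:
        inventory.set_status(item_id, previous)  # compensate
        return False
    return True
```

Note that the compensating update can itself fail, which is one of the reasons the later sections look at messaging with retries.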

Similarly, when checking in an item, a loan could be closed, and the item status is likely to change, either to available or a request fulfilment related status.

This could get more complicated when there are other side effects applied from these changes, for example, loan action history or item status history.

2. Orders

Spike: Spike: Guarantee of the PO Line status up-to-date

2.1 Automatically adjust order status based on poLine's paymentStatus and receivingStatus

Story: MODORDERS-218

The last step in the receiving/check-in flow is to update the status of the PO Line to "Partially Received" or "Fully Received". It's possible that something may go wrong and the status won't be updated even though the items were actually received, so a mechanism to guarantee that the status is up to date needs to be worked out.

Current solution

An event bus message is sent in particular cases: when a PO Line is successfully updated, or when receiving/check-in successfully results in a PO Line receiving status update. The order status handler is registered as an event bus consumer. The handler receives an array of order ids and performs the needed logic.

2.2 Ensure receiptStatus consistency between piece and poLine

Story: MODORDERS-173

Since Piece and PO Line records are stored separately from one another and are managed via their own APIs, there's the possibility that inconsistencies may arise: the receiptStatus fields could get out of sync. To solve this problem, whenever a piece record's receivingStatus changes, we should emit an event which triggers the "calculation" of the corresponding PO Line's receiptStatus.

Current solution

The receiptStatus of the PO Line is calculated whenever a related Piece record's receivingStatus is changed. This happens asynchronously via the Vert.x event bus.
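A minimal sketch of what such a recalculation handler might look like. The "Partially Received" and "Fully Received" values come from the text above; the piece status "Received", the fallback "Awaiting Receipt", the record shapes and the function names are assumptions made for illustration only.

```python
def calculate_receipt_status(piece_statuses):
    """Derive a PO Line receiptStatus from the statuses of all its pieces."""
    received = sum(1 for status in piece_statuses if status == "Received")
    if piece_statuses and received == len(piece_statuses):
        return "Fully Received"
    if received > 0:
        return "Partially Received"
    return "Awaiting Receipt"  # assumed fallback value


def on_piece_changed(po_line_id, pieces, po_lines):
    """Handler a consumer might run when a piece's receivingStatus changes.

    `pieces` is a list of piece records and `po_lines` maps
    PO Line id -> record (both hypothetical in-memory shapes).
    """
    statuses = [
        piece["receivingStatus"]
        for piece in pieces
        if piece["poLineId"] == po_line_id
    ]
    po_lines[po_line_id]["receiptStatus"] = calculate_receipt_status(statuses)
```

Because the status is recalculated from all related pieces rather than adjusted incrementally, handling the same event twice gives the same result.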

3. Invoices

3.1 Calculate Invoice Totals

Story: MODINVOICE-52

Whenever an invoice is created, updated or retrieved, the invoice.subTotal, invoice.adjustmentsTotal, and invoice.total need to be calculated and returned in the response.

Current solution

Calculating "on the fly" on GET request - invoice totals are never persisted - the only exception is if invoice.lockTotal == true, then invoice.total is persisted and never replaced w/ a calculated value.
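The "on the fly" calculation could be sketched roughly as below. The field names subTotal, adjustmentsTotal, total and lockTotal come from the text; how the sub-total and adjustments are derived (here, simple sums of values passed in) and the function name `with_totals` are assumptions for illustration.

```python
from decimal import Decimal


def with_totals(invoice, line_sub_totals, adjustments):
    """Return a copy of the invoice with calculated totals.

    `line_sub_totals` and `adjustments` are lists of Decimal amounts
    (hypothetical inputs). A persisted total is kept when lockTotal is set.
    """
    result = dict(invoice)
    result["subTotal"] = sum(line_sub_totals, Decimal("0"))
    result["adjustmentsTotal"] = sum(adjustments, Decimal("0"))
    if not invoice.get("lockTotal"):
        # Only calculate the total when it has not been locked
        result["total"] = result["subTotal"] + result["adjustmentsTotal"]
    return result
```

For example, an unlocked invoice with line sub-totals of 10.00 and 5.50 and one adjustment of 1.25 would be returned with a total of 16.75, while a locked invoice keeps whatever total was persisted.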

3.2 Avoiding partially paid invoices

Spike: Spike: Avoiding partially paid invoices POC

When an invoice is paid, numerous transactions will be generated: in orders – Either all or none of these need to be successfully processed; we can't have partially paid invoices.  Furthermore, we also need to update running totals in several places (budget, ledger, etc.).

Current solution
  • Instead of performing calculations in the business logic module like we normally would, move this to the storage module. 
  • Accumulate transactions in a temporary table until all transactions for an invoice are present, then apply them all at once, updating all the required tables together in a database transaction.
  • Requires several APIs to be idempotent, and a summary/manifest of the transactions to expect must be provided.

Possible Messaging Based Solutions

1. Check Out / Check In

1.1 Check Out - Avoiding inconsistencies between loans and items

When an item is checked out to a patron, a loan is created in circulation and the item in inventory is updated. Both of these need to happen for the check out to succeed.

If the check out is part of the fulfilment of a request, the request may also need to be updated.

Using messaging (and multiple representations of the same entity) creates the opportunity to contain all direct storage updates within circulation and update inventory later.

Prerequisites

  • An item record has been introduced into circulation
  • Messaging system must be able to send messages directly to a module (for command based approach)
  • Message system must be able to publish messages to any subscriber (for event based approach)

Questions

  • Who owns the fact that an item is checked out: circulation or inventory? This question is important because it likely changes the nature of the integration
    • If inventory owns the fact that an item is checked out, then updating the item might be a required part of the check out process
    • If circulation owns the fact that an item is checked out, then updating the item could happen afterwards, if a degree of eventual consistency is acceptable

Event Based Process (circulation owns that an item is checked out)

  • Client attempts to check out an item (business logic steps omitted)
    • Create the loan in circulation storage
    • Update the item in circulation storage
    • Update the request (if required) in circulation storage
    • Circulation publishes an "item checked out" event
  • Circulation responds to client confirming check out
  • Inventory receives an "item checked out" event
  • Inventory reacts by updating its representation of the item in storage
Alternative Flow - Inventory Item Update Fails
  • Inventory  may choose to retry the storage update
  • Circulation is unaware that this update has failed
  • There would need to be a way to detect and correct this inconsistency
Considerations
  • Circulation is not dependent upon inventory for performing check outs (assumes there is already a synchronisation process in place for inventory information)
  • Whose responsibility is it to detect inconsistencies?
  • Event can be used by other processes
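The event based process above can be sketched with a small in-memory publish/subscribe bus. Everything here (`EventBus`, `CirculationModule`, `InventoryModule`, the topic name and the status strings) is hypothetical and only meant to show the ordering: circulation completes the check out and responds before inventory has processed the event.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy publish/subscribe bus; events are queued and delivered later."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.queue.append((topic, event))  # not delivered inline

    def deliver_pending(self):
        while self.queue:
            topic, event = self.queue.popleft()
            for handler in self.subscribers[topic]:
                handler(event)


class CirculationModule:
    """Holds its own representation of loans and items."""

    def __init__(self, bus):
        self.bus = bus
        self.loans = []
        self.items = {}

    def check_out(self, item_id, patron_id):
        self.loans.append({"itemId": item_id, "userId": patron_id})
        self.items[item_id] = "Checked out"
        self.bus.publish("item-checked-out", {"itemId": item_id})
        return "checked out"  # response to the client


class InventoryModule:
    """Updates its own representation when the event arrives."""

    def __init__(self, bus):
        self.items = {}
        bus.subscribe("item-checked-out", self.on_item_checked_out)

    def on_item_checked_out(self, event):
        self.items[event["itemId"]] = "Checked out"
```

Between `check_out` returning and `deliver_pending` running, the two modules disagree about the item's status, which is the window of eventual consistency discussed above. This toy bus also drops an event if a handler raises, so it does not address the failed update case.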

Command Based Process (inventory owns that an item is checked out)

  • Client attempts to check out an item (business logic steps omitted)
    • Create the loan in circulation storage
    • Update the item in circulation storage
    • Update the request (if required) in circulation storage
    • Circulation sends a command to inventory to change the status of the item
    • Inventory reacts by updating its representation of the item in storage
  • Circulation responds to client confirming check out
Alternative Flow - Inventory Item Update Fails
  • Circulation may choose to retry issuing the command to inventory
  • Check out would fail (and other compensating actions taken) if item in inventory cannot be updated
Considerations
  • How does circulation know that the command has failed? Is it based upon:
    • a request / reply protocol (meaning the messaging infrastructure needs to support reply addresses and correlation)
    • the absence of an error in a given time
    • polling for the state change in inventory
  • Inventory must be available for check out to occur, even with the use of messaging (unless the command is issued after the check out succeeds, in which case compensation / recovery similar to the event based approach is needed)
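A rough sketch of the command based process using the request / reply option, with retries and a compensating action. The command and reply shapes, the `send` callable, `InventoryUnavailable` and the function names are all assumptions for illustration; a real messaging system would deliver the reply asynchronously to a reply address.

```python
import uuid


class InventoryUnavailable(Exception):
    """Raised by the (hypothetical) transport when no reply arrives."""


def send_with_retry(send, command, attempts=3):
    """Issue a command up to `attempts` times; return the reply or None."""
    for _ in range(attempts):
        try:
            reply = send(command)
        except InventoryUnavailable:
            continue  # retry issuing the command
        # Correlate the reply with the command that was sent
        if reply["correlationId"] == command["id"]:
            return reply
    return None


def check_out_with_command(loans, send, item_id, patron_id):
    """Create the loan, then command inventory to change the item status.

    If the command cannot be confirmed, the loan is removed again
    (the compensating action) and the check out fails.
    """
    loan = {"itemId": item_id, "userId": patron_id}
    loans.append(loan)
    command = {
        "id": str(uuid.uuid4()),
        "type": "change-item-status",
        "itemId": item_id,
        "status": "Checked out",
    }
    reply = send_with_retry(send, command)
    if reply is None or not reply["ok"]:
        loans.remove(loan)  # compensate
        return False
    return True
```

Here the check out only succeeds once inventory has confirmed the change, which is why inventory must be available for this variant.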

Notes

  • Individual storage module requests could still fail; this would need to be addressed separately, with appropriate compensating actions
  • The UI / client uses the state of these records to guide the user experience, so it is not possible to defer some of these operations till after the check out (e.g. in the background)
  • It is not possible to resubmit the check out as it is not idempotent: it refuses any attempt to check out an item that is already checked out or already has an open loan

2. Orders

3. Invoices

3.2 Avoiding partially paid invoices

While the current solution handles the "all or nothing" aspect of the finance side, it fails to address the problem as a whole.  The part that's missing is that we also need to update the invoice record, and this can't be done in a database transaction with the finance table updates since they're in different modules/tablespaces.  Put explicitly, the problem is that the finance updates might succeed, but then the subsequent call to invoice-storage to update the invoice status could fail, leaving the data in an inconsistent state.  In theory, if the payments/credits calls are all idempotent, which they should be, the entire operation of "paying the invoice" could be retried.  The finance calls would wind up having no effect as they've already been applied.  This would be one way to recover from the situation and get the data synchronized again.  However, having the grey area of the invoice really being paid, but just not marked as such is undesirable, and should be avoided if possible. 

There's another aspect of this that hasn't been mentioned yet. Upon paying an invoice, we also want to update associated orders, marking them as either "partially paid" or "fully paid". This is where we'd really benefit from having a pub/sub mechanism.

  • the invoice module performs the finance side of things via standard API calls as described earlier - the rest of the process hinges on the outcome of this.
  • if the finance side of things succeeds, we now need to update the status of several different records, invoice, orders, etc.
    • publish a message saying "this invoice was paid"
    • the invoice module would subscribe to this topic and update the appropriate invoice's status (assumption that it's OK to produce and consume a message in the same module)
    • the orders module would subscribe to this topic and update the appropriate order's payment status
  • if the finance side fails, we simply return an appropriate error response and paying the invoice can be retried, making adjustments (e.g. additional allocations to funds, etc.) as needed beforehand.

Notes:

  • Here I think it's important that the pub-sub/messaging mechanism provides guaranteed delivery and message durability
  • Message consumers should retry operations if they fail
  • We're essentially settling for eventual consistency here
  • I have the business logic modules as subscribers/consumers here since there's likely some logic that should be applied when an event is received. This is especially true on the orders side as a couple of things need to be checked before determining what the payment status of each of the orders involved should be set to.
  • There could still be some interesting race conditions w/ the UI... if the API call to "pay the invoice" returns before the async processing performed by the message consumers succeeds, and the UI retrieves the invoice record again, the status might still show as "approved" and not "paid". Assuming the UI uses this to determine whether or not to display the "pay invoice" button, this could be confusing to the user and result in the user clicking that button again. For this reason, not only does the finance side of things need to be idempotent, but the message consumers do as well, or at least they need to be able to distinguish between unique and duplicate events.
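One way a consumer might distinguish unique from duplicate events is to remember the ids of events it has already handled. The sketch below assumes each message carries a unique `eventId` (an assumption, as are the class name and record shapes); the "approved" and "paid" statuses come from the notes above.

```python
class InvoicePaidConsumer:
    """Consumer for "this invoice was paid" messages (hypothetical)."""

    def __init__(self, invoices):
        self.invoices = invoices  # invoice id -> record, in-memory stand-in
        self.seen = set()         # ids of events already handled

    def handle(self, event):
        """Apply the event once; return False for a duplicate."""
        if event["eventId"] in self.seen:
            return False
        self.invoices[event["invoiceId"]]["status"] = "paid"
        # Record the id only after the update succeeded, so that a failed
        # attempt is not mistaken for a duplicate when it is retried
        self.seen.add(event["eventId"])
        return True
```

In a real consumer the set of handled ids would need to be persisted, ideally in the same database transaction as the status update, otherwise a restart would lose it.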

Topics

What do we mean by distributed transactions

With the example above, it is relatively apparent that there are changes which need coordinating across contexts (circulation and inventory).

As FOLIO uses distributed business logic and storage modules, we could also consider any operation that involves more than one record (maybe of the same type, maybe of different types) as a distributed transaction, even though these changes are in the same context.

Using an example from above, during a check in, a loan might be closed and request fulfilment might start. It might be that both of these operations need to be reversed if one fails (it might also be that repeating the process is acceptable).

TODO:

  • Devise and propose solutions to each of these using a generic messaging queue w/ guaranteed delivery. Marc Johnson and Craig McNally