Optimization of Cross-Border Payment Costs Using Reinforcement Learning and Network Flow Models
Cross-border payments exhibit persistent inefficiencies driven by multi-rail fragmentation, time-varying FX spreads, congestion, and operational outages. This study proposes a hybrid optimization architecture that couples a constrained minimum-cost flow solver with a reinforcement learning (RL) policy that applies bounded adaptive adjustments to feasible routing and liquidity plans. Experiments were conducted on 120,000 transactions across 18 corridors under 24 stress scenarios, each with 50 stochastic seeds, yielding 1,200 independent runs. The proposed RL+Flow method achieved the lowest mean cost at 1.84 USD per transaction, outperforming flow-only (2.11 USD), RL-only (2.24 USD), and cheapest-rail routing (2.36 USD). Cost decomposition showed a 14.8% reduction in FX-spread costs and a 31.6% reduction in delay penalties relative to flow-only, while maintaining conservative fee profiles. Service quality improved concurrently: late settlement rates fell to 1.9%, versus 4.8% (flow-only), 6.1% (RL-only), and 7.4% (cheapest-rail), and failure rates under outage-focused regimes fell to 0.42%, versus 0.75%, 0.88%, and 1.17% for the same three baselines, respectively. Corridor analysis indicated larger incremental savings in dense corridors, reaching an average −11.9% cost change versus flow-only, compared with −6.2% in sparse long-tail corridors. Ablation results showed that removing the flow warm-start increased mean cost to 2.06 USD and nearly doubled the failure rate to 0.78%, while removing reliability filtering raised the failure rate to 0.91%. Overall, the results indicate that feasibility-first flow optimization combined with risk-aware adaptive RL yields robust cost and reliability gains under realistic nonstationary conditions.
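The feasibility-first idea can be illustrated with a minimal Python sketch. The abstract does not give the study's formulation, so everything here is an assumption made for illustration: the two-rail toy network, its capacities and per-unit costs, the function names `min_cost_flow` and `apply_bounded_shift`, and the fixed number standing in for a learned policy's proposal. The solver is a textbook successive-shortest-paths routine, not necessarily the one used in the study.

```python
INF = float("inf")

def min_cost_flow(n, edges, source, sink, demand):
    """Successive shortest paths with Bellman-Ford on the residual graph.

    edges: list of (tail, head, capacity, unit_cost).
    Returns (total_cost, flow_per_edge).
    """
    graph = [[] for _ in range(n)]      # node -> indices of outgoing residual arcs
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to))        # forward arc gets an even index
        to.append(v)
        cap.append(c)
        cost.append(w)
        graph[v].append(len(to))        # paired reverse arc gets the next odd index
        to.append(u)
        cap.append(0)
        cost.append(-w)

    sent, total = 0, 0.0
    while sent < demand:
        dist = [INF] * n
        prev_arc = [-1] * n
        dist[source] = 0.0
        for _ in range(n - 1):          # Bellman-Ford relaxation rounds
            updated = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_arc[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == INF:
            raise ValueError("demand exceeds network capacity")

        # Find the bottleneck along the cheapest path, then push flow along it.
        push = demand - sent
        v = sink
        while v != source:
            e = prev_arc[v]
            push = min(push, cap[e])
            v = to[e ^ 1]               # tail of arc e is the head of its pair
        v = sink
        while v != source:
            e = prev_arc[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        sent += push
        total += push * dist[sink]

    flows = [cap[2 * i + 1] for i in range(len(edges))]  # reverse residual = flow
    return total, flows

def apply_bounded_shift(alloc, capacity, src, dst, proposed, max_shift):
    """Move volume from rail src to rail dst, clipped so the plan stays feasible."""
    shift = max(-max_shift, min(max_shift, proposed))   # bound the adjustment
    if shift < 0:
        src, dst, shift = dst, src, -shift
    shift = min(shift, alloc[src], capacity[dst] - alloc[dst])
    adjusted = dict(alloc)
    adjusted[src] -= shift
    adjusted[dst] += shift
    return adjusted

# Hypothetical corridor: node 0 = sender, 1 = rail A hub, 2 = rail B hub, 3 = receiver.
EDGES = [(0, 1, 60, 1.0), (1, 3, 60, 0.5), (0, 2, 100, 1.25), (2, 3, 100, 0.75)]
total_cost, flows = min_cost_flow(4, EDGES, source=0, sink=3, demand=100)
# total_cost == 170.0, flows == [60, 60, 40, 40]

rail_alloc = {"A": flows[0], "B": flows[2]}
# Stand-in for a policy output: propose moving 25 units from A to B, bounded at 10.
adjusted = apply_bounded_shift(rail_alloc, {"A": 60, "B": 100}, "A", "B", 25, 10)
# adjusted == {"A": 50, "B": 50}
```

In this sketch the flow solution acts as the warm start, and the adjustment is clipped after the fact, so the resulting plan respects rail capacities and total volume regardless of what the policy proposes.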