REINFORCEMENT LEARNING APPROACHES TO OPTIMIZE IT SERVICE MANAGEMENT UNDER DATA SECURITY CONSTRAINTS
DOI: https://doi.org/10.63125/z7q4cy92
Keywords: Reinforcement Learning, IT Service Management, Data Security Constraints, Off-Policy Evaluation, Governance-Aware Optimization
Abstract
This quantitative study evaluated reinforcement learning (RL) approaches to optimizing IT Service Management (ITSM) decision-making under explicit data security constraints, using a retrospective observational design and historical operational logs. The analytic dataset was extracted from a 12-month observation window and contained 128,450 raw incident and service-request tickets. After deterministic preprocessing (timestamp normalization, deduplication, linkage resolution, and censoring under pre-specified rules), 112,680 tickets were retained for analysis, comprising 46.3% incidents and 53.7% service requests across 10 service lines and 5 regions. Mandatory ITSM fields were complete for 96.8% of retained records, and 78.4% of tickets were linked to usable telemetry context for state construction. Descriptive statistics indicated distributional asymmetry typical of service systems: the median time-to-acknowledge was 0.42 hours and the median time-to-resolve was 14.6 hours, with a heavy upper tail (P90 resolution time 63.4 hours; P95 resolution time 97.6 hours). Governance indicators were consistently recorded, with audit-log completeness averaging 98.9%. Correlational analysis showed that time-to-resolve was positively associated with congestion and workflow friction, with correlation coefficients of 0.46 for backlog intensity, 0.52 for ticket aging, 0.41 for reassignment count, and 0.35 for escalation occurrence, motivating multivariable modeling with collinearity screening.
Adjusted regression models controlling for ticket severity, category, region, service line, workload context, and major-incident regime indicated that the RL policy was associated with improved performance relative to baseline handling, including reduced time-to-acknowledge (β = −0.08 hours; p < .001) and reduced time-to-resolve (log-linear β = −0.12; p < .001), alongside improved percentile-oriented SLA attainment (P90 threshold OR = 1.28, p < .001; P95 threshold OR = 1.19, p = .001). Process-quality outcomes also improved, including reduced reopen occurrence (OR = 0.86; p < .001) and reduced reassignment occurrence (OR = 0.89; p < .001). Constraint-related models showed no statistically significant increases in privileged-action events (IRR = 0.99; p = .61), exception approvals (IRR = 0.98; p = .44), or restricted-field access (IRR = 1.01; p = .69), and audit completeness remained statistically unchanged, supporting the conclusion that the RL policy remained admissible under the security constraints while service outcomes improved.
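As an aid to reading the reported effect sizes, the following minimal sketch converts the log-linear resolution coefficient into an approximate percent change and recomputes the retention rate from the ticket counts given above. It assumes that "log-linear" means the outcome was the natural logarithm of time-to-resolve and that the RL policy entered the model as a binary indicator; the abstract does not state either, and the function names are illustrative only.

```python
import math


def percent_change_from_log_coef(beta: float) -> float:
    """Percent change in the outcome implied by a coefficient on a
    natural-log-transformed outcome (assumed model form)."""
    return (math.exp(beta) - 1.0) * 100.0


def retention_rate(raw: int, retained: int) -> float:
    """Share of raw tickets retained after preprocessing, in percent."""
    return retained / raw * 100.0


# Values taken from the abstract.
resolution_change = percent_change_from_log_coef(-0.12)
retained_pct = retention_rate(128_450, 112_680)

print(f"Implied change in time-to-resolve: {resolution_change:.1f}%")
print(f"Tickets retained after preprocessing: {retained_pct:.1f}%")
```

Under these assumptions, β = −0.12 corresponds to roughly an 11% shorter time-to-resolve, and about 88% of the raw tickets were retained. The odds ratios and incidence rate ratios are already on a multiplicative scale and need no such conversion.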
