Weekly Operations Report

APIM Incident Report

API Management — May 20 – May 27, 2026

Team: APIM / ServicingLoop
Period: Last 7 days
Generated: May 27, 2026 13:45 UTC
Total Sev2+: 15
Active: 3
Mitigated: 10
Resolved: 2
Outages: 0
Sev3: 728
Total 7d: 743

On-Call Rotation (Sloop)

APIM Incident Manager
▶ INCOMING
Tom Kerkhove (Primary)
Maxim Kim (Primary)
SaiKiran Vukyam (Primary)
Samir Solanki (Primary)
◀ OUTGOING
Tom Kerkhove (Primary)
Rafal Mielowski (Primary)
Shilpa Mani (Primary)
US Sloop
▶ INCOMING
Zhongyuan Ren (Primary)
Bruce Moe (Backup)
◀ OUTGOING
Gleb Feoktistov (Primary)
Zhongyuan Ren (Backup)
EU Sloop
▶ INCOMING
Macko Treder (Primary)
Kriti Majumdar (Backup)
◀ OUTGOING
Srajan Agrawal (Primary)
Maxim Agapov (Backup)

Active Incidents

3 active
IcM: 803306220 | Owner: glfeokti | Team: Backend | Created: May 22, 23:59 UTC
AI Summary
Azure Reliability Red Flag requiring APIM to ensure all service calls to dSTS/dSMS use first-party IPs covered by service-specific Service Tags. APIM has untagged IPs calling these sensitive endpoints. Not customer-impacting but mandatory compliance. ETA tag set to 2026-11-30 with 90-day SDP duration. Must submit Service Tag capacity requests by May 28.
Required Actions
1. Identify untagged IPs via Kusto query on cdocinv.westus2 cluster.
2. Submit Service Tag requests by May 28.
3. Add AzRF.SDPInProgress tag by Jun 4.
Must remain Sev2; non-tracking escalates to EVP.
● MEDIUM: Compliance deadline
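Once the caller IPs from step 1 are exported, the tagged/untagged split can be checked offline. The following is a minimal sketch, assuming the Service Tag prefixes are available as a list of CIDR strings. The function name `find_untagged_ips` and all sample values are hypothetical (RFC 5737 documentation ranges), not APIM data, and the Kusto query itself is not reproduced here.

```python
import ipaddress

def find_untagged_ips(caller_ips, tag_prefixes):
    """Return the caller IPs that no CIDR prefix in tag_prefixes covers."""
    networks = [ipaddress.ip_network(p) for p in tag_prefixes]
    untagged = []
    for raw in caller_ips:
        addr = ipaddress.ip_address(raw)
        if not any(addr in net for net in networks):
            untagged.append(raw)
    return untagged

# Hypothetical sample data; real prefixes would come from the Service Tag definition.
tag_prefixes = ["192.0.2.0/24", "198.51.100.0/25"]
caller_ips = ["192.0.2.10", "198.51.100.200", "203.0.113.5"]
print(find_untagged_ips(caller_ips, tag_prefixes))
```

Any address the function returns would be a candidate for the Service Tag request in step 2.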
IcM: 802378264 | Team: Backend | Created: May 21, 17:45 UTC
AI Summary
AzureSecurityPack blocked zlib1.dll (Git mingw64) from executing due to a Code Integrity policy violation. Unlike previous audit-only detections, this binary was blocked outright. The incident remains active and requires a binary update or a policy exception. Related to 802334013 (mitigated with TSG).
● MEDIUM: Binary blocked, needs update
IcM: 802291010 | Team: Backend | Region: West Europe (api-am2-prod-01) | Created: May 21, 15:06 UTC
AI Summary
SKUv2 customer activation success rate dropped below the 95% SLA threshold in West Europe, with 3+ unique service/subscription failures in the last 3 hours. Potential customer impact for new activations. Requires investigation into activation pipeline failures on api-am2-prod-01-rp.
● HIGH: Customer activations impacted
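The alert condition described in this summary can be read as two checks over a 3-hour window. The sketch below is one interpretation of that wording, not the monitor's actual definition. The tuple layout, the names `should_alert` and `attempts`, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

SLA_THRESHOLD = 0.95          # success-rate floor quoted in the summary
MIN_UNIQUE_FAILURES = 3       # "3+ unique service/subscription failures"
WINDOW = timedelta(hours=3)   # "in the last 3 hours"

def should_alert(attempts, now):
    """attempts: iterable of (timestamp, service, subscription, succeeded)."""
    recent = [a for a in attempts if now - WINDOW <= a[0] <= now]
    if not recent:
        return False  # no activations in the window, nothing to evaluate
    success_rate = sum(1 for a in recent if a[3]) / len(recent)
    # Count distinct (service, subscription) pairs that had a failure.
    failed_pairs = {(a[1], a[2]) for a in recent if not a[3]}
    return success_rate < SLA_THRESHOLD and len(failed_pairs) >= MIN_UNIQUE_FAILURES

# Hypothetical sample: 20 activations, 3 failures from distinct pairs.
now = datetime(2026, 5, 21, 15, 6, tzinfo=timezone.utc)
attempts = [
    (now - timedelta(minutes=5 * i), f"svc-{i}", f"sub-{i}", i >= 3)
    for i in range(20)
]
print(should_alert(attempts, now))
```

Requiring both conditions keeps a single noisy subscription from tripping the alert when the overall rate dips.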

Emerging Issues & CRIs

3 tracked
IcM: 51000001038159 | Owner: glfeokti | Created: May 26, 16:30 UTC | HowFixed: Ad-Hoc steps
AI Summary
Customer-reported APIM service completely down due to an unknown network connectivity issue. Manual intervention with ad-hoc steps was required to restore service. Root cause is still under investigation, possibly a network policy or infrastructure issue affecting service reachability.
IcM: 51000001037978 | Owner: glfeokti | Created: May 26, 14:56 UTC | HowFixed: Ad-Hoc steps
AI Summary
Large enterprise customer (Publix) experienced primary APIM instance becoming completely unresponsive at the control plane level. Unable to scale to alternative region. Escalated to Sev 1 by customer. Mitigated with manual intervention. Critical scenario — highlights need for control plane resilience during regional failures.
IcM: 21000001036015 | Owner: glfeokti | Created: May 23, 21:23 UTC | HowFixed: Ad-Hoc steps
AI Summary
Customer APIM gateway ps-prod-be-euw-apim-manageprotect2 went completely down in West Europe. Root cause: ApimBootstrapperService timed out with VMExtensionProvisioningError. Gateway could not provision VM extensions needed for the service. Mitigated with ad-hoc steps (likely VM reimage/restart).

Other Mitigated & Resolved Sev2 Incidents

9 closed
IcM: 21000001035735 | Owner: srajagrawal | Created: May 23 | HowFixed: Ad-Hoc steps
AI Summary
Customer unable to modify rate limit policy configuration in their APIM instance. Blocking policy management operations. Mitigated with targeted ad-hoc steps. May indicate a serialization or validation issue in policy save path.
IcM: 51000001033777 | Owner: glfeokti | Created: May 22 | HowFixed: Ad-Hoc steps
AI Summary
An unplanned platform schedule upgrade affected a customer environment. Required manual intervention to stabilize. Classified as ACE customer engagement.
IcM: 51000001033443 | Owner: glfeokti | Created: May 22 | HowFixed: Ad-Hoc steps
AI Summary
Consumption SKU APIM service experienced underlying App Service platform going down. Customer-impacting — Consumption tier relies entirely on platform health. Mitigated with ad-hoc intervention on the platform side.
AzSysLock — Code Integrity Violations (3x this week)
Sev 2 | Mitigated | 3x / 7 days
IcMs: 802334013, 802277295, 802152809 | Owner: v-nbudati | Binaries: zlib1.dll, git.exe
AI Summary
Three AzSysLock Code Integrity violations detected for Git-related binaries (zlib1.dll, git.exe). All mitigated with TSG. Same pattern as previous weeks: Git binaries on APIM VMs do not have proper code signing.
● MEDIUM recurrence: Will recur until binaries are updated across the fleet.
ASM SLAM — Malicious Domain Communication (2x this week)
Sev 2 | Mitigated | 2x / 7 days
IcMs: 802362361, 802277295 | Owners: glfeokti, v-nbudati | HowFixed: False Alarm / Other
AI Summary
Azure Security Monitoring (ASM SLAM) detected communication to domains classified as malicious from APIM infrastructure. Both confirmed as False Alarm. Likely DNS resolution to shared infrastructure IPs that happen to be flagged. No actual malicious activity.
IcM: 803299608 | Owner: v-kambhavana | Created: May 22 | HowFixed: TSG
AI Summary
Single Premium SKU NonVNET service became 100% unreachable in Central US with 2+ probing attempts. Resolved using standard TSG. Likely transient VM health issue.
IcM: 21000001029432 | Owner: shubhash | Created: May 20 | HowFixed: By Design
AI Summary
Capacity exception request for Premium tier in UK South for Centrica. Resolved as By Design — standard capacity management process.

Ownership Summary

Owner | Count | Key Themes
glfeokti | 7 | Network connectivity CRI, Publix Sev1, Gateway Down WEU, Consumption SKU, unplanned upgrade, ASM SLAM, AzRel Red Flag
v-nbudati | 3 | AzSysLock Code Integrity (zlib1.dll, git.exe), ASM SLAM
srajagrawal | 1 | Rate limit policy CRI
v-kambhavana | 1 | Gateway unreachable Central US
shubhash | 1 | Capacity Exception Request (By Design)

Alert Trends — Top Firing Monitors (7 Days)

Monitors that fired 5+ times

Key Takeaways & Action Items