| Incident / DRI / Status | Questions & Answers | Feedback & Work Items |
|---|---|---|
| INC 787095677 Sev2 Mitigated Outage Declared Title: Grandfathered Limits not applied after 0.51 DRI: Macko Created: 2026-04-27 Impact: 2026-04-27 02:00 Ticket: 2604270050003314 | 1. Customer Complaint: Multiple customers with grandfathered limits hitting operation limits after 0.51.27763.0 upgrade. 2. What customer saw: "You've reached the maximum number of operations" despite grandfathered limits. API creation/update blocked. 3. CSS telemetry: Custom settings storing limits wiped during 0.51 upgrade. Version 0.50 honored them correctly. 4. Diagnosis & Fix: Code Bug / RP - 0.51 didn't honor custom limits. Rollback to 0.50 via "APIMSKUV1Rollback", quarantine, fix in 0.52. Ashendre applied CustomSetting Upgrade Apr 28. 5. Monitoring Miss? Yes - no test for limits persistence across upgrades. 6. Repairs: Fix in 0.52. Integration tests. Alert on "max operations" spike post-upgrade. | Feedback: (placeholder) Work Items: None linked |
| INC 21000000998761 Sev2 Resolved Title: APIM service is down DRI: Macko Created: 2026-04-25 | 1. Customer Complaint: Customer reported APIM service completely down. 2. What customer saw: Service unreachable, all traffic returning 500s. 3. CSS telemetry: Auto OS rolling upgrade triggered destructive VMSS model update from scale-out. All VMs lost Redis. Health monitor deadlocked. 8x traffic ramp + pre-existing Redis errors contributed. 4. Diagnosis & Fix: Scale-out triggered destructive VMSS upgrade. ~7h outage. Macko mitigated with Reboot Apr 25. 5. Monitoring Miss? Yes - cascading failure not caught until customer reported. 6. Repairs: Rolling upgrade guardrails. Redis resilience. Reboot as preferred first mitigation. Known pattern documented. | Feedback: (placeholder) Work Items: None |
| INC 51000000994948 Sev2 Resolved Title: Front Door 502 - invalid cert chain DRI: Macko Created: 2026-04-23 Note: Recurred as INC 51000001005891 | 1. Customer Complaint: Front Door returning 502. Incomplete certificate chain on APIM default domain. 2. What customer saw: HTTP 502. Missing intermediate CA certs. 3. CSS telemetry: After OS upgrade, VMs stopped presenting full cert chain. Internal VNET with port 80 blocked, preventing CRL/AIA fetching. 4. Diagnosis & Fix: Gateway (Managed) - incomplete chain after OS upgrade on VNET services with port 80 blocked. Macko resolved Apr 23. 5. Monitoring Miss? Yes - no cert chain completeness monitoring. 6. Repairs: Full chain always served. Cert chain alerting. VNET dependency checks for CRL/AIA. | Feedback: (placeholder) Work Items: None |
| INC 21000000995987 Sev2 Active Downgraded Sev3 Title: SSL CERTIFICATE_VERIFY_FAILED DRI: Martin Created: 2026-04-23 Customer: Network Rail (Mission Critical) | 1. Customer Complaint: SSL certificate verification failures on backend calls. 2. What customer saw: SSL CERTIFICATE_VERIFY_FAILED intermittently. 3. CSS telemetry: Transferred to Platform for deeper RCA. 4. Diagnosis & Fix: Active - Martin investigating. 5. Monitoring Miss? TBD. 6. Repairs: TBD. | Feedback: (placeholder) Work Items: None |
| INC 785535632 Sev2 Resolved Title: V2 SKUs Management 502s DRI: Macko Created: 2026-04-24 Impact: 2026-04-23 01:30 Tickets: 2604240030005476, 2604240050001721 | 1. Customer Complaint: Multiple EA customers - intermittent 502 on management plane for V2 SKUs across regions since ~Apr 21. 2. What customer saw: Management API calls returned 502. Retries sometimes succeeded. 3. CSS telemetry: HttpIncomingRequests: 502 spike from Apr 21, correlated with SKUv2 deployment. Related: 51000000995495, 51000000996243, 784788127. 4. Diagnosis & Fix: SMAPI Regression. Macko rolled back SKUv2 Apr 27. 5. Monitoring Miss? Yes - 3-day gap between spike start and declaration. 6. Repairs: Fix regression. Lower threshold for management 502 alert. | Feedback: (placeholder) Work Items: None linked |
| INC 51000000996010 Sev2 Active Sev 2->3: customer issue Title: HM Electronics SSL CERT_VERIFY_FAILED DRI: Macko Created: 2026-04-23 | 1. Customer Complaint: HM Electronics (Sev A, Premium) - intermittent SSL cert verification failures. 2. What customer saw: Intermittent SSL CERT_VERIFY_FAILED on backend calls. 3. CSS telemetry: Classified as customer issue. 4. Diagnosis & Fix: Customer issue. Downgraded to Sev3. 5. Monitoring Miss? N/A. 6. Repairs: N/A. | Feedback: (placeholder) Work Items: None |
| INC 51000000996953 Sev2 Active Sev 2->3: reporting issue Title: APIM does not scale out DRI: Martin Created: 2026-04-24 | 1. Customer Complaint: Customer reported being unable to scale out. 2. What customer saw: Scale-out appeared broken. 3. CSS telemetry: Customer actually has 60 instances. Problem is orchestration logs/container size reporting. 4. Diagnosis & Fix: Not a scaling issue - reporting/logging problem. Martin investigating. 5. Monitoring Miss? No (reporting issue). 6. Repairs: Fix orchestration log reporting. | Feedback: (placeholder) Work Items: None |
| RA - INC 785238144 Sev2 RA Title: Speech resources scale Date: 2026-04-25 Requester: AOAI Bridge: Ajinkya | ~8-10% failed connections via private endpoint. ProxyRequest shows traffic reaching gateway, no server_5xx. Issue likely upstream (ExpressRoute). | Feedback: (placeholder) None |
| RA - INC 787253075 Sev2 RA Title: AOAI Model Group Unhealthy Date: 2026-04-28 Requester: Foundry/Hyena Bridge: (no show) | Requester didn't respond. Later: signs of recovery, DNS error fixed, RA no longer needed. | Feedback: (placeholder) None |