Azure WS2022 Virtual Machine: Unable to complete SMB transfers due to dup ack storm / connection resets

Justin Bakelaar (Protiviti Inc) 0 Reputation points Microsoft External Staff
2026-05-08T16:07:55.0233333+00:00
  • Windows Server 2022 VMs running under multiple subscriptions are unable to complete SMB transfers larger than 15 GB.
  • These file transfers run over ExpressRoute IPs from several different on-prem file shares.
  • On the Azure VMs, we have tried the following, none of which changed the reproducible behavior: disabling Multichannel (SmbClientConfiguration), disabling RDMA and restarting LanmanWorkstation, increasing the session timeout (SmbClientConfiguration), disabling bandwidth throttling (SmbClientConfiguration), disabling Large Send Offload V2 (NetAdapterAdvancedProperty), and disabling Receive Segment Coalescing.
  • Updating the Mellanox driver from 23.4.26054.1 to 26.1.27016.0 did not change the reproducible behavior.
  • Redeploying the Azure VM to a new host did not change the reproducible behavior.
  • Robocopying the test payload (80 GB) from a different on-prem client to the Azure VM (where the Azure VM acts as the SMB server rather than the SMB client) succeeds, which rules out issues with the Azure storage.
  • While capturing network traffic with Wireshark, we see a duplicate ACK storm when the file copy job freezes at 19% completion, followed by a connection reset.

Logs

A network connection was disconnected.

Instance name: \Device\LanmanRedirector

Server name: \REDACTED

Server address: REDACTED:445

Connection type: TCPIP

InterfaceId: 14

Guidance:

This indicates that the client's connection to the server was disconnected.

Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects and poor performance.

[Image: capture screenshot]

Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.


1 answer

  1. Q&A Assist
    2026-05-08T16:08:32.6733333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    The behavior matches an SMB/TCP connectivity problem on the path between the on‑prem file servers and the Azure Windows Server 2022 VMs, not an issue with Azure VM storage or SMB server functionality.

    Key points from the scenario and context:

    1. SMB transfers fail only when the Azure VM acts as the SMB client over ExpressRoute to on-prem shares and the payload is large (>15 GB). The same VM can receive an 80 GB file from on-prem when acting as the server, which indicates that the VM's storage and SMB server stack are healthy.
    2. Wireshark shows a TCP duplicate ACK storm followed by a connection reset at ~19% completion, which is consistent with packet loss or congestion on the network path.
    3. The guidance text in the SMB client log mentions that frequent disconnects can be caused by misconfigured RDMA over Converged Ethernet (RoCE) networks, especially missing Priority Flow Control (PFC), which leads to packet loss and poor performance. The logged connection type is TCPIP, so this guidance applies only if RoCE is actually in use somewhere on the path.
    4. SMB performance is closely tied to network performance: packet loss triggers TCP congestion control and throttling, which can severely impact large transfers.

    Given this, the most likely root cause is a network‑side misconfiguration or limitation on the ExpressRoute/on‑prem side, particularly around RDMA/RoCE or network offload features, rather than a defect in Windows Server 2022 SMB.

    Recommended actions (focused on the network path and SMB/TCP stack):

    1. Validate RDMA/RoCE configuration end‑to‑end
      • Confirm whether the on‑prem path to ExpressRoute uses RDMA over Converged Ethernet (RoCE) NICs or switches.
      • If RoCE is in use, ensure Priority Flow Control (PFC) is configured consistently on all hosts, switches, and routers on that path, as the SMB log guidance states that missing PFC causes packet loss, frequent disconnects, and poor performance.
      • If PFC cannot be guaranteed end‑to‑end, consider disabling RDMA for this path and using standard TCP/Ethernet instead.
    2. Check for packet loss and MTU issues on ExpressRoute
      • Use network monitoring on both on‑prem and Azure sides to look for drops, errors, or retransmits on the interfaces carrying the SMB traffic.
      • Verify that MTU settings are consistent along the path; mismatched MTU can cause fragmentation and loss, which will manifest as duplicate ACK storms and resets.
    3. Ensure SMB network offload features are correctly configured
      • The guidance for slow SMB transfers emphasizes that network offloading technologies such as Receive Side Scaling (RSS), Large Send Offload (LSO), Receive Segment Coalescing (RSC), and checksum offloads should remain enabled because they improve throughput and reduce CPU usage.
      • Re‑enable any offloads that were disabled for testing (for example, LSO and RSC) and instead focus on fixing the underlying packet‑loss issue.
    4. Verify SMB client configuration for large transfers
      • On the Azure Windows Server 2022 VMs, ensure that bandwidth throttling is disabled and large MTU is enabled for SMB:
             Set-SmbClientConfiguration -EnableBandwidthThrottling $false -EnableLargeMtu $true
        
      • If SMB Multichannel was disabled for testing, re-enable it and confirm that it is functioning, as it can improve resilience and performance when multiple NICs or RSS/RDMA are available.
    5. Check for security features impacting RDMA
      • If SMB Signing or SMB Encryption is enabled on the path, be aware that SMB stops using RDMA direct data placement and falls back to a less efficient send/receive mode with a smaller MTU, which can reduce performance and increase fragmentation.
      • Evaluate whether signing/encryption is required for this specific path; if they are mandatory, plan for the reduced performance and ensure the network can handle the additional overhead without loss.
    6. Isolate by testing alternative paths
      • Since the issue reproduces across subscriptions but only for SMB client traffic over ExpressRoute, test:
        • A smaller test file that just exceeds the failure threshold (for example, 20 GB) to confirm the size dependency.
        • A different network path (VPN or another ExpressRoute circuit) if available, to see whether the problem is specific to a particular circuit or on‑prem network segment.
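The client-side checks in actions 1, 3, 4, and 5 can be gathered in one read-only pass before changing anything. This is a minimal sketch, assuming it runs in an elevated PowerShell session on the Azure Windows Server 2022 VM; the selected property names are given as I understand them and are worth confirming with `Get-Member` on the VM.

```powershell
# Read-only checks; nothing in this block changes configuration.

# Action 1: is RDMA enabled on any adapter, and does SMB see an RDMA-capable interface?
Get-NetAdapterRdma | Format-Table Name, Enabled
Get-SmbClientNetworkInterface | Format-Table InterfaceIndex, FriendlyName, RdmaCapable, RssCapable

# Action 3: current state of the offloads that were toggled during testing
Get-NetAdapterLso | Format-Table Name, IPv4Enabled, IPv6Enabled
Get-NetAdapterRsc | Format-Table Name, IPv4Enabled, IPv6Enabled
Get-NetAdapterRss | Format-Table Name, Enabled

# Action 4: SMB client settings relevant to large transfers
Get-SmbClientConfiguration |
    Format-List EnableBandwidthThrottling, EnableLargeMtu, EnableMultiChannel, SessionTimeout

# Action 5: whether live SMB connections are signed or encrypted
Get-SmbConnection | Format-Table ServerName, ShareName, Dialect, Signed, Encrypted
```

Capturing this output before and after each change makes it easier to tell which settings are still in their test state when the transfer is retried.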

    By focusing on eliminating packet loss and misconfiguration in the RDMA/RoCE and ExpressRoute path, and by ensuring SMB client offloads and large MTU are correctly configured, the duplicate ACK storms and connection resets during large SMB transfers should be resolved.
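To look for packet loss and MTU problems on the path, the following sketch may help. It assumes it runs on the Azure VM while a failing copy is in progress; `fileserver.example.com` is a hypothetical placeholder for the on-prem SMB server, and the ping probe only gives a useful answer if ICMP is allowed end to end.

```powershell
# Hypothetical placeholder; replace with the on-prem SMB server name or IP.
$server = 'fileserver.example.com'

# Basic TCP reachability on the SMB port
Test-NetConnection -ComputerName $server -Port 445

# Don't-fragment probe: 1472 bytes of payload + 28 bytes of ICMP/IP headers = 1500-byte packet.
# If this fails while smaller sizes succeed, a hop on the path has a smaller MTU.
ping.exe -f -l 1472 $server

# MTU configured on the local IPv4 interfaces
Get-NetIPInterface -AddressFamily IPv4 | Format-Table InterfaceAlias, NlMtu

# Adapter drop and error counters; compare values before and after a failed copy
Get-NetAdapterStatistics |
    Format-List Name, ReceivedDiscardedPackets, ReceivedPacketErrors,
                OutboundDiscardedPackets, OutboundPacketErrors

# TCP retransmit rate sampled every 5 seconds for one minute during the copy
Get-Counter -Counter '\TCPv4\Segments Retransmitted/sec' -SampleInterval 5 -MaxSamples 12
```

A retransmit rate that climbs just before the copy freezes, with clean adapter counters on the VM, would point toward loss further along the path rather than on the VM itself.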

