12-02-2019 02:12 PM - edited 12-03-2019 07:04 AM
We have a cluster of 32 SR630 nodes.
1. All nodes are connected to a NE10032 switch (NOS 10.8.1.0).
2. There are two NIC models across the 32 nodes. Nodes 1 to 8 have the "Mellanox ConnectX®-4 Lx ML2 1x25GbE SFP+ Adapter" (FW: 14.23.1020).
3. Nodes 9 to 32 have the "Mellanox ConnectX®-4 Lx 1x40GbE QSFP+ Adapter" (FW: 14.20.1030).
4. Nodes 1 to 4 and nodes 5 to 8 are each connected to the switch via a 100G QSFP28 to 4x 25G SFP28 breakout cable.
5. Nodes 9 to 32 are connected to the switch directly.
6. Two ports on the switch are configured in 4x25Gbps breakout mode for nodes 1 to 8.
When I use ib_send_bw to test the bandwidth between a node with a 25Gbps NIC (say node-3) and a node with a 40Gbps NIC (say node-9), the measured performance differs depending on which node acts as the client and which as the server. The result figure is attached. The bandwidth drop in the node-9 to node-3 direction at message sizes larger than 8K is strange.
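For reference, this is roughly how I run the test in both directions (a sketch, not my exact command line: the device name `mlx5_0` and the use of `-a`/`--report_gbits` are assumptions; substitute your actual device from `ibv_devices` and hostnames):

```shell
# Direction A: node-9 serves, node-3 sends.
# On node-9 (server side, waits for a connection):
ib_send_bw -d mlx5_0 -F --report_gbits -a
# On node-3 (client side, connects to node-9):
ib_send_bw -d mlx5_0 -F --report_gbits -a node-9

# Direction B: swap the roles to reproduce the asymmetry.
# On node-3 (server side):
ib_send_bw -d mlx5_0 -F --report_gbits -a
# On node-9 (client side):
ib_send_bw -d mlx5_0 -F --report_gbits -a node-3
```

`-a` sweeps message sizes from 2 bytes up to 8MB, which is how the drop above 8K shows up; `-F` suppresses the CPU-frequency warning.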
Update: firmware upgraded to 14.25.1020 on both nodes; nothing changed.
Has anyone experienced something similar before? How can I solve it?