Message 1 of 9

LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-04-21, 13:48

Hello :)

 

I have two 8124E switches configured as one (ISL), and Linux servers (SLES) connected to them with an LACP bond.

 

My issue: the throughput is not active-active / load-balanced between the Linux server and the switches.

 

The Switches are configured as written here: https://forums.lenovo.com/t5/Datacenter-Networking-Hardware/LACP-between-8124e-and-Linux-Server/td-p/3635154

 

The bond options on the Linux server:

 

BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=802.3ad miimon=100 lacp_rate=1'
BONDING_SLAVE0='eth0'
BONDING_SLAVE1='eth1'

The information about the bond:

 

# cat /proc/net/bonding/bond0    
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 17
        Partner Key: 3002
        Partner Mac Address: 08:<asdasdasdasd>

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:<asdasdasdasd>
Aggregator ID: 2
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 5c:<asdasdasdasd>
Aggregator ID: 2
Slave queue ID: 0

On both 8124E:

...
interface port 15
	lacp mode active
	lacp key 3001
...
vlag adminkey 3001 enable
...
#show lacp information 
port    mode    adminkey  operkey   selected   prio  aggr  trunk  status  minlinks
----------------------------------------------------------------------------------
...
15      active      3001     3001    yes      32768    15     14    up        1

The throughput is still 1 Gb/s, not 2x 1 Gb/s (path: 10Gb-NIC Linux server <-> 2x 8124E (ISL) <-> 2x 1Gb-NIC Linux server):

 

119 MB/s

Any ideas?

 

Thank you very much and regards

 

 

 

Message 2 of 9

Re: LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-04-21, 19:41

Hi Luke,

How did you measure the traffic speed (I think you mean the throughput or bandwidth)?

Anyway, you have to know that link aggregation uses a hash algorithm to balance the traffic, and on the G8124E it is not round-robin. By default it hashes on source-IP / destination-IP pairs. So if you tested the performance using a single pair, all the traffic is delivered over a single link.

To verify the improvement and benefit from the link aggregation, you should set up multiple pairs of traffic streams.

You can find the configurable hash algorithms in the Application Guide, chapter 8 "Ports and Link Aggregation", section "Configurable LAG Hash Algorithm" (page 145):

http://systemx.lenovofiles.com/help/topic/com.lenovo.rackswitch.g8124e.doc/G8124-E_AG_8-3.pdf
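For example, here is a rough sketch of one way to create several distinct source-IP / destination-IP pairs (the 192.0.2.x addresses, the ports, the bond0 name, and SERVER-Y are placeholders; the -s option to pick a source address exists in the common netcat variants):

# On the receiving server: one listener per stream.
netcat -l -p 2222 > /dev/null &
netcat -l -p 2223 > /dev/null &

# On the sending server: add extra source addresses, so that each stream
# is a distinct source-IP/destination-IP pair for the switch's hash.
ip addr add 192.0.2.11/24 dev bond0
ip addr add 192.0.2.12/24 dev bond0

# Run the streams in parallel, each bound to a different source address.
dd if=/dev/zero bs=1024K count=2048 | netcat -s 192.0.2.11 SERVER-Y 2222 &
dd if=/dev/zero bs=1024K count=2048 | netcat -s 192.0.2.12 SERVER-Y 2223 &
wait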

 

I hope that this helps you.

ciao Maurizio

Message 3 of 9

Re: LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-04-22, 6:41 AM

Hello mauriziof,

 

Thank you very much for your detailed answer and the references.

 

 "how did you measure the traffic speed ( I think you meant the throughput or bandwidth)?"

 

Server-X: 10Gb*-NIC |=====>| 8124e |===LACP/VLAG====> 1Gb-NIC | Server-Y
          10Gb*-NIC |=====>|       |===LACP/VLAG====> 1Gb-NIC |

 

*At the moment this is configured as an active-failover bond.

 

I measured with an SCP file transfer from Server-X to Server-Y.

 

And I also measured with netcat:

 

Server X:

 

dd if=/dev/zero bs=1024K count=2048 | netcat -v SERVER-Y 2222

Server Y:

netcat -v -v -l -p 2222 > /dev/null

With the 2x 1Gb LACP NICs on Server-Y it should be about 240 MB/s, but it's still about 120 MB/s.

 

I also watched the LACP/VLAG ports of Server-Y on the 8124E; only one of them carries traffic during the tests.
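The per-slave counters on the Linux side tell the same story. A quick loop to watch them (a sketch, using the eth0/eth1 names from the bond config above):

while sleep 1; do
  for i in eth0 eth1; do
    # Raw byte counters; the per-second delta is the actual rate.
    printf '%s tx=%s rx=%s  ' "$i" \
      "$(cat /sys/class/net/$i/statistics/tx_bytes)" \
      "$(cat /sys/class/net/$i/statistics/rx_bytes)"
  done
  echo
done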

 

 "As default, it is "Source IP addr - Destination IP addr" pairs."

 

We have the default hash:

show portchannel hash 
Current  Trunk Hash settings: 
    sip dip  

"To verify the improvement and get benefit of the link aggregation, you should setup a multiple pairs of traffic streaming."

 

Don't I already use multiple pairs of traffic streams with the 2x 1Gb (LACP/VLAG) on Server-Y? Server-X is fast enough with its 10Gb NIC. Maybe I'm misunderstanding this point?

 

Is it necessary to change the hash option? Or could there be a problem with the LACP driver on Server-Y (Linux SLES 11)?

 

Thanks and regards

 

Message 7 of 9 (accepted solution)

Re: LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-04-22, 22:42

Hello Luke,

 

Per the IEEE aggregation specs (previously known as 802.3ad, but now defined in 802.1AX), the following might help explain your observations:

1) Traffic across an aggregation should be “session-based”, to reduce the likelihood of out-of-order packet delivery to the final destination (in other words, not round-robin). Devices sending traffic into an aggregation decide what constitutes a “session”, and most (if not all) vendors let you define what a session is by setting a “hash” to be used when transmitting data into the aggregation.

2) Some of the more common hashes are based on source and/or destination MAC, source and/or destination IP, and source and/or destination TCP/UDP port. Almost all hashes also include some proprietary (not openly documented) decision-making items, sometimes as simple as whether there is an odd or even number of ports in the aggregation, up to some very complex arrangements; none of it is documented to the public, since the IEEE does not demand that everyone have the same hash options or meanings. (A toy sketch follows this list.)

3) There is no guarantee of ever load-balancing the links in an aggregation. Statistically speaking, the more “sessions” you have, the better the chance of load balancing.

4) A given device only gets to determine how it transmits packets into an aggregation, not how it receives them. Each side decides how to send packets, so each side's hash can be different, and this often results in slightly different load balancing in one direction than in the other.
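To make point 2 concrete, here is a toy model of an XOR-style IP/port hash in bash (an illustration only, not any vendor's actual algorithm; as noted, the real ones add undocumented inputs):

# Toy XOR hash: map a flow onto a link index.
ip2int() { local IFS=.; set -- $1; echo $(( ($1<<24) | ($2<<16) | ($3<<8) | $4 )); }

flow_link() {  # usage: flow_link SRC_IP DST_IP SRC_PORT DST_PORT NUM_LINKS
  echo $(( ($(ip2int "$1") ^ $(ip2int "$2") ^ $3 ^ $4) % $5 ))
}

flow_link 192.0.2.10 192.0.2.20 40001 2222 2   # prints 1: this flow picks link 1
flow_link 192.0.2.10 192.0.2.20 40002 2222 2   # prints 0: one changed field, other link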

 

So what does all of this mean for your testing?

 

With your current test (a single copy from one server to another), you will never exceed the bandwidth of a single link in the aggregation. This is the nature of an aggregation, and no amount of hash tuning will change it.

 

With your current test, you might notice the traffic going out one NIC but coming back in on the other. Or it might go out and come back on a single link. That is also the nature of aggregation: each side controls how it sends traffic into an aggregation. Specific to the test you defined above (re-pasted below for reference):

 

Server-X: 10Gb*-NIC |=====>| 8124e |===LACP/VLAG====> 1Gb-NIC | Server-Y
          10Gb*-NIC |=====>|       |===LACP/VLAG====> 1Gb-NIC |

 

First, is that a typo on the right-hand server Y where it says “1Gb-NIC”? I’m guessing that should have been shown as 10Gb-NIC based on your reported speed, but I just want to make sure. Assuming it is meant to be a 10G NIC, and based on each server being 10G LACP-connected to the same pair of vLAGged switches, you will observe an artifact of vLAG called “local preference”. Without local preference, any traffic the vLAG switches send to either host might hash to the link on one switch or the link on the other switch, meaning traffic that comes in from the left server might cross the ISL and go out the link on the OTHER switch toward the Y server. This is very inefficient and wasteful of ISL bandwidth, so “local preference” says: if a packet comes into one switch and there is a path out that is up on both switches (as there would be in your case), always use the local link. So in your case, with your current design and your current single-copy test, you should see all traffic using the same link for inbound and outbound traffic.

 

If you want to try to see better use of the links (and, as noted above, there is no guarantee it ever will happen; statistically speaking, it should), add more copy sessions, ideally with variables that make the sessions use the links differently; for example, change the test by adding more copy destinations (see the sketch below), and then experiment with hashes to encourage different sessions to be identified.
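For instance, a rough sketch of adding a second destination (SERVER-Y and SERVER-Z are placeholders, each running a listener as in your netcat test):

for dst in SERVER-Y SERVER-Z; do
  dd if=/dev/zero bs=1024K count=2048 | netcat "$dst" 2222 &
done
wait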

 

Before any of this, I want to point out another aspect of your test description. You mentioned “at the moment it's configured as an active-failover bond”. If you are really running these as active/failover, that will not load balance at all. You need to make sure BOTH links on BOTH hosts are active/active. And, I hate to be a broken record, but there are STILL no guarantees it will load balance after that change.

 

All of the above, by the way, is one of the biggest differences between using aggregated links as opposed to a single higher speed link. For example, while a 10x1G link and a 1x10G link in theory have similar total bandwidth (10Gbps), in reality, the single 10G link will almost ALWAYS outperform an aggregation of 10x1G links. Especially if you are testing with a single session (a single session copy through the 10x1G link will never exceed 1G, while that same copy could approach 10G on the single 10G link). And even if you have many sessions running in the test, since there is no guarantee of the efficiency of any given data streams moving through these pipes, the 10G will still probably outperform the 10x1G path.

 

One final comment on the nature of aggregations: even if you set up multiple sessions and tune the hash on each side to get the best load balance for the test, there is no guarantee that, once you are in real production running real loads, the hashing that worked perfectly for your test will be the hash you need for your normal production loads. In other words, tuning hashes can be an exercise in futility, as your loads may change.

 

With that said, I typically recommend trying to pick a more advanced hash over the simpler hashes, as in my experience they tend to produce the best load balance under the most typical loads.

 

So back to your specific testing, some things that might help:

A) You will definitely need more than a single copy session if you ever want to see more than a single link in use.

B) I recommend, at least for this test, using some sort of IP-based hash, preferably XORed for better randomness, and having at least three devices in the test (one source and two destinations, or vice versa). On the Linux side, see the bonding sketch after this list.

C) During the test, if you do not see multiple links being used, try changing one of the IP addresses to end in an odd or even number, different from what it was in the first test.

D) Make sure you are using active/active on the bond in the Linux hosts.

E) Use the command “show int port X bitrate” (X = a port or a range of ports separated by commas and dashes) on each Lenovo switch while running the test to see real-time input/output rates (refreshes once per second until Ctrl-C is pressed).
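For item B on the Linux side: the bonding driver's xmit_hash_policy parameter selects the transmit hash, and your /proc output above shows the layer2 default. Here is a sketch in the same SLES ifcfg style as your bond config (note the kernel documents layer3+4 as not fully 802.3ad-compliant, since a flow's fragmented packets can be striped across links):

# /etc/sysconfig/network/ifcfg-bond0 (sketch): XOR IP addresses (and, with
# layer3+4, TCP/UDP ports) into the transmit hash instead of MACs only.
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=802.3ad miimon=100 lacp_rate=1 xmit_hash_policy=layer3+4'
BONDING_SLAVE0='eth0'
BONDING_SLAVE1='eth1'

Keep in mind this only affects what the server transmits; what the switches send down the 2x 1G links toward Server-Y is still decided by the switch hash.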

 

Hopefully the above has helped explain things, but please let us know if you still have questions.

 

Thanks, Matt

Message 8 of 9

Re: LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-04-28, 6:50 AM

Hello Matt,

 

Thank you very much for your very detailed answer! :)

 


Matt wrote:

First, is that a typo on the right-hand server Y where it says “1Gb-NIC”? I’m guessing that should have been shown as 10Gb-NIC based on your reported speed, but I just want to make sure.


Server-Y has a 2x 1Gb LACP bond.

 

I will try the tests!

Message 9 of 9

Re: LACP with 8124e and Linux-Server (SLES) mode: active-active / Load Balancing

2017-07-21, 9:25 AM

LACP is working with load balancing:

It does not load-balance a single point-to-point stream, I guess, but it does with different connections at the same time.
