Fortinet Forum
ChrisStankevitz
New Contributor

30 seconds of packet loss when switch is re-connected

Hi,

 

Setup

I have a large network, shown in simplified form in the image below. The real network has about 40 switches like "Switch C", that is, switches connected to both "Switch A" and "Switch B". I also have a FortiGate HA pair (Active/Passive) connected to "Switch A" and "Switch B", running firmware 6.2.2.

 

Problem

In my network, I can ping from "Host 1" to "Host 2".  If I pull the power from "Switch D", my ping from "Host 1" to "Host 2" continues to work.  This is expected.  However, when I re-apply the power to "Switch D", and after "Switch D" boots, pings from "Host 1" to "Host 2" become sporadic.  This "packet loss" lasts for about 30 seconds.  Then the pings return to normal.

 

Question

Can anyone tell me (or guess) what the problem is, or how I could go about debugging it? I duplicated the network with a spare pair of FortiGates and 448/224 switches and was unable to reproduce the problem.
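One way to start debugging is to pin down exactly when the loss begins and ends, so it can be lined up against switch events later. The following is a minimal sketch, assuming you have already collected timestamped ping results as (time, reply received) pairs. The function name `loss_windows` and the `max_gap` parameter are hypothetical and are not part of any Fortinet tool.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]        # (timestamp in seconds, reply received?)
Window = Tuple[float, float, int]  # (first loss, last loss, number of lost pings)


def loss_windows(samples: List[Sample], max_gap: float = 5.0) -> List[Window]:
    """Group lost pings into windows.

    Losses that are at most `max_gap` seconds apart are merged into one
    window, so sporadic loss (some replies, some timeouts) still shows up
    as a single episode. `samples` must be sorted by timestamp.
    """
    windows: List[Window] = []
    start = last = None
    count = 0
    for ts, ok in samples:
        if ok:
            continue  # successful replies never open or extend a window
        if start is not None and ts - last <= max_gap:
            last = ts          # extend the current window
            count += 1
        else:
            if start is not None:
                windows.append((start, last, count))
            start = last = ts  # open a new window
            count = 1
    if start is not None:
        windows.append((start, last, count))
    return windows
```

Running this over pings sent once per second from "Host 1" to "Host 2" gives the start, end, and size of each loss episode, which is easier to compare across tests than watching the ping output by eye.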

 

Thank you,

 

Chris

 

switch-join.png

8 REPLIES
vponmuniraj
Staff

Hi Chris,

 

Does the traffic from "Host 1" to "Host 2" traverse the firewall? If so, I would start by checking whether any route changes happen on the firewall when the switch comes online.

Also, run a packet sniffer or flow debug to check whether the FGT is dropping the packets.

 

 

Regards,

Vignesh.
Toshi_Esumi
Esteemed Contributor II

You need to show the FGTs in HA in the diagram, along with the VLAN(s) and IP subnet(s). Otherwise nobody can know the topology involving the FGTs. But my feeling is that it's on the switch side, involving "port-fast".

 

Toshi

Yurisk
Valued Contributor

With 40+ switches I am pretty sure you have central log storage for all of them. I'd check what log level is needed to catch STP port/VLAN state changes (Forwarding/Blocking) and look for them in the logs. A problem that occurs every 30 seconds each time seems more like a timer issue, especially Spanning Tree convergence times.
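To show why 30 seconds points toward Spanning Tree timers, here is a minimal arithmetic sketch. It assumes the classic IEEE 802.1D default timers (hello 2 s, forward delay 15 s, max age 20 s). The thread does not say which STP variant or timer values these switches actually use, so treat the numbers as illustrative only.

```python
# Classic IEEE 802.1D default timers, in seconds.
# Assumption: the switches in this thread may run a different STP variant
# (for example RSTP or MSTP) or use non-default timers.
HELLO_TIME = 2
FORWARD_DELAY = 15
MAX_AGE = 20


def blocking_to_forwarding(forward_delay: int = FORWARD_DELAY) -> int:
    """Time a port spends in Listening plus Learning before Forwarding."""
    return 2 * forward_delay


def worst_case_reconvergence(max_age: int = MAX_AGE,
                             forward_delay: int = FORWARD_DELAY) -> int:
    """Max-age expiry followed by Listening and Learning."""
    return max_age + 2 * forward_delay


print(blocking_to_forwarding())    # 30
print(worst_case_reconvergence())  # 50
```

With the defaults, a port that has to walk through Listening and Learning takes 30 seconds to forward, which is in the same range as the disruption described in the original post.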

Yuri
https://yurisk.info/ blog: All things Fortinet, no ads.


All opinions are mine only.
Muhammad_Haiqal

Hi ChrisStankevitz,

Since you mentioned the network is big, I'm afraid this issue is due to the network deployment itself.
Please take a look at these KB articles:

 

Troubleshooting Tip: LACP issue
https://kb.fortinet.com/kb/microsites/microsite.do?cmd=displayKC&docType=kc&externalId=FD50620

 

High availability basic deployment design
https://kb.fortinet.com/kb/microsites/microsite.do?cmd=displayKC&docType=kc&externalId=FD47572


Those KB articles should give a general idea of the deployment.
Hope that helps.

 

haiqal
ChrisStankevitz
New Contributor

Thank you all for your replies.  A couple of comments which I should have included in my original post:

 

  1. The dropped traffic between "Host 1" and "Host 2" is on the same VLAN, so no routing is involved. The FortiGate does not see the traffic (nothing appears in its logs), which is what I would expect. So I doubt the firewalls are playing any direct role in the problem.
  2. @vponmuniraj what do you mean by "route changes"? Are you talking about Layer 3? Regardless, how do I monitor "route changes" on the switches or firewalls?
  3. @Toshi_Esumi I have two FortiGates (running 6.2.2) in HA Active-Passive. Each FortiGate is connected to both "Switch A" and "Switch B", similar to how the hosts are connected to these switches.
  4. @Yurisk I also suspect STP or some Fortinet ISL equivalent, particularly in light of item 5 below. However, the problem does not happen regularly at 30 second intervals. When "Switch D" is connected to "Switch C", the traffic between "Host 1" and "Host 2" becomes "spotty", and this misbehavior lasts for only about 30-45 seconds. After that, the traffic returns to normal and there is no more trouble. The switch logs show nothing interesting. They do show STP messages when individual ports come up and go down.
  5. The problem also appears in another way. Please see the image I pasted below. If I leave all switches connected and powered up, but disconnect and reconnect each of the ISLs listed below, 1-4 (waiting one minute between each disconnect or connection), the problem (packet loss between "Host 1" and "Host 2") also happens temporarily.

My sales engineer suggested that a "loop" is forming between switches C and D, which doesn't make sense to me.

 

Complicating this: I'm working in an air-gapped environment, so getting support is difficult. I tried to replicate the problem with spare FortiGates and switches, but so far I have been unsuccessful.

 

Does anyone know how I can diagnose/trace the "logic" that the switches go through when "Switch D" joins the topology or when ISL connections come up and down?
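One generic way to trace this is to line up switch log events against the periods of packet loss. The following is a minimal sketch, assuming you can export log entries as (timestamp, message) pairs and already know the loss periods as (start, end) pairs on the same clock. The function name `events_near_windows`, the `lead` parameter, and the sample log messages are all hypothetical and do not reflect real FortiSwitch log output.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str]     # (timestamp in seconds, log message)
Window = Tuple[float, float]  # (first lost ping, last lost ping)


def events_near_windows(events: List[Event],
                        windows: List[Window],
                        lead: float = 10.0) -> Dict[Window, List[Event]]:
    """For each loss window, collect the log events that occurred from
    `lead` seconds before the window started until the window ended."""
    result: Dict[Window, List[Event]] = {}
    for window in windows:
        start, end = window
        result[window] = [e for e in events if start - lead <= e[0] <= end]
    return result


# Hypothetical example data, not real switch output.
events = [
    (95.0, "port1 link up"),
    (100.0, "STP topology change"),
    (300.0, "port7 link down"),
]
windows = [(102.0, 135.0)]

for window, hits in events_near_windows(events, windows).items():
    print(window, hits)
```

If the same kind of event shows up just before every loss window, that event is a good candidate for closer inspection. This only works if the clocks on the hosts and switches are synchronized closely enough.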

 

ChrisStankevitz_0-1655117927954.png

 

Muhammad_Haiqal

Hi ChrisStankevitz,
That design is too general. I can come up with about 2-5 solutions based on that design.
Example:

ISL-1 and ISL-2 form one LACP group, OR ISL-1 to ISL-4 form one LACP group.
Which port on Switch A is the LACP configured for?

Looking at this general design, I believe this is more of a switch configuration issue. I would suggest contacting switch support to help you with this.

haiqal
ChrisStankevitz

Thank you -- can you think of ANY configuration of ISL that will cause packet loss between "Host 1" and "Host 2" when "Switch D" is connected to "Switch C"?

 

Please consider if I DESIRE packet loss between "Host 1" and "Host 2" when "Switch D" is connected to "Switch C".  Can you think of ANY configuration of ANYTHING that would accomplish such packet loss?

Toshi_Esumi
Esteemed Contributor II

Then you should post this at Cisco Community instead of Fortinet Community.

 

Toshi