I have four FAP320B APs controlled by a cluster of FGT90 firewalls.
Periodically I will see individual APs "leave" the network. The APs will rejoin after some period of time (minutes). Less frequently, I will see all four APs leave the network more or less simultaneously and rejoin after some period of time (the same minutes timespan). This happens on the order of 5 to 40 individual events per day.
Now when one AP leaves, usually the client will join to another nearby AP and all is well. However when all four leave simultaneously, there are obviously no APs available to service connection requests and the users notice. This happens on the order of 1 to 3 times per day.
The FortiGate is not logging any useful information.
This is an example of a failure: clients notice a problem, then approximately two minutes later the FortiGate logs this:
2015-02-26T09:01:14.128012-05:00 wheel date=2015-02-26 time=09:01:14 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043522 type=event subtype=wireless level=notice vd="root" logdesc="physical AP activity" sn="FP320B3X--------" ap="elpfap1" profile="FAP320B-default" ip=10.8.0.31 meshmode="mesh root ap" snmeshparent="N/A" action="ap-fail" reason="Control message maximal retransmission limit reached" msg="AP elpfap1 failed."
2015-02-26T09:01:14.128646-05:00 wheel date=2015-02-26 time=09:01:14 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043522 type=event subtype=wireless level=notice vd="root" logdesc="physical AP activity" sn="FP320B3X--------" ap="elpfap1" profile="FAP320B-default" ip=10.8.0.31 meshmode="mesh root ap" snmeshparent="N/A" action="ap-leave" reason="Control message maximal retransmission limit reached" msg="AP elpfap1 left."
2015-02-26T09:01:45.390396-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043522 type=event subtype=wireless level=notice vd="root" logdesc="physical AP activity" sn="FP320B3X--------" ap="elpfap1" profile="FAP320B-default" ip=10.8.0.31 meshmode="mesh root ap" snmeshparent="N/A" action="ap-join" reason="N/A" msg="AP elpfap1 joined."
2015-02-26T09:01:45.393459-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=1 configcountry="US " opercountry="US " cfgtxpower=23 opertxpower=21 action="config-txpower" msg="AP elpfap1 radio 1 cfg txpower is changed to 23 dBm."
2015-02-26T09:01:45.394045-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=2 configcountry="US " opercountry="US " cfgtxpower=27 opertxpower=22 action="config-txpower" msg="AP elpfap1 radio 2 cfg txpower is changed to 27 dBm."
2015-02-26T09:01:45.543350-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=1 configcountry="US " opercountry="US " cfgtxpower=23 opertxpower=21 action="country-config-success" msg="AP elpfap1 radio 1 country US (841) set success."
2015-02-26T09:01:45.547094-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=1 configcountry="US " opercountry="US " cfgtxpower=23 opertxpower=21 action="oper-txpower" msg="AP elpfap1 radio 1 oper txpower is changed to 21 dBm."
2015-02-26T09:01:45.547569-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=1 configcountry="US " opercountry="US " cfgtxpower=23 opertxpower=21 action="country-config-success" msg="AP elpfap1 radio 1 country US (841) set success."
2015-02-26T09:01:45.550162-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=1 configcountry="US " opercountry="US " cfgtxpower=23 opertxpower=21 action="oper-channel" msg="AP elpfap1 radio 1 operating channel 0 ==> 149."
2015-02-26T09:01:45.658946-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=2 configcountry="US " opercountry="US " cfgtxpower=27 opertxpower=22 action="country-config-success" msg="AP elpfap1 radio 2 country US (841) set success."
2015-02-26T09:01:45.712637-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=2 configcountry="US " opercountry="US " cfgtxpower=27 opertxpower=22 action="oper-txpower" msg="AP elpfap1 radio 2 oper txpower is changed to 22 dBm."
2015-02-26T09:01:45.713336-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=2 configcountry="US " opercountry="US " cfgtxpower=27 opertxpower=22 action="country-config-success" msg="AP elpfap1 radio 2 country US (841) set success."
2015-02-26T09:01:45.766208-05:00 wheel date=2015-02-26 time=09:01:45 devname=fw-ottawa-A devid=FGT90D3Z-------- logid=0104043526 type=event subtype=wireless level=notice vd="root" logdesc="physical AP radio activity" sn="FP320B3X--------" ap="elpfap1" ip="10.8.0.31" radioid=2 configcountry="US " opercountry="US " cfgtxpower=27 opertxpower=22 action="oper-channel" msg="AP elpfap1 radio 2 operating channel 0 ==> 11."
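With 5 to 40 events per day, counting them by hand gets tedious. The following is a minimal Python sketch for tallying `ap-fail` events per AP from the syslog output. It assumes one event per line and relies only on the `key=value` layout visible in the log above; the function names `parse_event` and `count_failures` are made up for illustration and are not part of any Fortinet tool.

```python
import re
from collections import Counter

# A value is either double-quoted (may contain spaces) or a bare token.
KV_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_event(line):
    """Return the key=value fields of one syslog line as a dict."""
    fields = {}
    for key, quoted, bare in KV_RE.findall(line):
        # Exactly one of the two groups is non-empty for a non-empty value.
        fields[key] = bare if bare else quoted
    return fields

def count_failures(lines):
    """Count ap-fail events per AP name."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event.get("action") == "ap-fail":
            counts[event.get("ap", "unknown")] += 1
    return counts
```

Feeding it the day's log, for example `count_failures(open(path))` with `path` pointing at wherever the syslog server stores the FortiGate messages, should show whether one AP accounts for most of the failures.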
The APs are using TrendNET TPE-113GI POE injectors for power. These are connected back to Dell PowerConnect 5548 switches, which then connect to Juniper EX2200 switches, and finally into the firewall cluster.
When examined through the Managed Devices pane, the devices do not show as having rebooted because of these outages.
The APs remain pingable during these outages.
Things that support has had me try:
- downgrade the APs from 5.2.2 to 5.0.9 (the firewall is still running 5.2.2) -- this reduced my individual ap-fail frequency from hundreds per day down to what you see now, easily an order-of-magnitude drop
- stop using the APs passthrough -- some of the APs had computers daisy-chained off of their second ethernet ports; interestingly when the APs "left", the computers could not pass information through to the rest of the network; changing this had no effect on the events (although it has made those previously-daisy-chained computers much more reliable)
- change which ethernet device is connected to the injector -- this had no effect
- check the FortiGate management page during an outage to see if the device was present or not -- because these events are unpredictable and brief, I have not been able to "catch" the device in the act so to speak.
Does anyone have any ideas what I can do to diagnose and correct this issue? I have tickets in to support, but as you can see above the results have been less than ideal.
For the 320Bs you're still best on the 5.0 branch; however, an interim build may fix your issue... can I ask for your support ticket number? I'll look into it.
Ticket is 1339527
Update: my support contact has provided me with a different firmware, build 0089 (the 5.0.9 firmware is 0086). We have also increased the value of wireless-controller global/max-retransmit from 3 to 6.
Things appear marginally better -- I am not seeing the coordinated drops where all four APs vanish at the same time -- but the overall number of drops per day is not substantially improved.
I also managed to catch an event while it happened (a single AP drop). I discovered that the network is available right up until the firewall declares an ap-fail, at which point the SSID stops being broadcast for about 40 or 50 seconds (which I presume is the radios going through their initialization routine). The APs do not disappear from the firewall's management console or indicate in any way that they are in a failed or initializing mode.
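To compare what the controller logs against the 40 to 50 seconds observed on the air, it can help to measure the gap between each `ap-leave` and the following `ap-join`. This is a rough Python sketch under the assumption that the events have already been read into dicts carrying the `date`, `time`, `ap` and `action` fields from the log, in chronological order; `outage_durations` is a hypothetical helper name.

```python
from datetime import datetime

def outage_durations(events):
    """Pair each ap-leave with the next ap-join for the same AP.

    Returns a list of (ap, leave_time, seconds_until_rejoin) tuples.
    """
    pending = {}   # AP name -> time of its unmatched ap-leave
    outages = []
    for ev in events:
        when = datetime.strptime(ev["date"] + " " + ev["time"],
                                 "%Y-%m-%d %H:%M:%S")
        if ev["action"] == "ap-leave":
            pending[ev["ap"]] = when
        elif ev["action"] == "ap-join" and ev["ap"] in pending:
            left = pending.pop(ev["ap"])
            outages.append((ev["ap"], left, (when - left).total_seconds()))
    return outages
```

For the example failure posted earlier in the thread (leave at 09:01:14, join at 09:01:45) this gives 31 seconds, which only covers the part of the outage the controller itself logged.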
If all the APs are losing connectivity to the controller at once, it could be caused by:
1) a transient network issue: check the switch logs to see whether there was an event at that moment
2) high CPU on the FGT at the moment the issue was happening
3) the daemon being restarted on the FGT
If the rejoins are intermittent, you need to identify an AP that fails relatively predictably, then:
1) turn on wtp daemon logs on the AP
2) check the /tmp/ping_results.txt file on the AP
3) capture packets over the network (switch port to AP, switch port to controller) to see whether packets are dropped somewhere
4) if no packets are dropped between the AP and the controller but the controller doesn't respond to a particular packet, then we have to turn on the daemon logs on the FGT for this AP to see what is happening
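Since the two checklists apply to different symptoms, it may be worth separating the "all APs at once" events from the single-AP drops before digging further. The sketch below groups failures that occur within a few seconds of each other; it takes `(timestamp, ap_name)` pairs as input. The 5-second window is an arbitrary assumption, the default of four APs comes from the original post, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def group_failures(failures, window_seconds=5):
    """Group (timestamp, ap) pairs that occur close together in time.

    A failure joins the current group if it happened within
    `window_seconds` of the previous failure in that group.
    """
    window = timedelta(seconds=window_seconds)
    groups = []
    for when, ap in sorted(failures):
        if groups and when - groups[-1][-1][0] <= window:
            groups[-1].append((when, ap))
        else:
            groups.append([(when, ap)])
    return groups

def simultaneous_groups(groups, total_aps=4):
    """Keep only the groups in which every AP failed."""
    return [g for g in groups if len({ap for _, ap in g}) == total_aps]
```

The timestamps of the groups returned by `simultaneous_groups` are the moments to look up in the switch logs and FGT CPU history; the remaining groups point at individual APs worth capturing packets on.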