I would like to share with the community a solution to a problem that would otherwise have caused even more downtime. I hope this helps someone else out there.
At this particular site there is a FortiGate 100F without redundancy. The hardware ID is c1aj43-04aa-0000.
One day, out of the blue, it failed with error messages like these:
NP6XLITE: Error: xaui.usxs[0].usxs_port_sts 00000000.
NP6XLITE: Error: xaui.usxs[4].usxs_port_sts 00000000.
api return err code -3 opcode 5 BCM_VLAN_CREATE_SCC len 56 request 0 reply 1
api return err code -3 opcode 7 BCM_VLAN_PORT_ADD len 56 request 0 reply 2
api return err code -3 opcode 15 BCM_VLAN_CROSS_CONNECT_ADD len 56 request 0 reply 3
bcm sdk 140 exit.
bcm_sdk 140 is down with code 1.
Kernel panic - not syncing: BUG!
The RMA unit arrived with the hardware ID c1aj43-20aa-0000.
The system came with firmware 6.4.9, which is not a version we use and does not match our configuration file. So I was told to downgrade the firmware via format and TFTP upload. We use 6.0.14, with a config file matching 6.0.12.
After doing this, the system came back up, but I noticed that the boot CLI contained the following errors:
NP6XLITE: Error: xaui.usxs[8].usxs_port_sts 00000000.
NP6XLITE: Error: xaui.usxs[12].usxs_port_sts 00000000.
The other messages do not appear.
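For anyone who captures the boot console to a file and wants to check it for these lines, here is a minimal sketch. It assumes the error lines look exactly like the ones quoted in this post; the function name `find_np6xlite_errors` is made up for illustration and is not part of any Fortinet tool.

```python
import re

# Pattern based only on the boot lines quoted in this thread (an assumption;
# other firmware builds may word the message differently).
NP6XLITE_ERR = re.compile(
    r"NP6XLITE: Error: xaui\.usxs\[(\d+)\]\.usxs_port_sts ([0-9A-Fa-f]{8})"
)


def find_np6xlite_errors(console_text):
    """Return a list of (usxs index, port status) for each NP6XLITE error line."""
    return [
        (int(index), status)
        for index, status in NP6XLITE_ERR.findall(console_text)
    ]
```

An empty list means none of these lines were found in the capture. It says nothing about whether the unit is otherwise healthy.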
At this point, even with the factory default config (as shown by show full-configuration), the unit does not process any traffic at all. For example, if you connect directly to the MGMT port, which has a default address of 192.168.1.99, it does not respond to ping or HTTPS.
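If you want to script the HTTPS part of that check from a laptop plugged into the MGMT port, a rough sketch follows. It only tests whether a TCP connection to the port can be opened. The helper name `tcp_port_open` and the 3-second timeout are my own choices, not anything from Fortinet.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the default MGMT address mentioned above, the call would be `tcp_port_open("192.168.1.99", 443)`. A False result only tells you the port did not answer, not why.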
I tried loading another 6.0.x version, with the same results.
I copied my configuration to a USB drive and restored it via the CLI. The system accepted all of the config, but again the device did not pass any traffic. No DHCP, no ping, nothing.
When I called Fortinet support I spoke with both tech support and escalation tech support, and both of them thought the NP6XLITE error message meant the device had failed again and that I would need another RMA.
I asked whether certain hardware revisions require a minimum firmware version. They said no, the hardware should take all firmware versions.
After hours on the phone with both support teams, I wanted to try version 6.4.9, which came with the device, before I RMA'd it again. Lo and behold, the firmware loaded with no error messages. I was able to get into the HTTPS interface and upload the config, and the system was fully back, with some -160 and -61 errors but nothing too major. I was back online.
I wanted to share this experience in the hope that it helps someone and saves them a few hours. Thank you.
Created on 06-05-2022 07:55 PM Edited on 06-05-2022 08:03 PM
As in the doc above, NP6XLite is an NPU inside the SOC4 ASIC, which is used in the smaller F-series models such as the 40F, 60F, 80F, 100F and 200F. I'm not sure whether those are all the models that have SOC4, though.
Since SOC4 is a relatively new chip, FTNT has to patch chip-related hardware issues in software whenever they pop up, because you can't replace the chip once the hardware is out in the market. I know that 6.4.9 specifically fixed at least one of those issues on the 40F, which I experienced with 6.4.8 (a "failed to read lif" error message on the console). If you look at the release notes and search for NP6XLite under Resolved Issues, you can find them.
Since they have 7.2, 7.0 and 6.4 to maintain, not all problems get fixed in older software like 6.2 and 6.0, at least not right away. Besides, 6.2 and older are now out of engineering support. So if a new problem pops up, they won't spend resources figuring out how to fix it and will just tell us to "upgrade to 6.4, 7.0, etc."
6.4.9 is stricter than 6.4.8 about some illegal configurations. I had some interface config thrown away when I upgraded an FGT 6.2.10->6.4.8->6.4.9. In my case it was an overlapping subnet, or something similar, and it didn't happen in the first step of the upgrade. The first part of each error message in the config-error-log shows where the problem is. The second part should then give a brief reason why it was thrown out. Check the config for each of those parts in the original config file. Or, if you can share one of them, I or somebody else can probably tell why.
<edit> Now I remember exactly what the config problem was. An npu-vlink interface had the same name as a VDOM, and it was thrown out.</edit>
Toshi
Hm, did you downgrade after you installed your config?
Once there is a config (other than the factory default config), it is recommended to follow a valid upgrade path so that nothing gets broken.
So I would have taken the RMA unit, downgraded it to 6.0.14, and afterwards executed a factory reset on it.
Then install your config, and then upgrade to 6.4.9 according to the upgrade path if that is still needed or wanted.
I have done it this way on various FGTs that came with newer firmware than we had in use, and I never ran into such issues.
--
"It is a mistake to think you can solve any major problems just with potatoes." - Douglas Adams
Nope, it was a fresh RMA out of the box. I saw that the firmware did not match our config and downgraded right away. I did not load any config onto it at all prior to the downgrade.
And that is also what I tried: format, TFTP firmware downgrade, exec factoryreset. The device just won't process any traffic until I put it back on 6.4.9.
Based on your description, 6.0.12 must have particular problems that keep it from supporting the 100F or NP6XLite models well. Another RMA is unlikely to solve the problem. You might want to try upgrading the current unit to 6.0.14, or further to 6.2.10, if the state of the unit allows those upgrade steps. That might be the only option.
Toshi
Hi Toshi,
Yes, that was done as well prior to calling support. The latest 6.0.14 generates the same boot error, NP6XLITE: Error: xaui.usxs[8].usxs_port_sts 00000000., and will not process any traffic, even on factory default. I do not know whether 6.2.10 will work, but the out-of-box version, 6.4.9, worked.
If you can shed some light, what exactly are NP6xLite models?
Also, under 6.4.9 our 6.0.12 config had quite a few import errors, shown by "diagnose debug config-error-log read", mostly -160 and -61 errors. Support has not gotten back to me yet about how to address them.
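To get a quick tally of which error codes dominate, the output of that command can be saved to a text file and counted with a few lines of Python. This is only a sketch. The exact output format is not shown in this thread, so the pattern below assumes each failed line contains text such as "error -160", and the function name `count_error_codes` is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: each failed line contains the word "error" followed by a
# negative code, e.g. "error -160". Adjust the pattern to the real output.
ERROR_CODE = re.compile(r"error\s+(-\d+)", re.IGNORECASE)


def count_error_codes(log_text):
    """Count how often each negative error code appears in the saved output."""
    return Counter(int(code) for code in ERROR_CODE.findall(log_text))
```

Calling `count_error_codes(saved_text).most_common()` lists the codes from most to least frequent, which can help decide which group of rejected settings to look at first.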
Thank you Toshi, this is the best reply I have gotten so far, and it spells out what caused the issue. As of now support still thinks it's another hardware failure, so at this point I have just closed the ticket. The upgrade tossed out a decent amount of config, and hopefully support can evaluate those errors and let me know what to do next. Thanks for your contribution.