NAT port is exhausted

GreatNetworks · ‎01-26-2014

I keep getting the following error in my event log every few seconds. The router seems to work fine otherwise. How do I trace the source of the problem and block it? Will this error affect performance in the future, and will the speed degrade? Help is very much appreciated.

Details:
- There are no more than 10,000 sessions at a time, from about 70 devices. The same error appears even when the count drops to 5,000-6,000 sessions and about 50 devices.
- The firmware is 4.0 MR2 patch 1 and the router model is 300A.
- Only 1 WAN interface is in use; the other 3 WAN interfaces are idle.
- The active interface has 8 policy rules, with per-IP shaping rules applied.

I checked for similar cases but have found none. Any ideas?

Level: critical
Sub Type: system
ID: 20007
Status: failure
Service: kernel
Message: NAT port is exhausted.

I Live to Solve

ede_pfau · ‎01-26-2014

Hi, and welcome to the forums.

This situation is quite rare. The source of the trouble is that there are only 63K ports for outgoing NAT, and all of your WAN sessions draw on them. Depending on the rate of new sessions (per second), I wouldn't be surprised if at 10K active sessions your FGT reached this limit soon. When the FGT runs out of available ports it has to deny sessions, which ultimately leads to reduced throughput or connection failures.

What you could do, IMHO:
- Update your firmware to the latest in 4.2, which is 4.2.15. There is a slight chance that intermediate bug fixes help in this situation.
- Reduce the TTL (time to live) for TCP sessions (or, more specifically, HTTP sessions if these are the bulk of your outgoing traffic). Be warned that a shorter TTL will lead to a higher rate of new session setups, meaning higher CPU load. Session setup rate is limited by hardware resources anyway. On the other hand, used NAT ports are released earlier, making them available for reuse.

TTL can be changed either for TCP in general or per protocol (for example HTTP, port 80). This can be done in the CLI only. You can find the exact commands in the CLI Reference for your major FortiOS version.
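A rough way to reason about the TTL trade-off described above is Little's law: the number of ports held at any moment is roughly the new-session rate multiplied by how long each session keeps its port. The sketch below is only back-of-the-envelope arithmetic. The function names and the hold times are made up for illustration; only the 63K figure comes from the post, and a real FortiGate may release ports earlier than the TTL when sessions close cleanly.

```python
# Back-of-the-envelope estimate of NAT port consumption (Little's law).
# Assumption: every session holds exactly one NAT port for `hold_seconds`.

NAT_PORT_LIMIT = 63_000  # the "63K" figure quoted in the post; approximate


def ports_in_use(new_sessions_per_second: float, hold_seconds: float) -> float:
    """Steady-state number of ports held: arrival rate times holding time."""
    return new_sessions_per_second * hold_seconds


def max_session_rate(hold_seconds: float, port_limit: int = NAT_PORT_LIMIT) -> float:
    """Highest sustained new-session rate before the port pool runs out."""
    return port_limit / hold_seconds


if __name__ == "__main__":
    # Hypothetical hold times in seconds, purely for illustration.
    for hold in (3600, 600, 300):
        rate = max_session_rate(hold)
        print(f"hold={hold:>4}s -> pool exhausted above ~{rate:.1f} new sessions/s")
```

Shortening the hold time raises the sustainable session rate in direct proportion, which is why a lower TTL frees ports sooner at the cost of more session setups.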

Ede Kernel panic: Aiee, killing interrupt handler!

GreatNetworks · ‎01-26-2014

Thank you, Ede!

Right now I have around 10K sessions: about 2.8K TCP sessions, with the rest UDP. The traffic is basically web, plus Skype and QQ video and audio. How do I check how much of the NAT table is consumed, to validate this?

I will try to limit the timeout for certain traffic, particularly UDP. Do you have a recommendation for Skype traffic?

emnoc · ‎01-27-2014

The command

    diag sys session full-stat | grep ephemeral

e.g.

    misc info: session_count=41090 exp_count=0 clash=0 memory_tension_drop=0 ephemeral=853/57344 removeable=40230

would give you the session table statistics. But are you 100% sure you're exhausting ephemeral ports? For 10K sessions on a 300A that doesn't seem right. If the two numbers in the ephemeral field are not equal (or close to it), then you don't have an ephemeral exhaustion problem, IMHO, and need to look elsewhere.

FWIW, adjusting the TTL is a short-term fix. If you need more ephemeral ports, you will need to look at pools and probably split the source IP range. What I typically do is use no more than a /22 per source-NAT address, and then tweak it after monitoring, e.g.:

    (inside)                  (outside global)
    192.168.{0-3}.0/24    =   public_address a.b.c.d
    192.168.{4-7}.0/24    =   public_address e.f.g.h
    192.168.{8-11}.0/24   =   public_address i.j.k.l

Hope that helps. Ede is correct that most models will have 63K ports or less; you don't get the full range from 0 through 65535.
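The /22-per-public-address layout above can be planned with a few lines of Python before touching the firewall. This is only a planning sketch: the function name is hypothetical, the 203.0.113.x addresses are documentation placeholders standing in for a.b.c.d and friends, and it does not generate any FortiOS configuration.

```python
import ipaddress


def split_into_nat_groups(inside_supernet, public_addresses, group_prefix=22):
    """Pair each /22 slice of the inside range with one public NAT address."""
    supernet = ipaddress.ip_network(inside_supernet)
    groups = list(supernet.subnets(new_prefix=group_prefix))
    if len(groups) > len(public_addresses):
        raise ValueError(
            f"{len(groups)} inside groups but only {len(public_addresses)} public addresses"
        )
    # zip stops at the shorter list, so spare public addresses stay unused.
    return {str(group): public for group, public in zip(groups, public_addresses)}


if __name__ == "__main__":
    # Placeholder addresses from the documentation range, not real assignments.
    plan = split_into_nat_groups(
        "192.168.0.0/20",
        ["203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"],
    )
    for inside, outside in plan.items():
        print(f"{inside:<18} = {outside}")
```

Each /22 covers four /24 subnets, matching the 192.168.{0-3}, {4-7} and {8-11} grouping in the example.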

PCNSE

NSE

StrongSwan

GreatNetworks · ‎01-27-2014

Hi emnoc,

This is my output:

    misc info: session_count=9937 exp_count=0 clash=8014 memory_tension_drop=0 ephemeral=1391/32768 removeable=8544

It doesn't look like an ephemeral problem. I'm not sure how to determine the source of the NAT port exhaustion. A while ago the sessions reached 25-27K connections, but the error message keeps appearing even at 5K or 9K.

Furthermore, how do I determine whether there is a negative effect from it, i.e. packets being dropped, denied or timed out?

I did try pools just now (a pool of 5 public IP addresses, randomly used by all outgoing internet connections), but it had no effect on the error messages.
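For anyone who wants to track this over time, the "misc info" line is easy to parse. The sketch below assumes the line has the key=value layout shown in this thread and that the ephemeral field reads as used/total, as emnoc's comparison implies; the function name is hypothetical and other firmware versions may print the line differently.

```python
import re


def parse_full_stat(line):
    """Extract session and ephemeral-port counters from a 'misc info' line."""
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    used, total = (int(part) for part in fields["ephemeral"].split("/"))
    return {
        "session_count": int(fields["session_count"]),
        "clash": int(fields["clash"]),
        "ephemeral_used": used,
        "ephemeral_total": total,
        "ephemeral_pct": 100.0 * used / total,
    }


if __name__ == "__main__":
    sample = (
        "misc info: session_count=9937 exp_count=0 clash=8014 "
        "memory_tension_drop=0 ephemeral=1391/32768 removeable=8544"
    )
    stats = parse_full_stat(sample)
    print(f"ephemeral ports in use: {stats['ephemeral_pct']:.1f}% "
          f"({stats['ephemeral_used']}/{stats['ephemeral_total']})")
```

Logging these values at intervals makes it easier to see whether the ephemeral pool, or some other counter such as clash, moves when the error appears.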

GreatNetworks · ‎01-27-2014

The router was kind of slow before, until I turned off all services (HTTPS, ping, etc.) on the interface. Do you think it might be under attack?

Another person suggested it could be LAN-side broadcasting: there are at least 3 subnets of the form 192.168.x.yy behind one LAN interface. But I'm not sure how to trace this beyond get sys session.

emnoc · ‎01-27-2014

First, what I would do is verify that message. Does it include more information than what you are showing here? Per the Fortinet documentation it should have more fields, e.g.:

    Message ID:        20007
    Log Subtype:       System
    Severity:          Critical
    Firmware version:  FortiOS 4.0 MR3
    Meaning:           The socket is exhausted.

    Field      Description
    service    The type of service. This field always contains kernel.
    status     This field always contains failure.
    proto      The protocol information.
    src        The source IP address.
    src_port   The source port number.
    nat        The NAT information.
    dst        The destination IP address.
    dst_port   The destination port number.
    msg        NAT port is exhausted.

Second, you really should upgrade that 300A :)

Third, how's your overall health in regards to performance and traffic? E.g.:

    get system performance status
    CPU states: 9% user 16% system 0% nice 75% idle
    Memory states: 39% used
    Average network usage: 11275 kbps in 1 minute, 15512 kbps in 10 minutes, 15793 kbps in 30 minutes
    Average sessions: 49519 sessions in 1 minute, 49861 sessions in 10 minutes, 49610 sessions in 30 minutes
    Virus caught: 0 total in 1 minute
    IPS attacks blocked: 0 total in 1 minute
    Uptime: 10 days, 5 hours, 6 minutes

    diag firewall statistic show
    getting traffic statistics...
    Browsing: 690800679 packets, 636081063732 bytes
    DNS: 532546400 packets, 80861986900 bytes
    E-Mail: 52094006 packets, 11591063712 bytes
    FTP: 67863160 packets, 57670206220 bytes
    Gaming: 1583 packets, 301655 bytes
    IM: 119980 packets, 41174680 bytes
    Newsgroups: 16 packets, 679 bytes
    P2P: 544043 packets, 234339804 bytes
    Streaming: 367 packets, 31937 bytes
    TFTP: 22988 packets, 7175563 bytes
    VoIP: 5205 packets, 2220467 bytes
    Generic TCP: 78621308 packets, 36740829813 bytes
    Generic UDP: 109196351 packets, 56152268542 bytes
    Generic ICMP: 6458849 packets, 458005769 bytes
    Generic IP: 1598489 packets, 356534444 bytes

    get sys session list

Polling on a schedule and taking the 1/10/30-minute averages is a good way to size up your session table usage. This is how I confirm my FortiGates are not undersized or hitting limitations. Running this command from an expect script and graphing the output is a good way to measure your device averages, or you can start by deploying Cacti. Then you can overlay other things, like bandwidth graphs or CPU numbers, again from Cacti. (I've included a Cacti graph from a problem I was monitoring on a FGT110C a few months back as an example.)

Lastly, can you identify the policy that's allowing the traffic, change the action to deny (temporarily) and monitor for any improvement? Since you have a low count of policies, you can easily monitor these sessions per policy ID.

Other things to consider and check:
- Do you have any single host(s) causing the alert messages based on its traffic patterns? (If you suspect a single host, install a temporary policy above all others with a DENY and monitor.)
- Is it time-of-day (TOD) specific?
- Do you have any UTM features that you can temporarily disable?
- How many sessions originate from the firewall directly (ssh, dns, snmp, sflow, icmp-unreachables, ipsec, etc.)?
- Can you disable local DNS from the firewall temporarily (e.g. I had a problem on some 100A that was caused by all of the DNS lookups that it was trying to do)?
- Can you disable the traffic shaper as a temporary fix and see if it improves anything? (This is my hunch as to where the problem lies, btw.)
- When the log messages appear, do any users complain, or lose access to any services or networks?
- Do you have allowaccess open to any untrusted sources or on external interfaces?

As far as monitoring goes, I think looking at the session table as a whole, and then drilling in to look at the numbers and counts over an extended period, is warranted. With the get sys session list output, you could put it into a spreadsheet (xls) and graph the numbers by protocol, src/dst, etc., as a pie chart or table.

The top talkers and sustained talkers are where you might want to start, and then look at TTLs. If you need a bigger firewall, then your data will warrant the upgrade. I think you might want to start graphing and looking at the traffic shapers.

I hope the above helps, but I think everyone will suggest upgrading that unit.
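As an alternative to a spreadsheet, the per-protocol and top-talker counts can be tallied with a short script. The sketch below assumes the saved get sys session list output is whitespace-separated with the protocol in the first column and source ip:port in the third; that column layout, the function name and the sample rows are assumptions for illustration, so check them against your own capture first.

```python
from collections import Counter


def top_talkers(session_list_text, n=5):
    """Count sessions per protocol and per source IP from saved session output."""
    by_proto = Counter()
    by_source = Counter()
    for line in session_list_text.splitlines():
        cols = line.split()
        # Skip blank lines, short lines and the (assumed) header row.
        if len(cols) < 5 or cols[0].upper() == "PROTO":
            continue
        proto, source = cols[0], cols[2]
        by_proto[proto] += 1
        by_source[source.rsplit(":", 1)[0]] += 1  # drop the port, keep the IP
    return by_proto, by_source.most_common(n)


# Made-up sample rows in the assumed layout (documentation addresses).
SAMPLE = """\
PROTO   EXPIRE SOURCE             SOURCE-NAT         DESTINATION        DESTINATION-NAT
udp     170    192.168.1.10:5060  203.0.113.1:40001  198.51.100.7:53    -
udp     175    192.168.1.10:5061  203.0.113.1:40002  198.51.100.7:53    -
tcp     3500   192.168.2.20:51000 203.0.113.1:40003  198.51.100.9:80    -
"""

if __name__ == "__main__":
    protos, talkers = top_talkers(SAMPLE)
    print("sessions by protocol:", dict(protos))
    print("top source hosts:", talkers)
```

Running this against captures taken at different times of day shows whether one host or one protocol dominates when the log message appears.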

emnoc · ‎01-27-2014

LAN side broadcasting

And to comment on broadcast: a true broadcast would stay local and would not be in the session table. It should have a TTL of 1.

GreatNetworks · ‎01-28-2014

Guys, thank you for your replies. I finally found the cause, but the question is: why?

It turns out the source of the problem is a DNS entry. The primary DNS used to point to the ISP DNS, and the secondary to a Windows server for file sharing. Now I recall changing the primary DNS to the OpenDNS server in both the Network/Options and DHCP Server/Service settings. One of the internal subnets could use Skype but could not browse. Running NSLOOKUP reported the server as UNKNOWN, which made me realize it was a DNS issue.

So I changed both entries from OpenDNS back to the ISP DNS, and once all the PCs had obtained the new DNS setting by DHCP, the messages stopped!

The question is why the DNS setting would have this effect. OpenDNS is free; what would cause the NAT ports to be exhausted by changing to OpenDNS? I don't understand this, but at least the messages are finally gone! :)

GreatNetworks · ‎01-28-2014

Hi emnoc,

There were no specifics in the log: no IP, no cause for the error, just a generic "NAT port is exhausted" message over and over again.

Here are my router stats:

    get system performance status
    CPU states: 0% user 4% system 0% nice 96% idle
    Memory states: 26% used
    Average network usage: 11319 kbps in 1 minute, 12139 kbps in 10 minutes, 10791 kbps in 30 minutes
    Average sessions: 11120 sessions in 1 minute, 10881 sessions in 10 minutes, 10590 sessions in 30 minutes
    Virus caught: 0 total in 1 minute
    IPS attacks blocked: 0 total in 1 minute
    Uptime: 3 days, 21 hours, 13 minutes

    diag firewall statistic show
    getting traffic statistics...
    Browsing: 159874318 packets, 103933564323 bytes
    DNS: 366939049170042880 packets, 1722376374976512 bytes
    E-Mail: 219530405 packets, 23376 bytes
    FTP: 4200478015488 packets, 237623360618496 bytes
    Gaming: 2799 packets, 234982 bytes
    IM: 171798691840 packets, 858877495083008 bytes
    Newsgroups: 81345982 packets, 2486 bytes
    P2P: 0 packets, 0 bytes
    Streaming: 5828 packets, 480514 bytes
    TFTP: 5117227935731810304 packets, 1714876790847045641 bytes
    VoIP: 102645877505 packets, 4992585 bytes
    Generic TCP: 189906273959936 packets, 12757036416630784 bytes
    Generic UDP: 0 packets, 0 bytes
    Generic ICMP: 0 packets, 0 bytes
    Generic IP: 0 packets, 0 bytes

I think you hit it when you said "can you disable local DNS from the firewall temporarily (e.g. I had a problem on some 100A that was caused by all of the DNS lookups that it was trying to do)". But why is it doing all kinds of lookups with the OpenDNS one and not the ISP one? How did you find out it was DNS when you had the problem?

Very good tips on troubleshooting; I will try each one. A few questions:
- How do I download the output of get sys session list? It pauses when I use it, so I grep the output.
- I have to look into Cacti, but I must turn on SNMP, right?
- What can I do with the syslog output?
- Which is easier, Nagios or Cacti?

It seems like I'm still not maxing out the 300A at 100 users and 25K sessions during peak.
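Until SNMP and Cacti are set up, one low-effort option is to save the get system performance status output on a schedule and pull the session averages out with a script. The sketch below is only an assumption about how that could look: the function name is hypothetical, and it relies on the "Average sessions:" line having exactly the wording shown in the output above.

```python
import re


def average_sessions(status_text):
    """Return {minutes: average session count} from saved status output."""
    for line in status_text.splitlines():
        if line.strip().startswith("Average sessions:"):
            pairs = re.findall(r"(\d+) sessions in (\d+) minutes?", line)
            return {int(minutes): int(sessions) for sessions, minutes in pairs}
    raise ValueError("no 'Average sessions:' line found")


if __name__ == "__main__":
    # Lines copied from the output posted above.
    sample = (
        "Memory states: 26% used\n"
        "Average sessions: 11120 sessions in 1 minute, "
        "10881 sessions in 10 minutes, 10590 sessions in 30 minutes\n"
    )
    for minutes, sessions in average_sessions(sample).items():
        print(f"{minutes:>2}-minute average: {sessions} sessions")
```

Appending each result to a CSV file with a timestamp gives a simple series that can be graphed later or compared against the times the error is logged.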
