Fortinet Forum
scheuri
New Contributor III

DNS queries not coming through AFTER internet outage

Hi all

 

I am somewhat stumped about this issue and I am not sure where to start looking.

 

Problem:

There is an app that runs automatic checks against the internet and also repeatedly sends DNS queries to our internal DNS servers. The queries come from five or six source ports on each of two IP addresses (the app runs on two servers), and there are quite a lot of them.

This app runs on a virtual machine which is connected to a FortiGate 60F running 6.2.9.

The query goes to that firewall, then through a VPN to another FortiGate, and from there to the DNS server.

 

  • If the internet goes offline for MORE than about five minutes and then comes back again, the automatic queries don't get an answer anymore (as if the DNS server never gets the requests).
  • If the internet goes offline for LESS than five minutes, then there is no issue.
    (I am not 100% sure about the exact amount of time).

 

Some more info:

  • Unfortunately I have no firewall logs from the times it happened
  • The customer is very reluctant to take the branch offline again (so I am still planning more tests)
  • There are no DNS filter security policies or any IPS or DoS policies (there are actually no UTM features licensed).
  • The dns-session-helper is active.

 

Question:

Does anyone have an idea what I could check that might cause this issue (no response to automatic dns queries from this particular app after being MORE than five minutes offline)?

Thanks a lot

1 Solution
pminarik
Staff

Half-educated guess: When the issue happens again, check if the relevant source-IPs have existing sessions for the DNS traffic. If yes, check which egress interface was chosen. You can also try clearing the sessions, and then check if that "magically" solves the issue.


What I'm thinking: When the tunnel is down (=when the internet connection is down), your only route towards the DNS servers, normally reachable through the tunnel, might be through your WAN interface due to the default route matching it. If that happens, and there is a firewall policy that could allow this traffic, then that traffic likely also gets source-NATed to the WAN IP, which leaves that session "stuck" on the WAN egress route until it ends.

UDP sessions are closed based on timeout (no traffic seen), so if your client (the app) periodically sends some UDP traffic with the same source ports (you mentioned "queries come from five or six source ports") it is theoretically possible for these UDP sessions to be kept alive forever and thus forever stuck to egressing out of WAN.
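
To illustrate the mechanism described above, here is a minimal Python sketch. It is a toy model, not FortiOS code. It assumes a 180-second UDP idle timeout, made-up addresses, and that a session keeps the egress interface chosen when it was created; the names `SessionTable` and `simulate` are hypothetical.

```python
from dataclasses import dataclass

UDP_IDLE_TIMEOUT = 180  # seconds; assumed value for illustration


@dataclass
class Session:
    egress: str       # interface chosen when the session was created
    last_seen: float  # time of the most recent packet


class SessionTable:
    def __init__(self, idle_timeout=UDP_IDLE_TIMEOUT):
        self.idle_timeout = idle_timeout
        self.sessions = {}

    def forward(self, flow, now, route_lookup):
        """Return the egress interface used for a packet of `flow` at time `now`."""
        session = self.sessions.get(flow)
        if session is not None and now - session.last_seen > self.idle_timeout:
            session = None  # idle for too long: the old session has expired
        if session is None:
            # Only a new session consults the routing table.
            session = Session(egress=route_lookup(now), last_seen=now)
            self.sessions[flow] = session
        session.last_seen = now  # every packet refreshes the idle timer
        return session.egress


def simulate(tunnel_up_at, query_interval, duration):
    """Egress per query for one fixed source port, starting while the tunnel is down."""
    def route_lookup(now):
        # While the tunnel is down, only the default route via WAN matches.
        return "vpn" if now >= tunnel_up_at else "wan"

    table = SessionTable()
    flow = ("10.1.1.10", 50001, "10.2.2.53", 53, "udp")  # made-up 5-tuple
    log = []
    t = 0
    while t <= duration:
        log.append((t, table.forward(flow, t, route_lookup)))
        t += query_interval
    return log


if __name__ == "__main__":
    for t, egress in simulate(tunnel_up_at=300, query_interval=30, duration=600):
        print(t, egress)
```

In this model, a 30-second query interval keeps every query on "wan" even after the tunnel returns at t=300, because the session is never idle long enough to expire. With a 240-second interval the session expires between queries, and the next lookup picks the tunnel again.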

 

If this is your case, the typical solution is to create a blackhole route with the worst admin distance (255) for the destination subnets of the VPN tunnel(s). This ensures that packets destined to the remote destinations will be dropped when the tunnel is down.
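
As a rough sketch of why the blackhole route helps, the following Python toy model picks the best route by longest prefix match and then by lowest administrative distance. The prefixes, interface names, and the `Route`, `best_route` and `egress_for` helpers are made up for illustration, and the model does not capture any FortiOS-specific handling of particular distance values.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str    # destination network in CIDR notation
    egress: str    # interface name, or "blackhole" to drop the packet
    distance: int  # administrative distance, lower is preferred


def best_route(routes, dst) -> Optional[Route]:
    """Longest prefix wins; among equal prefixes the lowest distance wins."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.distance),
    )


# Hypothetical routing entries for the branch firewall.
DEFAULT = Route("0.0.0.0/0", "wan", 10)
TUNNEL = Route("10.0.0.0/8", "vpn", 10)             # only present while the tunnel is up
BLACKHOLE = Route("10.0.0.0/8", "blackhole", 255)   # worst distance, always present
DNS_SERVER = "10.2.2.53"                            # made-up internal DNS address


def egress_for(tunnel_up, with_blackhole):
    routes = [DEFAULT]
    if tunnel_up:
        routes.append(TUNNEL)
    if with_blackhole:
        routes.append(BLACKHOLE)
    return best_route(routes, DNS_SERVER).egress


if __name__ == "__main__":
    for tunnel_up in (True, False):
        for with_blackhole in (False, True):
            print(tunnel_up, with_blackhole, egress_for(tunnel_up, with_blackhole))
```

Without the blackhole entry, the lookup falls back to "wan" as soon as the tunnel route disappears. With it, the tunnel route still wins while it exists (lower distance), and the traffic is dropped instead of leaking out of WAN while the tunnel is down.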

 

For what it's worth, modern FortiOS versions automatically create the blackhole route when the tunnel creation wizard is used, so there is a chance that you have such a blackhole route already. If you do, this reply might not be relevant for you (unless the blackhole routes do not cover the relevant destinations).

[ test signature, please ignore ]

scheuri
New Contributor III

 

Dear pminarik

 

Thank you very much for your reply, very much appreciated!

This is indeed a scenario I wasn't thinking of.

 

I will need to investigate whether this routing situation occurs when the internet goes down (DNS queries to the internal DNS servers with 10.x.x.x/8 addresses being routed through WAN rather than the VPN tunnel), which is very likely.
I will also need to check whether there is a firewall policy that would allow this (I am not sure yet, but it is not impossible at all).

The next test will be in two weeks, so I hope I will have more information by then.

 

Again, thank you.

 

Follow-up on 2022-06-07:
The situation is indeed as described. I have no blackhole route for the private IP address range AND I have a firewall policy that allows said DNS traffic to hit the internet/the ISP if the tunnel is not up.
Meaning, as described, chances are that once the internet link (and therefore also the tunnel) has been down for a few minutes, the route through the tunnel vanishes and the other firewall policy towards the internet is used. Since there are tons of automatic queries, it appears that the session timeout can never kick in.

We will test this live next week, then I will know more for sure.

 

Follow-up on 2022-06-17:

Your "half-educated" guess was very much educated and correct. We were able to trigger the issue and analyse it further. The missing blackhole routes, together with a firewall policy allowing DNS queries out of the WAN interface (via the default route), were our downfall.

After implementing the blackhole routes it worked: we could no longer reproduce the issue, so it is fixed.

 

Again, thank you very much for your help. Very much appreciated.

 

xsilver_FTNT
Staff

Whenever there is a dynamic interface such as a VPN tunnel (usually routing only private IPs), I try to set blackhole routes to prevent private traffic from leaking out through the default route.

That's especially important for chatty applications like VoIP, which would otherwise keep the "leaky" session alive forever (until an admin clears it to "fix" the issue).

 

Tom xSilver, planet Earth, over and out!

scheuri
New Contributor III

Hello xsilver

Thanks for your reply. 

 

We were able to reproduce the issue, and the unfortunate chain of events really did lead to the situation pminarik described.

 

After we added the blackhole routes, the issue was gone.

 

Unfortunately I have no idea why those private-range blackhole routes were not implemented in the first place (that happened before my time). Now that they are implemented, the issue seems to be solved.

 

Thanks also for your reply, much appreciated.