Why are the DoT/DoH mechanisms used by Fortigates so abominably slow?

PaulRoberts · 04-20-2023

So, I'd put in a support ticket for this against the 6.4.x firmware because it definitely behaved as though there were a bug where VIPs were being applied to everything and likely ignoring the src-filters, and I just got back around to poking at it on the newer 7.0.11 firmware today.

The root of this issue is that, since there's no way to set the nameservers in the parameters sent to Windows native VPN tunnels, anyone who wants a separate view for hosts "inside" the firewall (one that perhaps includes zones that don't exist publicly) has to perform a simple bit of DNS abduction using a VIP. That is, you install a pair of VIPs that apply only to the "dialin" VPN users via a src-filter definition, mapping the IP addresses of Fortinet's two nameservers to two internal nameservers; drop in a policy that lets that traffic hit the VIPs (again carefully specifying the source addresses used by the VPN clients); and Bob's your uncle. Now all DNS queries made by the VPN clients, which have no choice but to send all traffic to the internal network, get redirected to your internal nameservers, so your internal scopes resolve correctly even for the dialin people, and all is well.

...except now I see that the VIP (at least on 7.0.x, and on the 601E as well as the 101F) isn't the actual problem. The problem is that the default Fortinet nameservers are fantastically sluggish, and since this has been going on for some weeks now, I highly doubt it's some momentary routing problem at any of the five ISPs my units go through.

Earlier today, I put together a couple of little pastebombs so I could easily rip the three definitions out and put them right back to test further. With them in place and the default nameserver settings, name service barely functions (as tested the hard way via `execute ping gmail.com`). In the web GUI under Network->DNS, two angry red boxes tell me that the nameservers are taking anywhere from 3,000ms to (more often) 10,000-15,000ms to respond, or are simply unreachable. Replacing those default servers with widely known anycast addresses such as 8.8.8.8 and 1.1.1.1 doesn't help: the problem continues as long as DNS over TLS (853/tcp) is selected. Flip the DNS server setting over to 'Selected', leave the same nameservers in place, enable plain DNS, and everything works fine. Response times under 1,000ms are reported immediately (and for 8.8.8.8 it's usually under 50ms).

Just to look at things from a second angle, you can open a console to the device and run `execute ping somedomain.foo` (obviously using a domain that won't be in the cache), and while DoT is in use it will very likely time out with "Unable to resolve hostname".

Using knot on any given Linux workstation, I can make DoT queries to 8.8.8.8 (and to servers other than 96.45.45.45/96.45.46.46) and get responses in 100-200ms or less, so I am disinclined to believe this is a mistake or false positive limited to the web GUI. Whatever implementation the FortiGates are using seems to be horrendously slow and needs to be fixed. I don't see how this could not be keeping devices from getting their GeoIP and various signature updates in a timely fashion.
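For anyone who wants to reproduce the workstation-side comparison without knot, here is a minimal Python sketch (standard library only) that times one plain DNS query over 53/udp and one DoT query over 853/tcp, reporting the TCP+TLS setup separately from the query round trip so a slow handshake can be told apart from a slow answer. It follows the RFC 1035 message format and the 2-byte length prefix that RFC 7858 reuses from DNS over TCP. The function names are my own, and the sketch does no response validation beyond reading the reply.

```python
import socket
import ssl
import struct
import time


def build_query(name: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: 12-byte header (RD set) plus one IN question."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Encode each ASCII label as <length><bytes>, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def frame_for_tcp(msg: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP/TLS requires."""
    return struct.pack("!H", len(msg)) + msg


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes from a stream socket or fail if the peer closes."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def time_udp(server: str, name: str, timeout: float = 5.0) -> float:
    """Seconds for one plain DNS query over 53/udp (reply ID is not checked)."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.perf_counter()
        s.sendto(query, (server, 53))
        s.recvfrom(4096)
        return time.perf_counter() - start


def time_dot(server: str, tls_name: str, name: str, timeout: float = 5.0):
    """Return (connect_plus_handshake_s, query_s) for one DoT query on 853/tcp."""
    ctx = ssl.create_default_context()  # verifies the cert against tls_name
    start = time.perf_counter()
    with socket.create_connection((server, 853), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=tls_name) as tls:
            setup = time.perf_counter() - start  # TCP connect + TLS handshake
            t0 = time.perf_counter()
            tls.sendall(frame_for_tcp(build_query(name)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            recv_exact(tls, length)
            return setup, time.perf_counter() - t0
```

Usage, assuming outbound 53/udp and 853/tcp are allowed: `time_udp("8.8.8.8", "example.com")` and `time_dot("8.8.8.8", "dns.google", "example.com")`. If the server's certificate doesn't match the TLS name you pass, `wrap_socket` raises `ssl.SSLCertVerificationError`, which is one concrete form a "TLS handshake failure" can take.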

gfleming · 04-20-2023

The DNS servers configured in the FortiGate's DNS settings have nothing to do with the response times of your internal DNS servers. Unless, of course, you are using the FortiGate as a DNS forwarder on your internal DNS servers. Are you?

Cheers,
Graham

PaulRoberts · 04-20-2023

This has nothing, absolutely nothing to do with our internal DNS servers.

It has to do with the fact that something is profoundly screwed up with the DoT mechanism used by Fortigates, resulting in absolutely ridiculous latency, even (and especially) when using the devices' default settings. I originally put in a support ticket because I thought something in the internally generated rules created by the fairly straightforward VIPs I'd made was at fault (and I'd never had real cause to think the reported latency was anything other than a UI mistake rather than the actual value). Earlier today I literally disproved the idea that a VIP might be causing the problem: it remains when the VIP isn't present, and it also happens on the 100+ other devices (all using the default DNS settings) that never had any such VIP in the first place.

This impacts not only the Fortigate appliances' ability to fetch their own updates (because a device can't find the update server if the DNS lookup times out), but in some cases (since the same issue appears to affect its queries to the DNS filter rating servers) also clients that are simply making their own queries, which are being filtered by the Fortigate appliances. Devices that can't even complete the basic DNS lookup behind their network-connectivity check will simply decide they have no internet access (because they basically don't when the firewall is eating their DNS queries). The workstation I was doing the checking with knot from is not subject to any sort of DNS filtering by the Fortinet. The only reason this hasn't been excruciatingly obvious from day one is that the devices cache DNS responses, but you can literally open a console, bang out `execute ping somenewhostnamenoonehaslookedupyet`, and watch it fail the majority of the time.

Turning that junk off fixed quite a number of issues. I can literally (because I just did so) turn 53/udp off and 853/tcp back on and watch the reported latency from 96.45.45.45/96.45.46.46 go from 20ms right back to >10,000ms. This problem exists in the Fortigate's default configuration.

The fix?

```
config system dns
    set protocol cleartext
end
```

...and just like that, reported latency drops back to a perfectly acceptable <200ms range.

...which has to be done from a console, because the web GUI literally won't let you disable DoT in favor of plain 53/udp as long as "Use FortiGuard Servers" is selected.

Now... If you can't be bothered to read a post carefully enough to actually understand the problem, stop wasting other people's time.

gfleming · 04-20-2023

I read your entire post and I asked you a follow-up question. If you're going to accuse me of wasting your time and not reading your post, the least you could do is answer the question I posed in my reply, as all I am doing is trying to help you by better understanding how your network is configured.

You are complaining about FortiGate DNS response time being sluggish (local-out traffic), but you are also talking about VPN clients and using VIPs to redirect DNS lookups to your internal DNS servers (forwarded traffic), which has no bearing on local-out traffic from the FortiGate. I'm trying to understand the relationship between those two things in your network and your configuration, because normally there should be no relationship. Unless, as I already asked, you are using the FortiGate as your DNS forwarder on your internal DNS servers.

Just have some respect for the people who are here helping you in their free time, while we're at it. Thanks.

Cheers,
Graham

PaulRoberts · 04-21-2023

That's going to be a hard no. While I understand your bosses have you guys under orders to respond to every post within 24 hours, I'm pretty sure that when they came up with that, what they had in mind was not low-effort garbage posts. The only "effort" you seem to be making is in bending over backwards to deny that there is ever any kind of problem. You literally zeroed in on a thing that has no actual bearing on the problem and ignored everything else. As to your question about whether or not we're using the Fortigate as a DNS forwarder, that's not even a type of query I have attempted to test yet, and I've already narrowed the problem down to just one thing, so I will be ignoring that question.

I would think that by now you should be able to figure out, based on my rather explicit response, that the problem is not the "response time" of the Fortigate, but how long it takes to resolve anything in the first place and that this seems to be causing other problems elsewhere in addition to the problems it causes for the device. I have in the past noticed parts of this mess when I've had to troubleshoot why a device didn't have anything but an ancient copy of (among other things) the GeoIP database, because it couldn't resolve the update host. That did lead me to discover that the logs don't accurately report why hosts were being blocked by the GeoIP filter. The logs show what country an address belongs to at the time you view the logs, not at the time the decision to block was made, and as much as I hate to admit it, Florida is still a part of the US so yeah seeing the Fortigate logging that it blocked Florida is going to make me look rather carefully at things. I would say "efficiency over accuracy" but the route someone took for that code was neither accurate nor efficient. ...but I have no reason to bother even trying to report that as a bug when it'll just be hand-waved away. I'm perfectly happy to let detail-oriented lawyers working for someone else discover that the hard way (even though it's fairly unlikely to come up it does represent what lawyers refer to as "doubt" as to the accuracy of the logs).
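The log-accuracy point can be illustrated with a toy model. This is not how FortiOS stores or renders logs (it only mirrors the behavior described above); it just shows why a log that keeps only the IP and re-resolves the country when you view it can disagree with the decision that was actually made, while a log that captures the country at decision time cannot. The dict standing in for the GeoIP database, the `BlockEvent` record, and the placeholder country code `XX` are all hypothetical; 203.0.113.7 is from the documentation address range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEvent:
    """Hypothetical log record for a GeoIP block decision."""
    ip: str
    country_at_decision: str  # the country the policy actually matched on


def log_block(ip, geo_db):
    """Capture the country at decision time, alongside the IP."""
    return BlockEvent(ip=ip, country_at_decision=geo_db.get(ip, "unknown"))


def country_shown_if_resolved_at_view_time(event, current_db):
    """What a viewer sees if only the IP is kept and re-resolved later."""
    return current_db.get(event.ip, "unknown")


# Hypothetical example: the address is reassigned between database versions.
db_when_blocked = {"203.0.113.7": "XX"}
db_when_viewed = {"203.0.113.7": "US"}

event = log_block("203.0.113.7", db_when_blocked)
print(event.country_at_decision)                                      # XX
print(country_shown_if_resolved_at_view_time(event, db_when_viewed))  # US
```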

I specifically mentioned in the very first post a section of the GUI whose sole function appears to be reporting the time it takes the Fortigate itself to get a response to the queries that (in the default configuration!) it is making to Fortinet's DNS servers. I've been seeing it indicate a problem for weeks, and the only reason I was ever willing to assume that it didn't reflect reality was that I've already seen at least one other case where part of a Fortigate's reporting emits random numbers instead of the correct values, and no one seemed to actually care about that. You personally discounted that one as being a real problem, by the way.

I have illustrated how this ridiculous latency negatively impacts the functionality of the device itself, and how it may very well be impacting any client systems that rely on the appliance for connectivity, under a configuration that simply uses functionality central to why people pay money for these things: namely, filtering "bad stuff" by scrutinizing DNS queries. I've seen the same bizarrely high values showing up next to the DNS filtering servers as well (although not in the last 48 hours). It makes perfect sense that if the appliance can't go check on a new domain, it might very well just not send a response to the client system that made the recursive query. Since the reported latency is wildly variable, I wouldn't be surprised to find that the problem is simply slightly less obvious for the DNS filter rating servers (and as I write this, I have the UI open with the problematic feature enabled just to watch it in real time, and one of the DNS filter rating servers just jumped to 14,540ms).

I have explained that this absurdly high latency remains no matter whose nameservers a Fortigate is pointed at. To exclude the possibility that our network path is somehow broken, I tested the same type of queries from an ordinary Linux machine on an internal IP address that isn't subject to DNS filtering. Using knot, I was able to perform 100 different queries to 8.8.8.8/1.1.1.1 that all got responses in under 200ms, even going through an impacted Fortigate 601E (without asking it to do any of the work beyond just passing the packets along). Personally, I think getting to the point where everything involved is within Fortinet's purview is as far as I really should have needed to go with this, but I like to be thorough because it often leads to other discoveries.
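When repeating a batch test like that 100-query run, it helps to summarize the results in a way that doesn't hide outliers. Below is a small hypothetical helper (standard library only) that counts timeouts separately from answered queries and reports nearest-rank percentiles, since a single average would blur together the mostly fast answers and the occasional 10,000ms+ stall described in this thread. How the latencies are collected is left open, and using `None` to mark a timeout is my own convention.

```python
import math


def summarize(latencies_ms, threshold_ms=200.0):
    """Summarize per-query latencies in ms; None marks a query that timed out."""
    answered = sorted(x for x in latencies_ms if x is not None)

    def pct(p):
        # Nearest-rank percentile over answered queries only (integer math for the rank).
        if not answered:
            return None
        rank = max(1, math.ceil(p * len(answered) / 100))
        return answered[rank - 1]

    return {
        "queries": len(latencies_ms),
        "timeouts": len(latencies_ms) - len(answered),
        "under_threshold": sum(1 for x in answered if x < threshold_ms),
        "p50": pct(50),
        "p95": pct(95),
        "max": answered[-1] if answered else None,
    }
```

Reporting timeouts as their own count keeps a run where 5 of 100 queries never came back from looking like a run that was merely "a bit slow on average".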

I have now explicitly identified the single configuration setting needed to turn the bad behavior on and off at will. With DoT or DoH enabled, both the 601E and 101F models exhibit the same problem. Re-enable plaintext, disabling DoT/DoH, and the problem goes away. I might suspect something screwy with 96.45.45.45/96.45.46.46, because I can't get knot to give me anything but a TLS handshake failure against them (even when specifying the TLS hostname shown in the returned cert), but the Fortigate exhibits this symptom when pointed at other people's nameservers too, so all I can show there is that maybe both the Fortigate and the 96.45.x nameservers are wonky. However, none of that is my responsibility to fix. Your DoT implementation appears to be hot garbage. How is taking more than ten seconds for a simple query considered acceptable performance?

It's not our ISP (because more than ten different ISPs and even more ASNs are involved) causing the problem with some weird routing. It's highly unlikely that it's a regional routing problem because we've got it showing up in locations scattered throughout most of the eastern US (which would make it a really big region and DownDetector would be getting major traffic). It's not just one model because this is happening on at least two different models. It's not our configuration decisions because again, these are the default settings that came in with the 7.0.11 firmware and the only thing I'm touching via a local-in policy is restricting access to the management services ports (in part because in 2023 some vendors have still not gotten the message about unused features being enabled by default). It's not the VIP configuration because this is happening after the VIP and policy were removed and is also happening on over 100 devices which never had the VIP in the first place, ergo there is no possible "relationship" between these things, and you are ignoring that I've said so both implicitly and explicitly.

Let me just simplify all this for you...

It's not DNS.
It was never DNS.
It's the Fortigate.

Someone probably needs to look into why this issue has remained through multiple revisions of the firmware because the web UI has literally been reporting this straight into people's eyes for who knows how long, but some folks might now be drawing their own conclusions about the why of that.

gfleming · 04-21-2023

Oh you're the guy who wanted to put tab-delimiters everywhere in the DHCP cli lease output. Right, makes sense now! Your response and attitude sounded familiar.....

And just so you know I have zero mandate to respond to these posts. I can easily just log off and never look at this forum and all will be well. It's a community forum, after all...

Your rambling isn't helping you make a coherent case about what your issue is. But I'll take a stab at it.

The response time listed in the FortiGate DNS is the response time for the last query made by the FortiGate. And FWIW my FGT 80F on 7.2 using DoT on dns.quad9.net shows consistently <100ms response times. I use my FGT as a DNS forwarder and DNS filter and do not have any issues. So it's probably worth looking into this a bit more on your side...

If you are using DNS filtering in a policy, then the FortiGate's DNS server configuration has zero bearing on client response time (unless, again, your clients are being forwarded to the FortiGate DNS server by your internal DNS servers, but I don't know whether they are, because you are choosing to limit how much useful info you provide).

If you are using DNS Filtering as part of a DNS server config on the FortiGate then again it depends on that DNS Server config as to why or how clients might be seeing delays.

If you actually want help—and I'm starting to think your only goal coming to this forum is to post long rambling rants with no intent to actually get a resolution to your issues—then I suggest you make it easy on people here and post a succinct summary of your issue (does talking about VPN clients and VIPs have anything to do with it because it sounds like it doesn't) and give us some useful details of your configuration, topology, traffic flows, FortiOS versions, etc etc etc.

Most users who actually want help start with those basics and they also are much better at treating fellow forum-goers with respect and gratitude and don't purposefully refuse to answer questions...

Cheers,
Graham

PaulRoberts · 04-21-2023

@gfleming wrote:
Oh you're the guy who wanted to put tab-delimiters everywhere in the DHCP cli lease output. Right, makes sense now! Your response and attitude sounded familiar.....

Yeah, my "attitude" is one that develops after spending hours carefully studying the problem. ...and no, you clearly don't understand as much about programming as I did thirty years ago. There's no way to correctly parse that line, there was never a way to correctly parse that line, and the fact that the Fortigate will happily accept whitespace in those two fields makes the entire line impossible for a human to read with 100% confidence. Deal with it and move on, because no amount of precocious nonsense about how "easy" it is to do with Python will make the impossible a reality. I had the entire problem solved using Perl and a handful of modules via the REST API (despite the pointless paywall trying to hide the endpoint documentation) a few hours after I posted about the bugs in the CLI.
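To make the parsing argument concrete: if two free-text fields may contain spaces and the fields are joined with spaces, two different records can render to identical text, so no parser can recover them with certainty. The sketch below uses made-up records and field values (not the actual FortiOS lease output format) and shows that tab delimiting round-trips, under the stated assumption that no field can itself contain a tab.

```python
def render_spaced(fields):
    """Join fields with single spaces, like a whitespace-delimited CLI line."""
    return " ".join(fields)


def render_tabbed(fields):
    """Join fields with tabs; unambiguous only if no field contains a tab."""
    if any("\t" in f for f in fields):
        raise ValueError("field contains a tab, so tab delimiting is ambiguous too")
    return "\t".join(fields)


def parse_tabbed(line, expected_fields):
    """Split a tab-delimited line back into exactly expected_fields fields."""
    parts = line.split("\t")
    if len(parts) != expected_fields:
        raise ValueError(f"expected {expected_fields} fields, got {len(parts)}")
    return tuple(parts)


# Two different hypothetical records: (IP, free-text field 1, free-text field 2).
record_a = ("10.0.0.5", "front desk", "MSFT 5.0")
record_b = ("10.0.0.5", "front", "desk MSFT 5.0")

# Space-joined, both records produce the same text, so they cannot be told apart.
assert render_spaced(record_a) == render_spaced(record_b)

# Tab-joined, each record round-trips to itself.
assert parse_tabbed(render_tabbed(record_a), 3) == record_a
assert parse_tabbed(render_tabbed(record_b), 3) == record_b
```

The same reasoning is why structured output (for example, a JSON response from an API) sidesteps the problem entirely: field boundaries are explicit rather than inferred from whitespace.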

@gfleming wrote:
And just so you know I have zero mandate to respond to these posts. I can easily just log off and never look at this forum and all will be well. It's a community forum, after all...

Then why are you in here wasting other people's time with useless nonsense? You've no experience with the equipment in question, have no way to even attempt to replicate the environment, and can't even seem to resist the temptation to simply speculate wildly about things which are not mentioned at all. How is that helpful or even professional behavior? You have yet to address--even for a moment--that the DoT mechanism used appears to be very seriously broken and the behavior can be turned off and on at will with the change of one setting, and keep insisting on inventing other imaginary problems that you can then point at and still not have a useful answer, when I have been absolutely crystal clear that the issue is between the Fortigate and the nameservers used in the default configuration. Not our local nameservers, not any VIP, and not your different model, different firmware, different target nameserver configuration.

@gfleming wrote:
Your rambling isn't helping you make a coherent case about what your issue is. But I'll take a stab at it.

The response time listed in the FortiGate DNS is the response time for the last query made by the FortiGate. And FWIW my FGT 80F on 7.2 using DoT on dns.quad9.net shows consistently <100ms response times. I use my FGT as a DNS forwarder and DNS filter and do not have any issues. So it's probably worth looking into this a bit more on your side...

There was nothing "rambling" in that previous reply. The only reason some of that detail was included was to try to stop you from wasting more time with imaginary configurations.

Hours have already been spent looking at it on my side, up to and including reading through the packet dumps, which is why I'm now bringing it to a place where someone who might actually have the same equipment (or might have already resolved it) can see if they're experiencing the same problem and perhaps shed some light on the matter. ...and once again, there you are throwing something completely imaginary into the mix for no reason I can fathom other than to have something to post. We are not using the Fortigates as a DNS forwarder, and I'm about to scrap any plans to ever do so, since it'll clearly be easier for me to build out a custom machine than to troubleshoot even the first problem that might come up. There was no rational reason whatsoever for you to bring that into consideration.

I am perfectly aware of what that latency number in the UI represents, because it's literally the only explanation for how it got there unless it's 100% fictional. The devices are regularly failing, and failing hard, when DoT/DoH is enabled, so unless you've got some grand observation about hardware offload (which we have not changed from the default configuration either) that might be hurting the appliance's ability to perform otherwise normal chain-of-trust calculations without taking fifteen whole seconds to do so, maybe you should stick to playing in the shallow end of the pool?

@gfleming wrote:
If you actually want help—and I'm starting to think your only goal coming to this forum is to post long rambling rants with no intent to actually get a resolution to your issues—then I suggest you make it easy on people here and post a succinct summary of your issue (does talking about VPN clients and VIPs have anything to do with it because it sounds like it doesn't) and give us some useful details of your configuration, topology, traffic flows, FortiOS versions, etc etc etc.

Most users who actually want help start with those basics and they also are much better at treating fellow forum-goers with respect and gratitude and don't purposefully refuse to answer questions...

Pot, meet Kettle.

The bulk of the stuff you just asked for has already been answered from the very beginning (configuration? Pretty much defaults. Topology? Same as 99.9% of users with HA configurations. Traffic flows? Irrelevant. FortiOS versions? Given right up front), some of it more than once, and you've ignored all of it in favor of just whining more. Again, how is this even remotely professional behaviour? This is not an appropriate place for trolling so you can score imaginary internet points.

gfleming · 04-21-2023

Well you've got me there. I'm just here for the badges, tbh. I love me some imaginary internet points. And I do enjoy helping others. Sounds like you don't want me to help you because I don't have a FGT-601E in front of me. Fair enough. I would keep talking to TAC in that case as they can get their hands on a 601E. Good luck.

Cheers,
Graham

PaulRoberts · 04-23-2023

You also don't have a pair of Fortigate 601Es in HA mode, or any Fortigate 101Fs, or any experience in actually running these things at an enterprise level, nor apparently do you have a device running the matching minor release firmware, or the ability to click the sole button necessary to replicate the configuration conditions, or the requisite reading comprehension skills. The only thing you seem to have is the ability to fabricate imaginary circumstances unrelated to any of the information given so you can proudly dismiss them as being the source of the problem. ...because that's what you've done every single time.

Perhaps you should leave the complex technical problems to actual engineers and network administrators, because you've brought exactly nothing to the table here.

gfleming · 04-23-2023

The projection is real with you. So I am apparently fabricating imaginary circumstances and yet you go on about fabricating imaginary circumstances about what my professional experience is? OK, boss!

The only point I was making by telling you my 81F on 7.2 works fine with DoT was that hey maybe it's an issue with FOS 7.0.X, maybe it's a platform issue. You know, just some info to get started on productively troubleshooting this with you. Maybe it's something completely unrelated. I tried......

But you ain't gonna figure it out if you're gonna continue to be miserable about it all. Sorry man. You clearly don't want to try and work on this. I thought I'd give it another go but it's pointless. I'll let someone else chime in—if they really want to after reading the type of replies you give...

Cheers,
Graham