Why are the DoT/DoH mechanisms used by Fortigates so abominably slow?

PaulRoberts · 04-20-2023

So, I'd put in a support ticket for this against the 6.4.x firmware because it definitely behaves as if there's a bug where VIPs are being applied to everything and are likely ignoring the src-filters, and I just got back around to poking at it on the newer 7.0.11 firmware today.

The root of this issue is that there's no way to set the nameservers in the parameters sent to Windows Native VPN tunnels. So if one wishes to use a separate view for hosts "inside" the firewall, one that perhaps includes zones that don't exist outside, one has to perform a simple bit of DNS abduction using a VIP. That is, you install a pair of VIPs that only apply to the "dialin" VPN users via a src-filter definition, mapping the IP addresses of Fortinet's two nameservers to two internal nameservers, drop in a policy that lets this traffic hit the VIPs (also carefully specifying the source addresses used by the VPN clients), and Bob's your uncle. Now all DNS queries made by the VPN client, which has no choice but to send all traffic to the internal network, get redirected to your internal nameservers, so your internal scopes resolve correctly even for the dialin people, and all is well.

...except now I see that the VIP (at least on 7.0.x, and for 601E as well as 101E) isn't the actual problem. The problem is that the default Fortinet nameservers are fantastically sluggish and since it's been going on for some weeks now I highly doubt it's some momentary routing problem in any of the five ISPs I've got units going through.

Earlier today, I put together a couple of little pastebombs so I could easily rip the three definitions out and put them right back to test further. With them in place and the default nameserver settings, nameservice (as exercised the hard way via `execute ping gmail.com`) barely functions. From the web GUI under Network->DNS there are two angry red boxes telling me that the nameservers are taking anywhere from 3,000ms to (more often) 10,000-15,000ms to respond (or are simply unreachable), and after replacing those default servers with some widely known anycast addresses, i.e., 8.8.8.8 and 1.1.1.1, the problem continues as long as DNS over TLS (853/tcp) is selected. Flip the DNS server setting over to 'Selected', leave the same nameservers in place, enable plain DNS, and everything goes just fine. Times of less than 1,000ms are reported instantly (and for 8.8.8.8 it's usually under 50ms). Just to look at things from a second angle, you can open up a console to the device and start running `execute ping somedomain.foo` (obviously using a domain that won't be in the cache) and it'll very likely time out saying "Unable to resolve hostname" while DoT is being used.

Using knot on any given Linux workstation I can make DoT queries to 8.8.8.8 (and other servers not 96.45.45.45/96.45.46.46) and get responses in under 100-200ms just fine, so I am disinclined to believe this is a mistake or false positive limited to the web GUI. Whatever implementation is being used by the FortiGates seems to be just horrendously slow and needs to be fixed. I can't see how this is not preventing devices from getting their Geo-IP and various signature updates in a timely fashion.
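For anyone who wants to reproduce that workstation-side comparison without extra tooling, here is a minimal sketch in Python using only the standard library. It hand-builds a DNS query, sends it over TLS to port 853 with the 2-byte length prefix that DNS over TCP/TLS uses, and times the handshake plus one round trip. The function names (`build_query`, `frame`, `time_dot_query`) are made up for this example and are not from any Fortinet or knot tooling. It also assumes the server presents a certificate valid for the IP address being dialed, which is not guaranteed for every DoT server.

```python
import socket
import ssl
import struct
import time


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=1 (A) and QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length for TCP/TLS transport."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data


def time_dot_query(server: str, name: str, timeout: float = 5.0) -> float:
    """Return seconds for TCP connect, TLS handshake and one DoT query."""
    # Assumption: the server's certificate covers the address in `server`.
    context = ssl.create_default_context()
    start = time.perf_counter()
    with socket.create_connection((server, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=server) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            recv_exact(tls, length)  # Response body is read but not parsed.
    return time.perf_counter() - start
```

Calling `time_dot_query("8.8.8.8", "gmail.com")` a few times in a row gives a rough latency figure to hold up against what the FortiGate GUI reports. Since each call opens a fresh connection, the number includes the TLS handshake, so it is a pessimistic estimate compared to a resolver that keeps its session open.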

PaulRoberts · 04-26-2023

Let me make this absolutely crystal clear. Go away. Your gaslighting is not needed. I have figured the problem out. Breaking down complex problems into successively smaller pieces and then testing and analyzing those pieces until I have identified and eliminated the problems is what I do because I am an actual engineer and not a pre-sales "engineer". At this point the only missing piece is that I can't make queries directly to the 96.45.45.45/96.45.46.46 pair using DoT (but neither could anyone else I asked to try, so we're thinking there's something screwy about the service running there), so I had to find a workaround for generating performance metric data for the service. Since it's not my direct responsibility to analyze how badly your nameservers are misbehaving, as close as I'm going to bother with here is "Fortigate's default configuration talking to Fortigate's DNS servers is unsuitable for use in a production environment".

After I posted here and filed a formal support request, almost like magic (or more likely like someone said "oops!" and quietly restarted a struggling service) the problem went away until about 10-ish this morning, when DoT requests to Fortinet's servers once again began exhibiting ridiculously high latency across multiple centers. So... seeing that apparently no one's bothering to run a service monitor or trying to achieve minimum service levels, I'm not about to go back to letting the equipment try to use DoT with Fortinet's nameservers. Some people would think that if a device is going to alter settings to a new standard during a firmware upgrade, the things those settings rely on should actually be more than "minimally functional", but that apparently doesn't include your company's devops people.

Yesterday after noting that it looked like maybe some devoops person intervened and got the service running acceptably again, I took a chance and set up our home office HA pair to act as a DNS forwarder, which, although the rest of the path was encrypted so I couldn't see inside the traffic, does appear to work as a "duct-tapey" way to throw test queries at Fortinet's nameservers. Then I set up a service monitor to repeatedly query a DNS record (in a zone I control) with a TTL of zero, and to let me know when it started failing. It doesn't matter to me personally if it's a bug in the Fortigates, or if it's a problem of not properly maintaining their DoT servers. Both of those things are Fortinet's responsibility, not their customers' and definitely not mine, but if it's going to screw things up (which it has been doing) then it becomes my problem to either get someone to fix it or to find a workaround.
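For anyone who wants to set up the same kind of watch, a monitor along those lines can be sketched in a few lines of Python. This is an illustration only, not the monitor actually used here: `check_once`, `monitor`, the thresholds and the alert callback are all invented for the example. The resolver is passed in as a function, so it could be `socket.gethostbyname` on a workstation whose DNS points at the forwarder, or anything else that raises on failure.

```python
import time
from typing import Callable, List, Optional


def check_once(resolve: Callable[[str], object], name: str,
               max_seconds: float) -> Optional[str]:
    """Run one lookup; return None on success or a short failure reason."""
    start = time.perf_counter()
    try:
        resolve(name)
    except Exception as exc:  # Any resolver error counts as a failed probe.
        return f"lookup failed: {exc}"
    elapsed = time.perf_counter() - start
    if elapsed > max_seconds:
        return f"slow response: {elapsed:.3f}s"
    return None


def monitor(resolve: Callable[[str], object], name: str, max_seconds: float,
            interval: float, rounds: int, alert: Callable[[str], None],
            fail_threshold: int = 3,
            sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Probe `rounds` times; alert once per run of consecutive failures."""
    consecutive = 0
    failures: List[str] = []
    for i in range(rounds):
        reason = check_once(resolve, name, max_seconds)
        if reason is None:
            consecutive = 0
        else:
            consecutive += 1
            failures.append(reason)
            # Fire exactly when the threshold is reached, not on every probe.
            if consecutive == fail_threshold:
                alert(f"{name}: {consecutive} consecutive failures, "
                      f"last: {reason}")
        if i < rounds - 1:
            sleep(interval)
    return failures
```

The record being probed needs a TTL of zero, as described above, so that every probe is forwarded upstream instead of being answered from a cache. Requiring a few consecutive failures before alerting is a design choice to keep a single dropped packet from paging anyone; set `fail_threshold=1` to hear about every miss.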

Unsurprisingly, I was contacted about tablets being unable to detect internet access on wifi this morning at about the same time the monitor started reporting it wasn't getting acceptable responses back from the forwarder (no big surprise there). A quick check of four locations in different states revealed that it almost certainly wasn't limited to just that one center, and I'm very glad I disabled DoT enterprise-wide last week. So, whatever they did appears to have made it work properly for possibly as much as two days.

Despite it being mentioned many, many times, you show no sign of ever having attempted a DoT query against the indicated nameservers, or you'd surely have done something other than idly speculate about made-up configuration scenarios.

Oh and I'll point out that this is the first time I have mentioned a DNS forwarder at all, because it's the first time I've ever bothered to use that functionality. You pulled that speculation out of thin air, and you're not fooling anyone by lying about it. I've spent several hours dealing with this problem over the course of more than a week, which is why I'm the one who has a functional workaround for the problem (as posted before) and you're still the one running your mouth trying to gaslight people.

RachelGomez123 · 04-26-2023

It's difficult to determine the exact reason why the DoT/DoH mechanisms used by Fortigates may be slow without more information about the specific setup and configurations being used. However, there are several potential factors that could be contributing to slow performance:

Network latency: DoT/DoH traffic is encrypted and requires additional processing compared to unencrypted traffic, which can increase network latency. Depending on the network topology and the distance between the Fortigate and the DNS server, latency could be a contributing factor to slow performance.

DNS server performance: The speed of DoT/DoH resolution can also depend on the performance of the DNS server being used. If the DNS server is overloaded or experiencing issues, this could slow down resolution times.

Fortigate configuration: The Fortigate device itself may also be configured in a way that is contributing to slow DoT/DoH performance. For example, if the Fortigate is configured to use an inefficient encryption algorithm, this could impact performance.

Regards,

Rachel Gomez