Description
This article describes the unique nature of the FortiGate-200E/201E and how it differs from other models with regard to hardware acceleration. In particular, it explains NTurbo, IPS Acceleration (IPSA), and general NP packet offloading ('fastpath'), along with other notes related to the 200E/201E's internal design.
Note: For the rest of this article, any references to the FortiGate-200E will also include the 201E (as they are identical in design with the exception of an additional log SSD in the 201E).
Scope
FortiGate-200E/201E
Solution
As a primer, see the following diagram that originates from the FortiGate Hardware Acceleration documentation.
As per the diagram, the FortiGate-200E (and the 201E) has two NP6lite processors onboard. Each NP6lite connects to a separate block of ports (marked in red and in green), and there is NOT an Integrated Switch Fabric (ISF) between the ports and the NP6lites. With that in mind, consider the following aspects of hardware acceleration:
NP6lite session/packet acceleration (aka 'fastpath')
- Generally speaking, this feature allows the FortiGate to offload packet-handling from the CPU to the onboard Network Processors (NP). This reduces CPU utilization for traffic that is not being actively inspected (i.e. inspection has completed or there is no security inspection in the Firewall Policy) since packets will flow in/out of the NP without touching the CPU.
- The FortiGate-200E supports fastpath acceleration. However, for this acceleration to take place, the traffic must ingress and egress on interfaces that are within the same block of ports (i.e. traffic cannot cross between different NP6lite processors).
- For example, traffic ingressing on port1 and egressing via port2 can be fastpath accelerated since they are part of the same port block.
- However, traffic ingressing on port11 and egressing via port7 cannot be fastpath accelerated since they belong to different port blocks.
- Traffic can still flow successfully between different port blocks, but it will need to pass through the CPU in order to do so (which negates the fastpath offloading benefit).
- Note regarding Link Aggregates: Depending on the FortiOS version, it is possible to create an Aggregate interface whose member ports come from different port blocks on the FortiGate-200E. While this can be done, it is highly recommended to avoid this scenario where possible.
- In these mixed port-member aggregate scenarios, the FortiGate will set ports as active or passive based on which port block they belong to. For more info, see the documentation.
- More importantly, there are known issues with how NP fastpath acceleration handles these scenarios, so traffic may become noticeably unstable.
- Instead, it is recommended to create Aggregate interfaces using ports that are members of the same port block/NP6lite processor.
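The port-block rules above can be sketched in Python as follows. This is an illustrative model only, not FortiOS code: the `PORT_BLOCK` table and the helper names are hypothetical, the block labels are arbitrary, and the table lists only the ports named in the examples above. The real port-to-NP6lite mapping should be taken from the hardware acceleration documentation for the unit.

```python
# Hypothetical port-to-block table, for illustration only.
# The text above states that port1 and port2 share a block and that port7 and
# port11 are in different blocks. Placing port7 with port1/port2 is an
# assumption made here; verify the real mapping on the device.
PORT_BLOCK = {
    "port1": "A",
    "port2": "A",
    "port7": "A",
    "port11": "B",
}


def can_fastpath(ingress, egress, port_block=PORT_BLOCK):
    """Return True if both ports sit behind the same NP6lite (same block)."""
    return port_block[ingress] == port_block[egress]


def aggregate_blocks(members, port_block=PORT_BLOCK):
    """Return the set of port blocks spanned by an aggregate's member ports."""
    return {port_block[m] for m in members}


def is_recommended_aggregate(members, port_block=PORT_BLOCK):
    """An aggregate is recommended only if all members share one port block."""
    return len(aggregate_blocks(members, port_block)) == 1


if __name__ == "__main__":
    print(can_fastpath("port1", "port2"))                  # same block
    print(can_fastpath("port11", "port7"))                 # different blocks
    print(is_recommended_aggregate(["port1", "port11"]))   # mixed blocks
```

A check like this could be run against a planned interface layout before cabling, so that traffic pairs expected to benefit from offloading, and all aggregate members, stay within one port block.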
NTurbo
- The NTurbo feature allows the FortiGate to accelerate flow-based security inspection and reduce CPU utilization.
- In brief, NTurbo allows the NP to send packets directly to a dedicated memory space monitored by the IPS Engine, rather than needing to route these packets through the kernel. There is also additional intelligence involved that allows packets to be more efficiently distributed amongst multiple CPU cores for the inspection process.
- As mentioned above, this feature only works when using flow-based firewall policies. It is not active when traffic is subjected to proxy-based inspection.
- For more info, refer to the documentation.
- Notably, the FortiGate-200E model does not support NTurbo.
- This has significant implications, as enabling any security inspection (flow-based or proxy-based) results in traffic needing to be routed through the kernel for inspection (i.e. CPU usage increases with both the inspection load and the overhead of shuttling packets through the kernel).
- Note that fastpath acceleration does not take place when security inspection is occurring (since the inspection occurs on the CPU), so the NP is not performing any traffic offloading whatsoever on the FortiGate-200E when inspection is occurring.
- For example, enabling Application Control in a flow-based Firewall Policy would result in no NP-based acceleration on the FortiGate-200E (at least while traffic is actively being scanned).
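The interaction between fastpath, NTurbo, and security inspection described above can be summarized in a small decision function. This is a simplified sketch, not a description of FortiOS internals: the function name, parameters, and return labels are hypothetical, and IPSA is deliberately left out since it operates independently of NP-based acceleration.

```python
def datapath(same_block, inspection=None, nturbo_supported=False):
    """Return a label for where packets of a session are handled.

    same_block       -- ingress and egress ports share one NP6lite port block
    inspection       -- None (no active inspection), "flow", or "proxy"
    nturbo_supported -- False models the FortiGate-200E, which lacks NTurbo
    """
    if inspection not in (None, "flow", "proxy"):
        raise ValueError("inspection must be None, 'flow', or 'proxy'")

    if inspection is None:
        # No active inspection: offload only if traffic stays on one NP6lite.
        return "np-fastpath" if same_block else "cpu"

    if inspection == "flow" and nturbo_supported:
        # NTurbo applies to flow-based inspection only.
        return "nturbo"

    # Proxy-based inspection, or flow-based inspection without NTurbo:
    # packets go through the kernel on the CPU.
    return "cpu"


if __name__ == "__main__":
    # FortiGate-200E behavior as described above (no NTurbo).
    print(datapath(same_block=True))                     # np-fastpath
    print(datapath(same_block=True, inspection="flow"))  # cpu
```

The `nturbo_supported=True` branch is included only for contrast with models that do support NTurbo.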
IPS Acceleration (IPSA)
- IPSA offloads pattern-matching for flow-based security inspection. More specifically, it accelerates IPS and Application Control (which are notably always handled in a flow-based manner, regardless of Firewall Policy inspection mode).
- The FortiGate-200E supports IPSA, and IPSA always takes place regardless of the state of NP-based acceleration (unless it has been administratively disabled).
- The onboard CP9 content processor interfaces directly with the CPU and with the ipsengine processes handling flow-based inspection.
Side Note regarding Security Inspection Performance
- Notably, the FortiGate-200E has a dual-core x86-64 CPU onboard to handle security inspection and packet-handling.
- With that in mind, it is a good idea to monitor CPU usage during periods of high traffic/session volume to identify peak CPU usage for the FortiGate in the given environment. If CPU usage reaches 100% across both cores, there could be impacts to network stability (e.g. dropped or high-latency packets due to overloading).
- In these scenarios, it may be wise to reduce unnecessary security inspection where possible in order to lower CPU usage during periods of intense network activity.
- As a point of reference, the FortiGate-200E's datasheet specifies an 'SSL Inspection Throughput (IPS, avg. HTTPS)' of 820 Mbps in total. This benchmark is conducted with IPS + SSL certificate inspection in a flow-based Firewall Policy (i.e. IPSA is available).
- Adding further inspection profiles (e.g. Antivirus, Web Filtering) and/or utilizing proxy-based inspection (which is generally more resource-intensive than flow-based inspection) could significantly increase system CPU load and result in a lower performance ceiling.
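As a rough aid for the CPU monitoring suggested above, the following Python sketch flags the case where every core is saturated. It assumes per-core statistics are available as text lines of the form `CPU0 states: ... N% idle ...` (for example, as collected from `get system performance status`); that line format, the 95% threshold, and the helper names are assumptions, and `SAMPLE` is made-up text in the assumed format, not output captured from a device.

```python
import re

# Assumed per-core line format; adjust the pattern to the actual output.
# The aggregate "CPU states:" line has no core index and is not matched.
CORE_LINE = re.compile(r"^CPU(\d+) states:.*?(\d+)% idle", re.MULTILINE)


def per_core_usage(text):
    """Map core index -> busy percentage (100 minus the idle percentage)."""
    return {int(core): 100 - int(idle) for core, idle in CORE_LINE.findall(text)}


def all_cores_saturated(text, threshold=95):
    """Return True if at least one core is found and all are >= threshold busy."""
    usage = per_core_usage(text)
    return bool(usage) and all(busy >= threshold for busy in usage.values())


# Made-up sample in the assumed format (a dual-core unit under heavy load).
SAMPLE = """\
CPU states: 97% user 2% system 0% nice 1% idle 0% iowait 0% irq 0% softirq
CPU0 states: 98% user 2% system 0% nice 0% idle 0% iowait 0% irq 0% softirq
CPU1 states: 96% user 2% system 0% nice 2% idle 0% iowait 0% irq 0% softirq
"""

if __name__ == "__main__":
    print(per_core_usage(SAMPLE))
    print(all_cores_saturated(SAMPLE))
```

Sampling this periodically during peak traffic gives a simple record of how often both cores are saturated, which can help decide whether inspection profiles need to be trimmed.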