FortiSIEM Discussions
AEH
New Contributor

FortiSIEM ClickHouse Deployment Architecture

Hello everyone,

 

I’m currently working on the design of a FortiSIEM deployment expected to handle 10,000 EPS, and I’d like to get some insights or recommendations from the community.

 

Planned architecture

  • 1 Supervisor

  • 1 Worker

  • Collectors at each site (multiple sites)

  • ClickHouse for event storage

Before finalizing the architecture, I have a few questions about the design choices and database placement.

 

 

1. Architecture benefits

What are the main advantages of separating the Supervisor and Worker compared to using an all-in-one Supervisor setup (for a 10K EPS environment)?
Does it provide noticeable performance or scalability improvements in real-world deployments, or would an all-in-one Supervisor be good enough (to optimize resource usage)?

 

 

2. ClickHouse placement

Where should ClickHouse ideally be installed — on the Supervisor or on the Worker?
My initial preference is to host ClickHouse on the Worker to reduce load on the Supervisor, but I’d like to confirm if that’s a recommended or supported approach.

 

 

3. Installing ClickHouse on the Worker

If ClickHouse can (or should) reside on the Worker, how can I install and configure it there instead of the Supervisor?


If anyone has an official Fortinet KB or deployment guide covering this scenario, please share the reference.

I’d really appreciate feedback from anyone who has implemented or benchmarked a similar setup — especially around event storage design, deployment best practices, and operational lessons learned.

 

Thanks in advance for your help!

AEH.
9 REPLIES
AEH
New Contributor

@Secusaurus Waiting for your kind support.

AEH.
Secusaurus
Contributor III

Hi @AEH,

 

Have a look at these documents:

 

I think the NSE training (https://training.fortinet.com) also covers the basics of ClickHouse deployment.

 

In general, my recommendation:

  • Always use distinct Supervisor(s) and Workers
    • An all-in-one deployment does not scale. If you ever run into performance issues, you are limited to the SSD speed and the maximum CPUs/RAM of that single server. If you use Workers right from the start and run into performance issues, you can simply spin up another Worker on separate hardware
    • In an all-in-one deployment, you also don't get a good redundancy/backup concept. If that machine goes down, you lose everything
  • Use ClickHouse
    • This is recommended, fully integrated, and enables some features you don't have with the other DB structures
    • ClickHouse has noticeably better performance (hard drive speed & consumption, CPU, RAM) than EventDB, which is what you would usually get on an all-in-one deployment
  • Use (at least) two identically configured Workers as Replicas within the same Shard
    • This is a perfect backup concept: if one Worker goes down (or is just doing an upgrade), you still have all the data on the other one and don't lose anything (see the sketch after this list)
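 
To make the Shard/Replica idea concrete, here is a minimal illustrative sketch (not FortiSIEM code; the node names and the availability rule are assumptions for the example):

```python
# Illustrative only: models one ClickHouse shard whose data is held by two
# replica nodes. Node names are hypothetical; FortiSIEM sets this up via its GUI.

shard_replicas = {"worker-1": "up", "worker-2": "up"}

def shard_available(replicas: dict) -> bool:
    """A shard can still serve all of its data if any one replica is up."""
    return any(state == "up" for state in replicas.values())

# Worker 2 goes down (crash or upgrade); nothing is lost because Worker 1
# holds an identical copy of the shard.
shard_replicas["worker-2"] = "down"
print(shard_available(shard_replicas))  # True
```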

 

ClickHouse comes with the default installation automatically; you don't need to install anything additionally. You only need to configure it, depending on your needs (for sizing, see the guides above). This is done in the admin configuration menu: you simply configure, test, and deploy it in the GUI. In a simple setup (10k EPS is still a low number), I would recommend one Supervisor as Keeper and two Workers as Replicas of one Shard; that can absolutely handle it. You might need to deploy more Keepers in case you want HA on the Supervisor side.
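 
As a rough back-of-the-envelope illustration of what "depending on your needs" means at 10k EPS (the event size, compression factor, and retention figure below are assumptions for the example, not official sizing numbers; use the sizing guide for real planning):

```python
# Back-of-the-envelope storage estimate; all figures below are assumptions,
# not official FortiSIEM sizing numbers.
EPS = 10_000                 # sustained events per second (from the question)
AVG_EVENT_BYTES = 500        # assumed average raw event size
COMPRESSION_RATIO = 8        # assumed ClickHouse columnar compression factor
RETENTION_DAYS = 90          # assumed online retention requirement

raw_per_day = EPS * 86_400 * AVG_EVENT_BYTES       # bytes/day before compression
stored_per_day = raw_per_day / COMPRESSION_RATIO   # bytes/day on disk
window = stored_per_day * RETENTION_DAYS           # bytes per replica for the window

print(f"raw/day:    {raw_per_day / 1e9:.0f} GB")                  # ~432 GB
print(f"stored/day: {stored_per_day / 1e9:.0f} GB")               # ~54 GB
print(f"{RETENTION_DAYS}-day window: {window / 1e12:.1f} TB per replica")  # ~4.9 TB
```

Note that with one Shard and two Replicas, each Worker stores a full copy, so the window estimate applies per replica.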

 

Hope this helps.

 

Best,

Christian

FCX #003451 | Fortinet Advanced Partner
AEH
New Contributor

Dear @Secusaurus,

 

Thank you for the valuable information. I have a suggestion regarding the architecture and would like to get your advice on it.

As you recommended, I'm planning to have 2 workers + 1 supervisor. Each worker will be a replica in the same shard, and both workers and the supervisor will have the keeper node role enabled (for keeper redundancy), giving 3 keeper nodes. Both workers will serve as data and query nodes.

What do you think about this proposition? Do you consider it feasible and adequate for handling at least 10K EPS?

Thank you for your guidance.

AEH.
Secusaurus
Contributor III

Hi @AEH,

 

This setup should be able to handle 10k and more (assuming you provision enough CPU and RAM for the nodes).

 

Talking about Keepers: I don't really see a benefit in configuring all three nodes as Keepers (if the Supervisor goes down, you cannot use the system anyway; if one Worker goes down, that one will not take part in the cluster anyway, and everything that could not be sent to the cluster will be cached on the Collectors). But there is no real downside either, so you can just go with it.

 

Best,

Christian

FCX #003451 | Fortinet Advanced Partner
AEH
New Contributor

Dear @Secusaurus ,

Isn't the benefit having Keeper node redundancy? Even if the Supervisor is down, the Keepers keep functioning on the Workers, thus preventing read-only mode.

 

Best regards.

AEH.
AEH
New Contributor

Waiting for your kind reply, @Secusaurus.

AEH.
Secusaurus
Contributor III

Hi @AEH,

 

Yes, having redundancy for Keepers can make sense. However, best practice is to run them on separate machines (see also: https://clickhouse.com/docs/architecture/replication). So if you go for redundancy, you should also go the full way and keep the Keeper nodes separate from the Replicas. This is why I don't much like the concept of using the Workers for that.

 

In the Keeper(Supervisor)-reboot/update scenario, I would assume the Worker will deny the logs if it can't store them, as it's a full HTTPS API connection. On a deny, the Collector will then cache the logs in the meantime.

 

Anyway, you can (if the GUI lets you ;) ) configure your cluster this way. It should work; at 10k EPS your Workers will have enough spare capacity to handle the Keeper functionality as well.

 

Best,

Christian

FCX #003451 | Fortinet Advanced Partner
AEH
New Contributor

Hello @Secusaurus ,

Can you please explain why the Worker would deny the logs if the Supervisor goes down? Since we still have 2 Keeper workers, that shouldn't be the case, if I'm not wrong.

Also, isn't the Collector cache limited? Relying on that risks losing logs if the limit is reached.

Best regards

AEH.
Secusaurus
Contributor III

Hi @AEH,

 

If you have three Keepers, a majority decision (one down, two remaining) will keep the tables writable, and therefore, yes, the logs will be received and stored in this scenario.

I was talking about the case where you only use the Supervisor as Keeper. When the Supervisor goes down, the Replicas will go into read-only mode and, from my expectations, deny the logs, forcing the Collectors to cache.
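 
The underlying rule is a simple majority quorum. A tiny illustrative sketch (just the arithmetic, not FortiSIEM code) of how many Keeper failures each ensemble size tolerates:

```python
# Keeper ensembles stay writable only while a strict majority is alive
# (illustrative sketch of the arithmetic, not FortiSIEM code).

def majority(n_keepers: int) -> int:
    """Smallest node count that forms a majority of the ensemble."""
    return n_keepers // 2 + 1

def tolerated_failures(n_keepers: int) -> int:
    """How many Keepers may fail while the rest still form a majority."""
    return n_keepers - majority(n_keepers)

for n in (1, 3, 5):
    print(f"{n} Keeper(s): needs {majority(n)} alive, tolerates {tolerated_failures(n)} failure(s)")
# 1 Keeper:  needs 1 alive, tolerates 0 -> Supervisor-only: any outage means read-only
# 3 Keepers: needs 2 alive, tolerates 1 -> your layout: the Supervisor may go down
# 5 Keepers: needs 3 alive, tolerates 2
```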

 

In your environment (10k EPS), the Collectors should be able to cache for several hours in the default deployment.
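 
As a rough illustration of why "several hours" is plausible (the cache size and event size below are assumptions for the example, not documented FortiSIEM defaults):

```python
# Rough Collector buffering estimate; the cache size and event size are
# assumptions for the example, not documented FortiSIEM defaults.
EPS = 10_000               # events per second across the whole deployment
AVG_EVENT_BYTES = 500      # assumed average raw event size
CACHE_BYTES = 100e9        # assumed ~100 GB of disk available for caching

ingest_rate = EPS * AVG_EVENT_BYTES            # ~5 MB/s arriving
buffer_hours = CACHE_BYTES / ingest_rate / 3600
print(f"~{buffer_hours:.1f} hours of buffering")  # ~5.6 hours
```

And in practice the EPS is split across several Collectors, so each one only has to buffer its own share.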

 

But, let's make it easy ;)

  • Yes, setting the Worker nodes to Keepers as well as Replicas is supposed to work
  • It is recommended, though, to keep Keepers separate from Replicas, so you can scale better
  • In the case of FortiSIEM, the whole system is pretty much useless when the Supervisor is down. So, in our deployments, we either keep it simple and stay with one Keeper on the Supervisor, or build an HA cluster of Supervisors, which remediates the risk of a single Keeper as well

That would then be my recommendation.

 

Best,

Christian

FCX #003451 | Fortinet Advanced Partner