This is a tough one, and I will be thoroughly impressed if someone can give me any advice that helps solve it. Fortinet support is having trouble solving it, so I'm reaching out to the community for help. I'm going to provide a little background on our issues with FSSO (sorry it's a little lengthy, but I think it helps in understanding the issue) and then get into our current issue. Hopefully someone has seen this problem and can offer some suggestions as to what it might be.

Background: We've been using FSSO for around 8 years. We initially installed a single collector agent in DC Agent mode. We had 8-10 internet auth groups and things seemed to run fine. We then decided to add a second collector for redundancy. Things still ran as expected.

A few years later, we started authenticating our laptop users to our wireless network (802.1x/PEAP) via Group Policy configurations. This introduced our first set of issues with FSSO. We found that when users had both a wired and a wireless network connection at the same time, FSSO would randomly have issues. We discovered that the way Windows DNS servers and clients operate was playing a role in the auth failures. It turns out that when using something like PEAP to authenticate wireless connections via the user account, DNS entries for the client are handled in a way that confused FSSO. The FSSO collector would think you were authenticated to an IP it wasn't expecting.

For example, suppose you log into your laptop with both a wired and a wireless connection enabled. When you first connect your system to your network cable/dock/whatever, the wired adapter registers itself with DNS on a DHCP segment. The FSSO collector agent would see your user account authenticated to this physical NIC IP. Then, when you logged into the domain, your wireless adapter would authenticate your system (via your user account) to the network and overwrite your previous DNS entry, so that your hostname now resolved to your wireless IP.
FSSO would now see your wireless IP as authenticated for your user account. Generally, the physical NIC was first in the binding order, so when you went to use the internet, your traffic would source from the physical NIC. Since FSSO uses DNS to look up the names of systems during auth for the firewall, FSSO could not match your traffic to your user account: your system's DNS record pointed at the wireless IP, so you were only authenticated properly if your traffic sourced from the wireless IP. Turn off your wireless card and generate a login event, and the problem goes away. Disconnect your physical NIC and it's resolved.

This dragged on for a long time without any known way to resolve it. Windows sucks at handling multiple network interfaces properly, and there really weren't any enterprise-ready solutions for disabling adapters if multiple were detected. Also, the way DNS operates on a client seems silly (I have no idea why, if you have two different IPs, you can't have two different records in DNS; design flaw maybe). So, at Fortinet's recommendation, we switched to polling mode. Shortly after, with some minor tweaking to the events we were monitoring for, our problems went away.

Suddenly, earlier this year, the problem resurfaced with a new twist. This time, when the internet auth issue occurs (sporadically, but often for the same users), the FSSO collector agent isn't seeing an authenticated user at all. However, when we pull our domain controller logs, the user has log entries indicating a logon event. We found that restarting the FSSO service on both collector agent servers was the quickest way to resolve the issue; when the next logon event was generated, it would force the user to properly show up in the collector user table. Initially, we found that all of the people who experienced the problem were authenticated to a new domain controller we added earlier this year, so we were looking heavily at this system (firewall ports, etc.).
We spent lots of time collecting logs only to have Fortinet tell us what we already knew: the user wasn't in the FSSO auth table.

Today I discovered something different that may help narrow this down. We had a person experience the problem, and while collecting logs I found the same symptoms as usual: not in the user table in FSSO (unauthenticated) even though there are logon events. We asked to hold off on restarting services so we could troubleshoot the issue. However, before I got too far down the troubleshooting path, the issue was suddenly resolved. The user had accessed a file share and generated a logon event in the process. I was able to capture the events, and sure enough there was a new logon event that the collector agent had picked up.

I thought about this for a bit, and I'm starting to wonder whether the FSSO collector agent is having trouble handling the number of events it is processing in polling mode. I say that because there are periods of time (days even) where we don't have the issue at all. Then, on other days, we have it multiple times a day. Today's incident almost seemed like the agent was behind in processing/polling events and eventually "caught up". I'm thinking we should switch back to DC Agent mode, but I'm concerned the wireless issues might resurface.
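One way I could test the "falling behind" theory is to line up the logon events from the domain controllers against the time the same user/IP shows up in the collector, and see how big the gap is on bad days. Below is a rough sketch of that comparison in Python. Everything in it is my own assumption: the function name `match_lag`, the `(user, ip, timestamp)` tuple layout, and the 30-minute window are made up for illustration, and it does not read any real FSSO or Windows log format. The records would have to be exported and parsed separately.

```python
from datetime import datetime, timedelta


def match_lag(dc_events, collector_events, window=timedelta(minutes=30)):
    """Pair each DC logon event with the collector's first sighting of it.

    Both inputs are lists of (user, ip, timestamp) tuples (a hypothetical
    layout, not an FSSO format). For every DC event, look for the earliest
    collector entry with the same user (case-insensitive) and IP that falls
    at or after the logon time and within `window`.

    Returns a list of (user, ip, lag), where lag is a timedelta, or None if
    the collector never showed the logon inside the window.
    """
    results = []
    for user, ip, logged_at in dc_events:
        candidates = [
            seen_at
            for seen_user, seen_ip, seen_at in collector_events
            if seen_user.lower() == user.lower()
            and seen_ip == ip
            and logged_at <= seen_at <= logged_at + window
        ]
        lag = min(candidates) - logged_at if candidates else None
        results.append((user, ip, lag))
    return results


def summarize(results):
    """Return (number of missed logons, largest lag seen or None)."""
    lags = [lag for _, _, lag in results if lag is not None]
    missed = sum(1 for _, _, lag in results if lag is None)
    return missed, (max(lags) if lags else None)
```

If the lag grows steadily on the days we see the problem and is near zero on good days, that would point at a polling backlog. If some logons never appear at all, that would look more like missed events.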
Thoughts? Suggestions? I would appreciate any and all help. Thanks in advance!

Current Environment:
Collector Agents: 2
Domain Controllers: 15
Polling Mode: Check Windows Security Event Logs

Changes that may have had an impact this year:
A new domain controller
Wireless authentication switched from user-based to machine-based prior to login and user-based post login, so computers are connected prior to logins
We upgraded to 5.4.X (I believe this started before we upgraded to 5.4)