Support Forum
The Forums are a place to find answers on a range of Fortinet products from peers and product experts.
PaulRoberts
New Contributor III

Bug: FortiOS CLI bad format string making `exec dhcp lease-list` unparseable

Scope: At least all Fortinet-101F units running 6.4.11, and I have no doubt it applies to basically any unit capable of running a DHCP server or relay (and didn't notice any mention of it being fixed in newer versions).
Severity: CVSS < 1 (Let's not get crazy, it's an output formatting problem.)

Mainly I'm wondering if there's some magic exec command that would let me override the format string the called tool uses, but I'm not going to hold my breath.

 

Ultimately I'm trying to go through the CLI (over SSH) to grab the current information on the leases the Fortigate presently "knows" about to make the lives of our support staff easier.  If there's a JSON API query that can do it, I don't know about it because I'm not registered with the developer network (and they probably wouldn't be amused with the number of glitches like this that I come across).  Before anyone goes to look, it's not available via SNMP either (although that would be super-handy).

The problem:
When you execute `exec dhcp lease-list` you get a list that starts with an interface name on column 1, followed by a bunch of entries that comprise the cached DHCP lease information for the DHCP server scope active on that interface.  Those lines are indented by two spaces, and the first of these lines serve as columns labels for the rest of the lines.  This line and all the others have the first two fields tab-separated, but the rest are whitespace separated and make an "attempt" at using a fixed column width.

Annoyingly, the hostname column appears to be 20 characters wide, but if the client-supplied hostname is longer than 20 characters, it overflows into the next field (VDI) making it basically impossible to separate the hostname from the vendor client identifier.  This also complicates figuring out the values for the SSID and AP columns (if they carry values).

...and just to be thorough, the next to last column of SERVER-ID seems to show a single-digit integer value (that probably matches up with a definition elsewhere) or what appears to be a signed 32-bit int representation of lord-only-knows what.  Also the final column doesn't mention a timezone (but I'm pretty sure it's reasonable to assume localtime for the device).

Suggested fix: Patch the code so it uses the wisdom of our ancient forefathers and makes all the fields tab-separated and not just some of the fields.  ...and swat the dev who recycled a format string from elsewhere gently on the nose with a rolled up paper.

9 REPLIES 9
gfleming
Staff
Staff

You might get better results if you filter the command to a specific interface. And I can probably parse this output using some fancy Perl or other scripting mechanisms. I see the only issue being the date field which includes whitespace as well but you could just regex for the three-letter days to get around that.

 

Also the server ID is literally the ID of the DHCP server on the Fortigate. As in what shows up in the list of "show system dhcp server".

 

And most importantly, there *is* a REST API to access all of this info. Please see the /monitor/system/dhcp method.

Cheers,
Graham
PaulRoberts
New Contributor III

Requesting just the leases apropos to a specific interface does not affect the terrible format string problem.  By example, there's no reasonable way to parse this line:

 

192.168.202.23 64:c2:de:25:08:73 android-7248423a00969c5android-dhcp-8.1.0 0 Wed Mar 8 14:09:37 2023


Let me assure you there is no way you can parse that reliably with a perl script (or any script) because not even Raku has a regular expression modifier that's linked against libMsCleoHotline.so so there is simply no possible way you're going to be able to separate the four middle fields considering that they're both now (mostly) whitespace separated and the fields themselves contain whitespace.  If it were possible, I would already be doing it. 

The date field is actually trivial to match by starting at the end and working backwards, but having to assume the timezone grates on my nerves.

Notice how there's no whitespace separating the hostname and the VCI?  The format string used was probably something along the lines of "  %s\t%s\t%-20s%-20s%20s%-20s" which will break each and every time one of third and successive arguments exceeds 20 characters.  It just needs to be "  %s\t%s\t%s\t%s\t%s\t%s".

 

As to the server id, again, no.  Like I said, I thought that might be what it is, but it doesn't explain the presence of a whole slew of nearly random signed integers like -158016539.  The server I'm poking at right now has a grand total of five DHCP scopes it knows about, and that would mean even if these were being improperly bit-converted I would be looking at a maximum of five bizarre-o numbers.  There's several dozen being reported.  That is absolutely some kind of bug.  I would bet money on it.

After some jiggery-pokery it does appear that hitting /api/v2/monitor/system/dhcp returns a list of all the known scopes, which is actually fine, but it does support my argument that the server-id field is bugged and is returning some random memory locations instead of what it should be returning. Someone should fix that bug because it's scary to think things like this are slipping through and then possibly being ignored.

gfleming

So first of all the CLI output is never really intended to be parsed. It's intended to be formatted so it's human readable in the console. The API is there for programatically interacting with the data.

 

But just for fun I ran a line of output through python's str.split() method (which handles tabs and spaces the same way) and got pretty reasonable output:

 

 str.split(" 10.1.1.51 80:5e:c0:37:29:e8 FON-D71-B Fortinet 12 Mon Mar 6 13:48:47 2023")
['10.1.1.51', '80:5e:c0:37:29:e8', 'FON-D71-B', 'Fortinet', '12', 'Mon', 'Mar', '6', '13:48:47', '2023']

 As predicted the date is an issue but that can be pre-processed to avoid the issue. Anyway it's possible to parse it if you really want to. But yeah just use the API.

 

As for the server ID column issue I'd recommend opening a TAC case so they can confirm whether it's a bug or something else and if its a bug they can get it fixed.

 

This forum is not intended for bug fix requests; that would definitely be TAC's job.

Cheers,
Graham
PaulRoberts
New Contributor III

It doesn't really matter what you did with Python.  The output results in fields with no separator character, and as a result the columns won't even line up.  If a client uses a hostname and VDI the reader can't immediately separate from one another they'll have no way to tell which one is what regardless.  You didn't see your code malfunction because you chose super-easy test data.  ...and for the record, only amateurs "pre-process" data in the way you're suggesting.  Processing the same data twice should be considered a bug. Professional programmers would use something more like (\D{3})\s(\D{3})\s+(\d{1,2})\s(\d{1,2}:\d{2}:\d{2}\s\d{4})$ to catch the date at the end, but that wouldn't even be necessary if the entire line were simply tab-separated fields as has been a fairly common practice for oh... thirty or forty years now.

I guess I'll just stop submitting bug requests entirely since it always seems to require a detailed accounting of every aspect of our network--including the unrelated ones, followed by several phone calls, and hours spent watching someone from Tier I over a GoToMeeting window make blind guesses.

I submitted a report about a problem with multicast packets not being reported properly in SNMP mibs, and that is basically what happened.  I submitted a report about the policy check tool not showing an interface necessary to testing VPN policies, and after a couple of weeks got a suggestion to do the same thing that I had first tried which somewhat predictably still did not work.

This is just not worth my time.  I just figured other users might like to have a way to have their problems confirmed with a web search.

gfleming

You wanted to parse the output. I showed you how to do it with Python. So I'n not sure why you're telling me it doesn't matter?

 

Anyway you have access to the API now so hopefully you're all good.

Cheers,
Graham
PaulRoberts
New Contributor III

No, you showed me how to do it badly with python while continuing to pretend there was no problem at all with parsing (or even reading) malformed output.

The whole "issue" of the date was your problem, by the way.  Binding a match to the end of the line to grab a date that is being formatted correctly (or at least predictably) makes it possible to reliably extract three of the fields, and most of a fourth (hostnames longer than 20 characters will always be truncated).  Like I said in the initial post, it's just annoying that the date lacks a mention of a timezone, but after a few decades it becomes clear no one ever listens about that until they've got a massive pile of indecipherable times on their hands.

gfleming

Just to be clear, your recommendation to make all of the fields tab-delimited would never be implemented. It would make the readability far worse (columns would not line up in any predictable manner). So in this case the use of mixed tabs and spaces is best used for readability. Either way, parsing is done via API as usual.

 

My python output was just showing you that you could parse it despite your claims to the contrary. Obviously it was just a starting point and would require a bit more work to get things separated out properly but it's possible with the start of separating out the fields using str.splt(). That's what I was tyring to show you. You are claiming it's impossible. It is not.

 

Timezone thing is weird.. I'm not sure why you'd want it to show it by default. I mean it's clearly in the device's time zone. Why would output be in a different timezone? When I look at my linux syslog or windows event log I'm not provided a timezone. It's implied that it would be the local device's timezone.

 

Cheers,
Graham
PaulRoberts
New Contributor III


@gfleming wrote:

Just to be clear, your recommendation to make all of the fields tab-delimited would never be implemented. It would make the readability far worse (columns would not line up in any predictable manner). So in this case the use of mixed tabs and spaces is best used for readability. Either way, parsing is done via API as usual.

Let me be extra clear about this since you seem to keep ignoring me saying it...  The way things stand now, fields run into each other and fields longer than 20 characters cause everything to shift to the right which is already a legibility issue.  If I send a hostname or VDI field in a malicious lease request that contains "Any-19-characters someothernonsense " (and let me assure you this is allowed for VDI) then you've got no way to read that line correctly with 100% accuracy because I've made it utterly impossible.  If a simple change to a fully tab-separated format string "would never happen" then ya'll's company just ain't long for this earth.  You're crossing the 50% market-share threshold now which means playtime is over because your product is now an attractive target for blackhat developers who will not hand-wave away things like this.  They'll poke and prod at them until they've mapped the full scope and breadth of the problem or until a marketable vulnerability falls out.


@gfleming wrote:

My python output was just showing you that you could parse it despite your claims to the contrary. Obviously it was just a starting point and would require a bit more work to get things separated out properly but it's possible with the start of separating out the fields using str.splt(). That's what I was tyring to show you. You are claiming it's impossible. It is not.

There's a couple of things wrong with what you were doing, and I'll repeat them since you don't seem to be reading very carefully.  The first is that it was amateurish and the second was that it was wrong.  One does not parse the same data twice (because this basically guarantees you're wasting clock cycles).  Scanning subsections of that (i.e., the parts you chopped it into in the first pass) for different criteria is fine, but the same input multiple times in different ways is a hard no because it will always be inefficient and sometimes not even possible because of restrictions on the device or the data stream itself.  ...and you certainly can't use str.split on a line full of whitespace when several of those fields can themselves contain whitespace because that's just dead wrong.  Sure you can lop the date off the end of the resulting list because you know how many fields it contains, but both the VDI and the SSID can happily contain whitespace and at that point your method breaks even more.  Some people would have considered me casually posting a very long and nasty-looking regular expression (which would have been longer and nastier-looking but I didn't want to presume localization) to be a hint.  This is some stuff I've known about for more than thirty years ago when I was just a kid, so wtf man.  You're also literally relying on an admins eyeballs for the correct interpretation of the info, and those eyeballs are generally the weakest components in the entire system.  The VDI field can readily be 255 characters which is enough space for someone feeling lucky about guessing line lengths to spoof an entire second line of fake information and frankly I don't even want to know if that information still needs to be sanitized further to prevent someone from pulling a warez-kid stunt from the 90's and just starting their VDI with newline-uparrow so they can overwrite the entire line and hide their actual information.  Thanks to the existance of unicode, programmers have a reason to actually not de-taint that data right away and I didn't notice anything in the RFCs about requiring only printable ASCII.  Coding with great precision and not just going "ah it's good enough for the admin to read" eliminates ever having to ask those kinds of questions and it's literally not my job to find your vulns for you, but I can certainly think of some people whose jobs it is to at least find them.


...and I have no idea why you keep ignoring me saying that after a few results what's coming out for what should be the enumerated number of the DHCP scope which would typically be an unsigned int of no more than two characters (unless someone's staff has absolutely lost their minds with individual DHCP scopes for every port on a switch or something) is coming back as signed long integers.  I've literally got lots of numbers like -151738136 showing up in the list, and that sort of attention to detail in coding makes red-teamers breathe funny.  There is no reality in which that code is reading the correct part of the struct unless the struct itself is being constructed improperly and either of these things going unnoticed is actually quite bad (does the compiled binary somehow not even notice a segmentation problem?).  So much for trying to downplay that so as not to draw attention.


@gfleming wrote:

Timezone thing is weird.. I'm not sure why you'd want it to show it by default. I mean it's clearly in the device's time zone. Why would output be in a different timezone? When I look at my linux syslog or windows event log I'm not provided a timezone. It's implied that it would be the local device's timezone.  


There's nothing weird about that, although it's not something people used to dealing with small networks are likely to realize.  Typically people who have a network containing devices in more than one time zone work this out on their own, but over time engineers dealing with times and dates tend to learn either from being told or from having to cleaning up the resulting disastrous mess.  (Try to avoid being in the latter group.  It'll always be worse than you think.)  Timestamps should preferentially be in GMT (since it's unambiguously convertible to every other timezone) or indicate their specific time zone (including daylight saving time status) or they simply stop being unambiguous time indicators (which might make them useless or even counter-productive).  The moment you begin doing any kind of centralized logging involving multiple time zones you're pretty much just wasting storage space unless those times contain the timezone from the very start.  Four extra bytes of output is just not an exorbitant cost for making life so much easier.

I'm not going to comment on Windows, but Linux admins know the simple solution for this (beyond making sure ntpd/chrony are actively keeping the clock as accurate as possible) when one is dealing with a WAN that crosses timezones is to just go ahead and set the system's timezone to GMT (because it doesn't care!) and use /etc/profile or whatever else is handy to set the timezone for the local users using the TZ variable (because they do care).  That way there's nothing vague or confusing about any of the timestamps in the (centralized) logs and no one ever has to resort to saying "Well, it came after this event in the logs so it must have happened after that event" because that's not guaranteed to be true either thanks to (yay) physics and network latency.  Recording the time in an unambigous way might also some day prevent you from being eaten alive by defense attorneys.  Even with respect to the human interface, having the time zone right there keeps the user from having to guess or need to be reminded that it's not their timezone that they're looking at.  Some folks might find this information rather important when it comes to leases that are within an hour of expiring.

TL;DR: Store as GMT (or epoch seconds), convert to time zones only when showing the user.  Always.

gfleming

All this effort in responding to me could have been spent opening a TAC case to talk about your potential server ID bug (which I did not ignore and recommended you open a TAC case as it sounded like you might be hitting a bug but you didn't want to do that because it was "not worth your time"). You've likely spent more time here than you would have with TAC.

 

Next, if we made all the fields tab-delimited you would be whining here again because nothing would line up at all (except the first three columns), even entries that had short VCIs. Once you hit the third column, hostnames, you'll have a lot of variety in field lengths and once you go tab-delimited from there all the following columns will be out of joint.  And yes I agree it's annoying that longer VCI fields cause misalignment but then the only soltuion there is to truncate and that would also **bleep** people off. It's not hard to recognize the next two fields as server ID and Expiry time. So it's probably the best solution as it stands to use tabs where needed and spaces where needed to make it as humanly readable as possible without losing data (i.e. truncating fields). Jumping to a conclusion that this is somehow a precursor to a security vulnerability is a bit much. And we are spending far too much time on this though. It's really not ever meant to be parsed. Please use the API. If you want VCI field to be truncated you can work with your account team to submit an NFR. 

 

But just to circle back on this you absolutely can parse this. There's a set number of characters between each field. It's easy to parse for each field based on these character counts regardless of if a field contains whitespace or not. But again..... it's not meant to be parsed. Use the API. And I'm laughing at you worrying about wasting clock cycles on a simple parse job. Sorry.

 

Again I don't know of any systems that default to showing time zones in local device output. Centralized DHCP lease logs from a Fortigate (FMG or FAZ) would work the same as your Linux syslog example. Timezones would be normalized. But again, for the use-case of having local device local output to show timezones you are free to submit an NFR if you think this would be useful to other users. 

Cheers,
Graham
Labels
Top Kudoed Authors