MJ_FTNT
Staff
April 20, 2023

Troubleshooting Tip: BGP hold timer expired/unspecified error subcode

Description

This article describes how to troubleshoot an issue when receiving logs from BGP stating ‘Hold Timer Expired/Unspecified Error Subcode’.

Scope

FortiGate and BGP.

Solution

From the message, there are three main possibilities:

 

  1. The BGP peer is not receiving the keepalives sent by the FortiGate, so its hold timer expires (or vice versa). The most common potential causes are as follows:

Network connectivity issues:

There could be network connectivity issues between the FortiGate device and the BGP peer, such as a link failure, routing misconfiguration, or firewall rules blocking BGP traffic.

These issues can prevent keepalive messages from reaching the BGP peer. If the peer receives no KEEPALIVE or UPDATE message within the negotiated hold time, its hold timer expires and it tears down the session.
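
As a starting point, the session state, negotiated timers, and basic reachability can be checked from the FortiGate CLI. The commands below are a sketch; 172.16.16.253 is only a placeholder for the peer address (borrowed from the example outputs in this article) and should be replaced with the real neighbor IP.

```
get router info bgp summary                    <----- Neighbor state and Up/Down time.
get router info bgp neighbors 172.16.16.253    <----- Negotiated hold time and keepalive interval.
execute ping 172.16.16.253                     <----- Basic reachability to the peer.
```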

 

MTU issues:

Packets larger than the MTU of an underlying link can be dropped by the intermediate L2 network. This is more likely when the BGP keepalives are piggybacked with other BGP messages, which makes the packets larger.
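
One way to test for this is to ping the peer with the DF bit set and a payload sized for the expected MTU. This is a minimal sketch, assuming a 1500-byte path MTU (1472 bytes of ICMP data plus 28 bytes of IP and ICMP headers) and a placeholder peer address of 172.16.16.253:

```
execute ping-options df-bit yes
execute ping-options data-size 1472
execute ping 172.16.16.253
execute ping-options reset
```

If the ping fails at this size but succeeds with a smaller data-size, a link along the path likely has a lower MTU. Reducing the value until the ping succeeds gives the usable size.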

 

Note: If the BGP peering is between devices from two vendors (for example, FortiGate and Juniper Apstra), the best practice is to find the highest MTU supported by each side and, if the two sides do not support the same MTU size, set both to the lower of the two values. Remember to set the DF (Don't Fragment) bit during the maximum MTU testing.

 

2. The FortiGate is not sending the keepalives at all, which causes BGP to flap and the hold timer to expire. The most common potential causes are as follows:

 

Software or hardware issues:

There could be software or hardware issues on the FortiGate device that prevent the proper functioning of BGP keepalive messages. This could include bugs, memory or CPU utilization issues, or hardware failures.
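
A quick snapshot of system load can help confirm or rule out resource exhaustion. The following commands are a suggested starting point, not an exhaustive check:

```
get system performance status          <----- CPU and memory usage.
diagnose hardware sysinfo conserve     <----- Whether the unit is in memory conserve mode.
```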

 

Routing issues:

There may be routing issues, such as incorrect routing tables or route advertisements, that are preventing the FortiGate device from sending BGP keepalive messages to the BGP neighbor.
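
To see which route the FortiGate uses to reach the neighbor, the routing table can be queried for the peer address. In this sketch, 172.16.16.253 is a placeholder for the neighbor IP:

```
get router info routing-table details 172.16.16.253
```

The egress interface in the output should be the interface on which the BGP session is expected to run.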

 

To find the root cause for this, the following information should be collected using multiple PuTTY/SSH sessions:

 

1st PuTTY Session:

 

diagnose sniffer packet any "port 179" 6 0 l

 

2nd PuTTY Session:

 

diagnose sys top 2 50

 

3rd PuTTY Session:

 

diagnose ip router bgp all enable
diagnose ip router bgp level info
diagnose debug console timestamp enable

diagnose debug duration 0      <----- To run debug continuously until manually stopped.
diagnose debug enable

 

To disable debug:

 

diagnose debug reset

diagnose debug disable
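
On units with many BGP peers, the capture from the first session can be limited to a single neighbor by extending the sniffer filter. A sketch, with 172.16.16.253 as a placeholder peer address:

```
diagnose sniffer packet any "host 172.16.16.253 and port 179" 6 0 l
```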

 

3. In the case of BGP over IPsec, another possible cause is that the FortiGate is advertising its connected routes to the remote BGP peer without any filter.

 

This advertisement then includes the IPsec tunnel interface IP, which is the same IP on which the BGP session is established. If the remote peer installs this learned route in its routing table (for example, because it is the longest-prefix match), traffic to the tunnel IP can follow the BGP route instead of the correct tunnel, and the BGP keepalives are not sent back correctly.

 

For example, in the outputs below, the HUB sent the SYN-ACK to a different tunnel because of the incorrect routing, which broke the BGP neighborship with the spoke:

 

  • On the spoke, the packet sniffer showed that BGP packets were sent out, but nothing came back:


2026-03-04 13:20:21.158556 HUB1-VPN1 out 172.16.16.5.10068 -> 172.16.16.253.179: syn 4181818667
2026-03-04 13:20:24.927653 HUB1-VPN1 out 172.16.16.5.10070 -> 172.16.16.253.179: syn 3963906063
2026-03-04 13:20:25.958563 HUB1-VPN1 out 172.16.16.5.10070 -> 172.16.16.253.179: syn 3963906063

 

  • On the HUB at the same time, the packet sniffer showed the incoming SYN packets from the spoke and the SYN-ACK packets sent in reply, but no ACK packet arrived to complete the TCP three-way handshake:

2026-03-04 13:20:24.927048 VPN1 in 172.16.16.5.10070 -> 172.16.16.253.179: syn 3963906063
2026-03-04 13:20:24.927071 VPN1 out 172.16.16.253.179 -> 172.16.16.5.10070: syn 3738288167 ack 3963906064
2026-03-04 13:20:25.958025 VPN1 in 172.16.16.5.10070 -> 172.16.16.253.179: syn 3963906063
2026-03-04 13:20:25.958032 VPN1 out 172.16.16.253.179 -> 172.16.16.5.10070: syn 3738288167 ack 3963906064

 

  • The debug flow (on HUB) showed that the SYN-ACK packets were sent to the incorrect tunnel VPN1_1 (tun_id=172.16.16.2), instead of VPN1 (tun_id=172.16.16.1). This was why the spoke could not get the SYN-ACK packets from the HUB:
 
2026-03-24 09:23:23 id=65308 trace_id=4625 func=print_pkt_detail line=6336 msg="vd-root:0 received a packet(proto=6, 172.16.16.253:179->172.16.16.1:21912) tun_id=0.0.0.0 from local. flag [S.], seq 3117379338, ack 1812746166, win 27360"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=resolve_ip_tuple_fast line=6444 msg="Find an existing session, id-00029527, reply direction"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=ip_session_core_in line=7058 msg="dir-1, tun_id=172.16.16.1"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=ipsecdev_hard_start_xmit line=662 msg="enter IPSec interface VPN1, tun_id=172.16.16.1"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel VPN1_1, tun_id=172.16.16.2, vrf 0"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=esp_output4 line=910 msg="IPsec encrypt/auth"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=nipsec_set_ipsec_sa_enc line=1014 msg="Trying to offload IPsec encrypt SA (p1/p2/spi={VPN1_1/VPN1/0x402cbad7}), npudev=-1, skb-dev=port1"
2026-03-24 09:23:23 id=65308 trace_id=4625 func=nipsec_set_ipsec_sa_enc line=1063 msg="IPSec encrypt SA (p1/p2/spi={VPN1_1/VPN1/0x402cbad7}) offloading-check failed, reason_code=2."
2026-03-24 09:23:23 id=65308 trace_id=4625 func=ipsec_output_finish line=679 msg="send to 0.0.0.0 via intf-port1"
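
Output like the above is typically collected with the debug flow commands. A sketch, assuming the HUB's tunnel IP 172.16.16.253 as the filter address and a trace limit of 20 packets (both values are illustrative):

```
diagnose debug flow filter clear
diagnose debug flow filter addr 172.16.16.253
diagnose debug flow filter port 179
diagnose debug flow show function-name enable
diagnose debug console timestamp enable
diagnose debug flow trace start 20
diagnose debug enable
```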
 
The root cause is a routing issue on HUB:
 
B 172.16.16.1/32  [200/0] via 172.16.16.2 (recursive is directly connected, VPN1), 00:00:24, [9999/0]
                  [200/0] via 172.16.16.1 (recursive is directly connected, VPN1), 00:00:24, [9999/0]
 
This is due to the spokes' configuration, which redistributes connected routes without any filter:

config router bgp
    config redistribute "connected"
        set status enable
    end
end
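
Whether a spoke is actually advertising its tunnel IP can be checked on the spoke itself. A sketch, using the HUB address 172.16.16.253 from the outputs above as the neighbor:

```
get router info bgp neighbors 172.16.16.253 advertised-routes
```

If a tunnel interface address (such as the /32 seen in the HUB's routing table) appears in the list, the connected redistribution is not being filtered.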

 

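To resolve this issue, apply a route-map to the connected redistribution on the spoke so that the tunnel subnet is no longer advertised. Because the CLI only accepts references to existing objects, it is safest to create the prefix list first, then the route-map, and finally the BGP setting: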

 

config router bgp
    config redistribute "connected"
        set status enable
        set route-map "map1"
    end
end
 
config router route-map
    edit "map1"
        config rule
            edit 1
                set match-ip-address "perf1-no-172.16.16.0/24"
                unset set-ip-prefsrc
            next
        end
    next
end
 
 
config router prefix-list
    edit "perf1-no-172.16.16.0/24"
        config rule
            edit 1
                set action deny
                set prefix 172.16.16.0 255.255.254.0
                set ge 24
                set le 32
            next
            edit 2
                set prefix any
                unset ge
                unset le
            next
        end
    next
end
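
After the change, the spoke needs to re-send its updates for the filter to take effect on the HUB. A possible verification sequence, run on the spoke and assuming the HUB neighbor address 172.16.16.253 from the earlier outputs:

```
execute router clear bgp ip 172.16.16.253 soft out
get router info bgp neighbors 172.16.16.253 advertised-routes
```

Prefixes inside 172.16.16.0/23 with a length of /24 to /32 should no longer be listed, while other connected networks should still be advertised.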