BGP is a widely used dynamic routing protocol. It can run over IPsec tunnels, which makes it very useful for advertising routes over IPsec tunnels or in ADVPN.
The first step when troubleshooting BGP is to list the BGP peers using the command below:
get router info bgp summary
VRF 0 BGP router identifier 192.168.2.1, local AS number 65551
BGP table version is 1
0 BGP AS-PATH entries
0 BGP community entries

Neighbor     V    AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
192.168.2.2  4 65551      25      27      0   0    0 00:21:26            0
The neighbor line shows the IP address (192.168.2.2) of the configured remote BGP peer, and the AS column shows the AS of the remote peer, which is 65551. The Up/Down timer (00:21:26) and the numeric value in the State/PfxRcd column show that BGP is up and running, so the TCP session is established.
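When several peers are configured, the same check can be scripted. The following is a minimal Python sketch, not a FortiOS feature: it assumes the summary output has been saved as text and that each neighbor row has the ten columns shown above. The function name `parse_bgp_summary` is hypothetical.

```python
import re

# Sample text copied from the summary output above.
SUMMARY = """\
VRF 0 BGP router identifier 192.168.2.1, local AS number 65551
BGP table version is 1
0 BGP AS-PATH entries
0 BGP community entries

Neighbor     V    AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
192.168.2.2  4 65551      25      27      0   0    0 00:21:26            0
"""


def parse_bgp_summary(text):
    """Return one dict per neighbor row (rows that start with an IPv4 address)."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a neighbor row has exactly ten whitespace-separated columns.
        if len(fields) != 10 or not re.fullmatch(r"\d+(\.\d+){3}", fields[0]):
            continue
        state = fields[9]
        peers.append({
            "neighbor": fields[0],
            "remote_as": int(fields[2]),
            "up_down": fields[8],
            "state": state,
            # A number in State/PfxRcd is a prefix count, so the session is up.
            "established": state.isdigit(),
        })
    return peers


for peer in parse_bgp_summary(SUMMARY):
    print(peer["neighbor"], "established" if peer["established"] else peer["state"])
```

A row whose last column is a state name such as Connect is reported as not established.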
If the option 'set passive' is not configured on either FortiGate, each peer will try to initiate a TCP session. The easiest way to find which peer initiated the BGP session is to use the command below, which filters the session list by the BGP peer:
get system session list | grep 192.168.2.2
tcp 3587 192.168.2.1:10686 - 192.168.2.2:179 -
The output shows that the session was initiated locally from FG_1 (192.168.2.1) towards the remote BGP peer 192.168.2.2 on TCP port 179. This information can be used to filter the session list by source or destination IP, which gives more detail about the ingress/egress interfaces, the duration of the session, and other useful information.
diagnose sys session filter dst 192.168.2.2
diagnose sys session list
session info: proto=6 proto_state=01 duration=403 expire=3583 timeout=3600 refresh_dir=both flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=10.5.147.48/0.0.0.0 vlan_cos=0/255
state=log local
statistic(bytes/packets/allow_err): org=1285/20/1 reply=1233/19/1 tuples=2
tx speed(Bps/kbps): 2/0 rx speed(Bps/kbps): 2/0
orgin->sink: org out->post, reply pre->in dev=14->19/19->14 gwy=0.0.0.0/0.0.0.0
hook=out dir=org act=noop 192.168.2.1:10686->192.168.2.2:179(0.0.0.0:0)
hook=in dir=reply act=noop 192.168.2.2:179->192.168.2.1:10686(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=0 pol_uuid_idx=0 auth_info=0 chk_client_info=0 vd=0
serial=0001021c tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=00000000
no_ofld_reason: local
The duration field shows how many seconds ago this session was created. The field dev=14->19/19->14 shows the ingress and egress interfaces: originating traffic flows from dev 14 to dev 19, and reply traffic flows from dev 19 to dev 14.
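These fields can also be pulled out of a saved session dump. The sketch below is an illustration only and assumes the field layout shown in the output above. The function name `summarize_session` is hypothetical, and the sample keeps only the lines that the sketch reads.

```python
import re

# Lines taken from the session output above (unrelated fields omitted).
SESSION = """\
session info: proto=6 proto_state=01 duration=403 expire=3583 timeout=3600
orgin->sink: org out->post, reply pre->in dev=14->19/19->14 gwy=0.0.0.0/0.0.0.0
hook=out dir=org act=noop 192.168.2.1:10686->192.168.2.2:179(0.0.0.0:0)
hook=in dir=reply act=noop 192.168.2.2:179->192.168.2.1:10686(0.0.0.0:0)
"""

BGP_PORT = 179


def summarize_session(text):
    """Extract the duration, the device indexes and the originating tuple."""
    duration = re.search(r"\bduration=(\d+)", text)
    dev = re.search(r"\bdev=(\d+)->(\d+)/(\d+)->(\d+)", text)
    org = re.search(r"dir=org\s+act=\S+\s+([\d.]+):(\d+)->([\d.]+):(\d+)", text)
    if not (duration and dev and org):
        raise ValueError("input does not look like a session dump")
    src_ip, _src_port, dst_ip, dst_port = org.groups()
    return {
        "duration_s": int(duration.group(1)),
        "org_dev": (int(dev.group(1)), int(dev.group(2))),
        "reply_dev": (int(dev.group(3)), int(dev.group(4))),
        # The source of the dir=org tuple is the side that opened the session.
        "initiator": src_ip,
        "responder": dst_ip,
        "is_bgp": int(dst_port) == BGP_PORT,
    }


print(summarize_session(SESSION))
```

For the sample above, the initiator is 192.168.2.1, which matches the earlier observation that FG_1 opened the session towards port 179 of the remote peer.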
The values dev=14 and dev=19 are not interface names. They are the device IDs that FortiOS assigns to the interfaces. The commands below show how to find the interface name that belongs to each device ID:
get router info kernel | grep dev=14
tab=65535 vf=0 vrf=0 scope=253 type=3 proto=2 prio=0 0.0.0.0/0.0.0.0/0->127.0.0.0/32 pref=127.0.0.1 gwy=0.0.0.0 dev=14(root)

get router info kernel | grep dev=19
tab=65535 vf=0 vrf=0 scope=254 type=2 proto=2 prio=0 0.0.0.0/0.0.0.0/0->192.168.2.1/32 pref=192.168.2.1 gwy=0.0.0.0 dev=19(VPN_1)
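The lookup above can be automated when the kernel routing output has been saved as text. The following is a minimal sketch that assumes every route line ends with dev=<id>(<name>), as in the two lines shown. The function name `device_names` is hypothetical.

```python
import re

# The two kernel route lines shown above.
KERNEL_ROUTES = """\
tab=65535 vf=0 vrf=0 scope=253 type=3 proto=2 prio=0 0.0.0.0/0.0.0.0/0->127.0.0.0/32 pref=127.0.0.1 gwy=0.0.0.0 dev=14(root)
tab=65535 vf=0 vrf=0 scope=254 type=2 proto=2 prio=0 0.0.0.0/0.0.0.0/0->192.168.2.1/32 pref=192.168.2.1 gwy=0.0.0.0 dev=19(VPN_1)
"""


def device_names(text):
    """Map each device ID to the interface name shown in parentheses."""
    pairs = re.findall(r"\bdev=(\d+)\(([^)]+)\)", text)
    return {int(dev_id): name for dev_id, name in pairs}


names = device_names(KERNEL_ROUTES)
ingress, egress = 14, 19  # taken from dev=14->19 in the session output
print(f"{names[ingress]} -> {names[egress]}")  # prints "root -> VPN_1"
```

In this example the session originates from the FortiGate itself (root) and leaves through the VPN_1 tunnel interface.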
The output below shows that the BGP session with peer 192.168.2.2 is down:
get router info bgp summary
VRF 0 BGP router identifier 192.168.2.1, local AS number 65551
BGP table version is 1
0 BGP AS-PATH entries
0 BGP community entries

Neighbor     V    AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down  State/PfxRcd
192.168.2.2  4 65551      54      53      0   0    0 never    Connect
The state 'Connect' means that the local FortiGate is trying to open a TCP session with the remote peer, but there is no reply from that peer.
The first step is to run a packet sniffer and check whether there is traffic to this peer and whether it is routed via the correct interface:
diagnose sniffer packet any " host 192.168.2.1 or host 192.168.2.2" 4
Using Original Sniffing Mode
interfaces=[any]
filters=[ host 192.168.2.1 or host 192.168.2.2]
12.276459 VPN_1 out 192.168.2.1.11098 -> 192.168.2.2.179: syn 2902306357 <--- sent out TCP SYN packet
20.516425 VPN_1 out 192.168.2.1.11098 -> 192.168.2.2.179: syn 2902306357 <--- sent out TCP SYN packet
20.516425 VPN_1 out 192.168.2.1.11098 -> 192.168.2.2.179: syn 2902306357 <--- sent out TCP SYN packet
34.290177 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: syn 2296280879 <--- received TCP SYN packet
34.290177 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: syn 2296280879
34.290574 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: syn 1281151053 ack 2296280880 <--- SYN/ACK
34.290574 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: syn 1281151053 ack 2296280880
34.290921 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: ack 1281151054
34.290921 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: ack 1281151054
34.291033 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: psh 2296280880 ack 1281151054
34.291033 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: psh 2296280880 ack 1281151054
34.291064 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: ack 2296280965
34.291064 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: ack 2296280965
34.291091 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: rst 1281151054 ack 2296280965 <--- RST
34.291091 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: rst 1281151054 ack 2296280965
The sniffer output above shows that the local FortiGate (192.168.2.1) is trying to open a TCP session with peer 192.168.2.2, while the remote side is trying to open a TCP session from source IP 192.168.2.3. This could indicate that the wrong BGP peer IP address is configured. Because the source address does not match the configured BGP peer, the local FortiGate sends an RST packet.
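In a long capture, this kind of mismatch is easy to miss. The sketch below is a rough Python illustration that scans saved sniffer text (verbosity 4, as above) for inbound SYN packets to local TCP port 179 whose source is not the configured neighbor. The names `unexpected_bgp_sources`, `LOCAL_IP` and `CONFIGURED_PEER` are hypothetical, and the line format is assumed to match the output shown.

```python
import re

LOCAL_IP = "192.168.2.1"         # local BGP address
CONFIGURED_PEER = "192.168.2.2"  # neighbor configured on the local FortiGate

# A few lines from the capture above (annotations and duplicates removed).
CAPTURE = """\
12.276459 VPN_1 out 192.168.2.1.11098 -> 192.168.2.2.179: syn 2902306357
34.290177 VPN_1 in 192.168.2.3.9294 -> 192.168.2.1.179: syn 2296280879
34.290574 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: syn 1281151053 ack 2296280880
34.291091 VPN_1 out 192.168.2.1.179 -> 192.168.2.3.9294: rst 1281151054 ack 2296280965
"""

LINE = re.compile(
    r"^\S+\s+(?P<intf>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):\s+(?P<flags>.*)$"
)


def unexpected_bgp_sources(text, local_ip, peer_ip):
    """Return sources of inbound SYNs to local TCP 179 that are not the peer."""
    sources = set()
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        flags = m["flags"].split()
        is_pure_syn = flags[:1] == ["syn"] and "ack" not in flags
        to_local_bgp = m["dst"] == local_ip and m["dport"] == "179"
        if m["dir"] == "in" and to_local_bgp and is_pure_syn and m["src"] != peer_ip:
            sources.add(m["src"])
    return sorted(sources)


print(unexpected_bgp_sources(CAPTURE, LOCAL_IP, CONFIGURED_PEER))  # ['192.168.2.3']
```

An empty list means every inbound connection attempt came from the configured neighbor.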
Another point to check when BGP runs on top of IPsec is that the local and remote BGP peer IP addresses are included in the phase-2 selectors. The sniffer output below shows this problem. The local BGP peer IP address is sending TCP SYN packets, but no packets are received:
diagnose sniffer packet any " host 192.168.2.1"
Using Original Sniffing Mode
interfaces=[any]
filters=[ host 192.168.2.1 ]
8.353618 VPN_1 out 192.168.2.1.11324 -> 192.168.2.2.179: syn 41379263
8.353618 VPN_1 out 192.168.2.1.11324 -> 192.168.2.2.179: syn 41379263
...............................................................
40.633329 VPN_1 out 192.168.2.1.11324 -> 192.168.2.2.179: syn 41379263
40.633329 VPN_1 out 192.168.2.1.11324 -> 192.168.2.2.179: syn 41379263
Running the command 'fnsysctl ifconfig VPN_1' several times shows whether packets are being dropped on the egress interface:
fnsysctl ifconfig VPN_1
VPN_1   Link encap:Unknown
        inet addr:192.168.2.1 Mask:255.255.255.255
        UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1420 Metric:1
        RX packets:6844 errors:0 dropped:0 overruns:0 frame:0
        TX packets:13571 errors:112 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:311469 (304.2 KB) TX bytes:10176292 (9.7 MB)

fnsysctl ifconfig VPN_1
VPN_1   Link encap:Unknown
        inet addr:192.168.2.1 Mask:255.255.255.255
        UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1420 Metric:1
        RX packets:6844 errors:0 dropped:0 overruns:0 frame:0
        TX packets:13571 errors:113 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:311469 (304.2 KB) TX bytes:10176292 (9.7 MB)
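The two runs above can also be compared programmatically. The following is a minimal Python sketch, assuming the TX line has the format shown in the output. The function names `tx_counters` and `tx_delta` are hypothetical.

```python
import re

# TX lines from the first and second run above.
FIRST = "TX packets:13571 errors:112 dropped:0 overruns:0 carrier:0"
SECOND = "TX packets:13571 errors:113 dropped:0 overruns:0 carrier:0"


def tx_counters(text):
    """Read the TX packets, errors and dropped counters from ifconfig text."""
    m = re.search(r"TX packets:(\d+) errors:(\d+) dropped:(\d+)", text)
    if m is None:
        raise ValueError("no TX counter line found")
    packets, errors, dropped = map(int, m.groups())
    return {"packets": packets, "errors": errors, "dropped": dropped}


def tx_delta(before, after):
    """Return how much each TX counter changed between two runs."""
    old, new = tx_counters(before), tx_counters(after)
    return {key: new[key] - old[key] for key in old}


print(tx_delta(FIRST, SECOND))  # {'packets': 0, 'errors': 1, 'dropped': 0}
```

Here the error counter grows while the packet counter does not, so the outgoing SYN was counted as an error rather than as a transmitted packet.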
The TX errors counter increases between the two runs (112, then 113). This indicates that traffic which should be sent to the remote peer is being dropped instead. Debug flow can show why the traffic is dropped:
diagnose debug reset
diagnose debug disable
diagnose debug flow filter daddr 192.168.2.2
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug console timestamp enable
diagnose debug flow trace start 999
diagnose debug enable

id=65308 trace_id=8 func=print_pkt_detail line=5998 msg="vd-root:0 received a packet(proto=6, 192.168.2.1:11364->192.168.2.2:179) tun_id=0.0.0.0 from local. flag [S], seq 634790851, ack 0, win 27600"
id=65308 trace_id=8 func=init_ip_session_common line=6198 msg="allocate a new session-00018992"
id=65308 trace_id=8 func=iprope_dnat_check line=5558 msg="in-[], out-[VPN_1]"
id=65308 trace_id=8 func=iprope_dnat_tree_check line=826 msg="len=0"
id=65308 trace_id=8 func=iprope_dnat_check line=5583 msg="result: skb_flags-00000000, vid-0, ret-no-match, act-accept, flag-00000000"
id=65308 trace_id=8 func=ip_session_confirm_final line=3203 msg="npu_state=0x0, hook=4"
id=65308 trace_id=8 func=ipsecdev_hard_start_xmit line=662 msg="enter IPSec interface VPN_1, tun_id=0.0.0.0"
id=65308 trace_id=8 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel VPN_1, tun_id=10.5.147.48, vrf 0"
id=65308 trace_id=8 func=ipsec_common_output4 line=886 msg="SA is not ready yet, drop"
The message 'SA is not ready yet, drop' indicates that there is a problem with the encryption domain. The IPsec phase-2 selectors need to be checked on both sides of the IPsec tunnel.
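Debug flow output is often long, so it can help to search a saved trace for the entries that report a drop. The sketch below is an illustration only: it assumes the id/trace_id/func/line/msg layout shown above, and the keyword list in `KEYWORDS` is an assumption rather than a complete list of FortiOS drop messages. The function name `find_drops` is hypothetical.

```python
import re

# Three entries from the debug flow output above.
TRACE = '''\
id=65308 trace_id=8 func=print_pkt_detail line=5998 msg="vd-root:0 received a packet(proto=6, 192.168.2.1:11364->192.168.2.2:179) tun_id=0.0.0.0 from local. flag [S], seq 634790851, ack 0, win 27600"
id=65308 trace_id=8 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel VPN_1, tun_id=10.5.147.48, vrf 0"
id=65308 trace_id=8 func=ipsec_common_output4 line=886 msg="SA is not ready yet, drop"
'''

ENTRY = re.compile(r'trace_id=(\d+) func=(\S+) line=\d+ msg="([^"]*)"')

# Assumption: messages containing these words describe a dropped packet.
KEYWORDS = ("drop", "denied")


def find_drops(text):
    """Return (trace_id, function, message) for entries that report a drop."""
    hits = []
    for trace_id, func, msg in ENTRY.findall(text):
        if any(word in msg.lower() for word in KEYWORDS):
            hits.append((int(trace_id), func, msg))
    return hits


for trace_id, func, msg in find_drops(TRACE):
    print(f"trace {trace_id}: {func}: {msg}")
```

The function name in each hit, here ipsec_common_output4, points to the processing stage where the packet was dropped.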
When BGP runs over IPsec, it is good practice to check that the VPN is up and running using the commands below:
diagnose vpn ike gateway list name VPN_1
vd: root/0
name: VPN_1
version: 2
interface: port1 3
addr: 10.5.147.49:500 -> 10.5.147.48:500
tun_id: 10.5.147.48/::10.5.147.48
remote_location: 0.0.0.0
network-id: 0
transport: UDP
virtual-interface-addr: 192.168.2.1 -> 192.168.2.2
created: 23s ago
pending-queue: 0
PPK: no
IKE SA: created 1/1
IPsec SA: created 1/1

  id/spi: 14 faa9b69ef23aa946/0000000000000000
  direction: responder
  status: connecting, state 3, started 23s ago
diagnose vpn tunnel list name VPN_1
list ipsec tunnel by names in vd 0
------------------------------------------------------
name=VPN_1 ver=2 serial=1 10.5.147.49:0->10.5.147.48:0 nexthop=0.0.0.0 tun_id=10.5.147.48 tun_id6=::10.5.147.48 status=down dst_mtu=1500 weight=1
bound_if=3 real_if=3 lgwy=static/1 tun=intf mode=auto/1 encap=none/552 options[0228]=npu frag-rfc run_state=0 role=primary accept_traffic=0 overlay_id=0

proxyid_num=1 child_num=0 refcnt=4 ilast=1176 olast=1176 ad=/0
stat: rxp=0 txp=0 rxb=0 txb=0
dpd: mode=on-demand on=0 status=fail idle=20000ms retry=3 count=0 seqno=1
natt: mode=none draft=0 interval=0 remote_port=0
fec: egress=0 ingress=0
proxyid=VPN1 proto=0 sa=0 ref=2 serial=1 auto-negotiate
  src: 0:0.0.0.0-255.255.255.255:0
  dst: 0:0.0.0.0-255.255.255.255:0
diagnose sniffer packet any " host 192.168.2.1" 4
Using Original Sniffing Mode
interfaces=[any]
filters=[ host 192.168.2.1]
7.589302 port1 out 192.168.2.1.11516 -> 192.168.2.2.179: syn 3731331284
7.589302 port1 out 192.168.2.1.11516 -> 192.168.2.2.179: syn 3731331284
Because the IPsec tunnel is down, FortiGate is trying to reach the remote BGP peer using the default route via port1.
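The tunnel check can be scripted against saved 'diagnose vpn tunnel list' output. The following is a rough Python sketch, not a FortiOS tool. It assumes that a healthy tunnel reports status=up on the name= line and a non-zero sa= value on each proxyid= line, and the function name `tunnel_health` is hypothetical.

```python
import re

# The name= and proxyid= lines from the tunnel list output above.
TUNNEL = """\
name=VPN_1 ver=2 serial=1 10.5.147.49:0->10.5.147.48:0 nexthop=0.0.0.0 tun_id=10.5.147.48 tun_id6=::10.5.147.48 status=down dst_mtu=1500 weight=1
dpd: mode=on-demand on=0 status=fail idle=20000ms retry=3 count=0 seqno=1
proxyid_num=1 child_num=0 refcnt=4 ilast=1176 olast=1176 ad=/0
proxyid=VPN1 proto=0 sa=0 ref=2 serial=1 auto-negotiate
"""


def tunnel_health(text):
    """Collect the tunnel name, its status and the SA count per selector."""
    result = {"name": None, "status": None, "selectors": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name="):
            result["name"] = line.split()[0].split("=", 1)[1]
            # Only read status= from the name= line, not from the dpd: line.
            m = re.search(r"\bstatus=(\S+)", line)
            result["status"] = m.group(1) if m else None
        elif line.startswith("proxyid="):
            m = re.search(r"^proxyid=(\S+).*\bsa=(\d+)", line)
            if m:
                result["selectors"][m.group(1)] = int(m.group(2))
    # Assumption: usable means the tunnel is up and a selector has an SA.
    result["usable"] = result["status"] == "up" and any(result["selectors"].values())
    return result


print(tunnel_health(TUNNEL))
```

For the output above, the tunnel is reported as down with no SA on selector VPN1, which is consistent with the BGP SYN leaving via port1 instead of VPN_1.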
Another useful set of commands for troubleshooting BGP is as follows:
SSH session 1:
diagnose ip router bgp all enable
diagnose ip router bgp level info
diagnose debug console timestamp enable
diagnose debug enable
SSH session 2:
diagnose sniffer packet any "host x.x.x.x and host y.y.y.y and port 179" 6
Here, x.x.x.x and y.y.y.y are the IP addresses of the BGP peers involved.