Solution
During an ongoing RDP session, both FortiGates have a session for the traffic between the user VM and the RDP server. In this example, FortiGate_1 is handling the session between the user and the RDP server. When FortiGate_1 is rebooted, the ILB switches traffic to the second FortiGate after the default probe-fail timer of 5 seconds. However, the existing session is not picked up by FortiGate_2; instead, a new session is created on FortiGate_2.
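In this topology, session pickup between the two units is typically provided by FGSP (FortiGate Session Life Support Protocol), which is what places the synchronized session on the second unit. The following is a minimal configuration sketch, assuming a two-member FGSP setup; the group IDs and peer IP are placeholders, and the exact syntax varies by FortiOS version:

config system standalone-cluster
    set standalone-group-id 1
    set group-member-id 1
    config cluster-peer
        edit 1
            # Placeholder: sync interface IP of the peer FortiGate.
            set peerip 10.0.0.2
        next
    end
end
config system ha
    # Required for FGSP to synchronize TCP sessions to the peer.
    set session-pickup enable
end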
FortiGate 1:
diagnose sys session list
session info: proto=6 proto_state=01 duration=527 expire=3599 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty synced f00
statistic(bytes/packets/allow_err): org=275518/2467/1 reply=1364677/3538/1 tuples=2
tx speed(Bps/kbps): 65/0 rx speed(Bps/kbps): 352/2
orgin->sink: org pre->post, reply pre->post dev=5->5/5->5 gwy=10.244.255.193/10.244.255.193
hook=pre dir=org act=noop 10.10.53.235:62230->10.244.100.6:3389(0.0.0.0:0)
hook=post dir=reply act=noop 10.244.100.6:3389->10.10.53.235:62230(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
src_mac=c0:d6:82:93:e8:15 dst_mac=12:34:56:78:9a:bc
misc=0 policy_id=1 pol_uuid_idx=15744 auth_info=0 chk_client_info=0 vd=0
serial=0013788e tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x000100 no_ofld_reason: npu-flag-off
total session 1
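The synced flag in the state line confirms that FortiGate_1 has synchronized this session to its peer. On a busy unit, the listing can be narrowed to the flow of interest first; a small sketch using this example's RDP port:

diagnose sys session filter dport 3389
diagnose sys session list
diagnose sys session filter clear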
diagnose debug enable
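For completeness, a flow trace such as the one below is normally captured by setting a flow filter and starting the trace before enabling debug output; a minimal sketch, where the port value matches this example and the packet count is an arbitrary assumption:

diagnose debug flow filter port 3389
diagnose debug flow trace start 10
diagnose debug enable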
2024-07-10 12:50:46 id=65308 trace_id=3 func=print_pkt_detail line=5894 msg="vd-root:0 received a packet(proto=6, 10.244.100.6:3389->10.10.53.235:62230) tun_id=0.0.0.0 from port2. flag [.], seq 2135323557, ack 2095182362, win 63161"
2024-07-10 12:50:46 id=65308 trace_id=3 func=resolve_ip_tuple_fast line=5982 msg="Find an existing session, id-0013788e, reply direction"
2024-07-10 12:50:46 id=65308 trace_id=3 func=ipv4_fast_cb line=53 msg="enter fast path"
2024-07-10 12:50:46 id=65308 trace_id=4 func=print_pkt_detail line=5894 msg="vd-root:0 received a packet(proto=6, 10.10.53.235:62230->10.244.100.6:3389) tun_id=0.0.0.0 from port2. flag [.], seq 2095182362, ack 2135323608, win 1028"
2024-07-10 12:50:46 id=65308 trace_id=4 func=resolve_ip_tuple_fast line=5982 msg="Find an existing session, id-0013788e, original direction"
2024-07-10 12:50:46 id=65308 trace_id=4 func=ipv4_fast_cb line=53 msg="enter fast path"
FortiGate 2:
diagnose sys session list
session info: proto=6 proto_state=01 duration=519 expire=3080 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log dirty may_dirty f00 syn_ses
statistic(bytes/packets/allow_err): org=0/0/0 reply=0/0/0 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin->sink: org pre->post, reply pre->post dev=5->5/5->5 gwy=0.0.0.0/0.0.0.0
hook=pre dir=org act=noop 10.10.53.235:62230->10.244.100.6:3389(0.0.0.0:0)
hook=post dir=reply act=noop 10.244.100.6:3389->10.10.53.235:62230(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=1 pol_uuid_idx=0 auth_info=0 chk_client_info=0 vd=0
serial=0013788e tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x000100 no_ofld_reason: npu-flag-off
total session 1
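On FortiGate_2, the same serial (0013788e) appears with the syn_ses flag, but the byte/packet counters are zero: the session was learned through FGSP synchronization yet never carried traffic, because the ILB did not re-route the established flow. The client eventually reconnects from a new source port (63549 instead of 62230), which produces the new session seen in the trace below. FGSP peer and synchronization status can typically be checked with:

diagnose sys session sync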
2024-07-10 12:52:28 id=65308 trace_id=1 func=print_pkt_detail line=5894 msg="vd-root:0 received a packet(proto=6, 10.10.53.235:63549->10.244.100.6:3389) tun_id=0.0.0.0 from port2. flag [S], seq 2446168376, ack 0, win 64260"
2024-07-10 12:52:28 id=65308 trace_id=1 func=init_ip_session_common line=6080 msg="allocate a new session-0003f341, tun_id=0.0.0.0"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_dnat_check line=5281 msg="in-[port2], out-[]"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_dnat_tree_check line=824 msg="len=0"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_dnat_check line=5293 msg="result: skb_flags-02000000, vid-0, ret-no-match, act-accept, flag-00000000"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__vf_ip_route_input_rcu line=1990 msg="find a route: flag=00000000 gw-10.244.255.193 via port2"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_fwd_check line=768 msg="in-[port2], out-[port2], skb_flags-02000000, vid-0, app_id: 0, url_cat_id: 0"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_tree_check line=524 msg="gnum-100004, use int hash, slot=45, len=3"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2033 msg="checked gnum-100004 policy-6, ret-no-match, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2033 msg="checked gnum-100004 policy-1, ret-matched, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_user_identity_check line=1807 msg="ret-matched"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check line=2281 msg="gnum-4e20, check-0000000094f0b62a"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2033 msg="checked gnum-4e20 policy-6, ret-no-match, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2033 msg="checked gnum-4e20 policy-6, ret-no-match, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2033 msg="checked gnum-4e20 policy-6, ret-no-match, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check line=2298 msg="gnum-4e20 check result: ret-no-match, act-accept, flag-00000000, flag2-00000000"
2024-07-10 12:52:28 id=65308 trace_id=1 func=__iprope_check_one_policy line=2251 msg="policy-1 is matched, act-accept"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_fwd_check line=805 msg="after iprope_captive_check(): is_captive-0, ret-matched, act-accept, idx-1"
2024-07-10 12:52:28 id=65308 trace_id=1 func=iprope_fwd_auth_check line=824 msg="after iprope_captive_check(): is_captive-0, ret-matched, act-accept, idx-1"
2024-07-10 12:52:28 id=65308 trace_id=1 func=fw_forward_handler line=989 msg="Allowed by Policy-1:"
2024-07-10 12:52:28 id=65308 trace_id=1 func=ip_session_confirm_final line=3113 msg="npu_state=0x100, hook=4"
This is known behavior of the Azure load balancer: even after a health probe fails, it does not re-route existing sessions. This is by design, intended to give the administrator the opportunity to shut down the application gracefully and avoid any unexpected, sudden termination of the ongoing application workflow.
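The health probe settings involved can be reviewed on the load balancer itself. A hedged Azure CLI sketch, where the resource group, load balancer, and probe names are placeholders:

# Resource group, LB, and probe names below are placeholders.
az network lb probe show --resource-group myResourceGroup --lb-name myILB --name myHealthProbe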
Note: the above example is for an A-A cluster behind an ELB; however, this is expected behavior for any kind of cluster (A-A, A-P) or standalone setup behind an Azure load balancer.
See this Microsoft article for more information.