Lacework
Access helpful articles and other FAQs on Lacework
Kate_M
Community Manager
Community Manager
Article Id 339228
Description Using multiple fields in a Semi-join filter
Scope Using a Semi-join (in / not in) LQL query
Solution

 

If using a Semi-join (in / not in) LQL query, there may come a time when you'll want to use multiple fields to filter on the "in / not in" value. To show you what I mean and how we can do it, let's use an example. First we'll look at a query that uses a simple Semi-join with just one field then we’ll evolve it to use multiple fields.

 

Okay, let's say we want to get the names of containers that are making network requests to a specific IP address (e.g., 91.109.184.3). In order to do this, we'll use the following data sources and approach:

 

  1. LW_HA_CONNECTION_SUMMARY
    Contains summaries of all network connections. We'll use this to get the unique machine ID of the host machine(s) connecting to the specific IP address.
     
  2. LW_HE_MACHINES
    Contains details about host machines. Using the returned machine ID(s) from the LW_HA_CONNECTION_SUMMARY data source, we can get additional details about the host machine(s).
     
  3. LW_HE_CONTAINERS
    Contains details about containers running on the host machine(s). We'll use the 1:1 relationship between LW_HE_MACHINES and LW_HE_CONTAINERS to get the container details of the container(s) running on the host machine(s) connecting to the specific IP address.
     

When we put it altogether, we get the following LQL query:

---
queryId: LWCustom_ContainersMakingRequestsToIP
queryText: |-
{
source {
LW_HE_MACHINES M
WITH LW_HE_CONTAINERS C
}
filter {
M.MID::string in {
source {
LW_HA_CONNECTION_SUMMARY S,
ARRAY_TO_ROWS(S.ENDPOINT_DETAILS) ED
}
filter {
S.SRC_ENTITY_TYPE = "MachineUc"
AND ED:dst_ip_addr = "91.109.184.3"
}
return distinct {
S.SRC_ENTITY_ID:mid::string AS MID
}
}
}
return distinct {
C.CONTAINER_NAME,
C.CONTAINER_ID
}
}

We use the LW_HA_CONNECTION_SUMMARY data source to get the machine ID(s) of the host machine(s) connecting to the specific IP address, 91.109.184.3. Then we do a semi-join to relate the LW_HE_MACHINES data source to the LW_HA_CONNECTION_SUMMARY data source using the machine ID(s). Finally, we take advantage of the 1:1 relationship between LW_HE_MACHINES and LW_HE_CONTAINERS to get the container details of the container(s) running on the host machine(s) connecting to the specific IP address.

 

Easy enough, right? Well, not so fast. If there is a host machine that is running multiple containers then our current query will return the names of all containers running on that host. This is an issue because we just want to know the specific container(s) who are making connections to the specified IP address, not the container(s) running on the host making the connection.

 

In order to fix this, we can use the internal IP address that is part of the LW_HA_CONNECTION_SUMMARY's ENDPOINT_DETAILS field to further filter the results so that only the container(s) making the connection are returned. First, we'll need to add another return value to the LW_HA_CONNECTION_SUMMARY query, ED:src_ip_addr. Then we'll need to add another filter field to the semi-join query that limits the results to only the container(s) making the connection. So the updated LQL becomes:

---
queryId: LWCustom_ContainersMakingRequestsToIP
queryText: |-
{
source {
LW_HE_MACHINES M
WITH LW_HE_CONTAINERS C
}
filter {
(M.MID::string, C.IPV4::string) in {
source {
LW_HA_CONNECTION_SUMMARY S,
ARRAY_TO_ROWS(S.ENDPOINT_DETAILS) ED
}
filter {
S.SRC_ENTITY_TYPE = "MachineUc"
AND ED:dst_ip_addr = "91.109.184.3"
AND ED:src_ip_addr LIKE "192.168.%.%"
}
return distinct {
S.SRC_ENTITY_ID:mid::string AS MID,
ED:src_ip_addr::string AS IPV4
}
}
}
return distinct {
C.CONTAINER_NAME,
C.CONTAINER_ID
}
}

By adding another return value to the LW_HA_CONNECTION_SUMMARY query, return ... ED:src_ip_addr::string AS IPV4, then adding another filter field on the semi-join for that new return value, (M.MID::string, C.IPV4::string) in {...}, we are now able to limit the results to only the container(s) making the connection to the specified IP address.

 

 

Contributors