Contributor III

Question

Regular Expressions Examples

June 16, 2008
48 replies
32210 views

I thought it might be a good idea to start a thread where we can all post examples of regular expressions that we use to block spam, or good web sites that we use to look up expressions. This is to help users who aren't familiar with regex (like I was when I first got my FG) get started, and perhaps to help all of us find better expressions for keeping spam to a minimum. If this thread is useful, perhaps it could be stickied to make it easier to find...

Anonymous_UserAuthor

Contributor III

One example I have is:

(?i) c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]

This matches many derivatives of "cialis", which is now a common word in spam messages. I use (?i) to disable case sensitivity... don't know if this is technically correct, but it works. I inserted the white space in front of the word to prevent false matches with words such as "specialist".
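One detail worth checking in the pattern above: inside square brackets, `|` is not an alternation operator but a literal character, so `[i|1]` also accepts a pipe. A minimal sketch using Python's `re` module (an assumption made for illustration; the FortiGate engine is Perl-style and may differ in details) compares the original classes with a pipe-free version:

```python
import re

# The original classes from the post: '|' inside [...] is a literal pipe.
with_pipes = re.compile(r" c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]", re.IGNORECASE)
# The same classes with the pipes removed.
without_pipes = re.compile(r" c[i1][a4][il1!][il1!][sz]", re.IGNORECASE)

samples = ["buy cialis", "buy C1ALIS", "buy c|||||", "a specialist"]
results = {
    s: (bool(with_pipes.search(s)), bool(without_pipes.search(s)))
    for s in samples
}
for s, (a, b) in results.items():
    print(f"{s!r}: with pipes={a}, without pipes={b}")
```

Both versions catch the obfuscated spellings and skip "specialist"; only the version with pipes also matches a run of `|` characters.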

rwpatterson

New Member

/ c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]/i should do the trick. What you have is looking for one or no 'i's at the beginning.... My quick and dirty web site for refreshers is by Rex Swain.
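For what it is worth, in Perl-compatible syntax a leading `(?i)` is normally read as an inline modifier that switches on case-insensitive matching, much like a trailing `/i`, rather than as a quantifier applied to the letter `i`. A minimal sketch using Python's `re` module, which follows the same convention (whether the FortiGate parser accepts the inline form is an assumption not verified here):

```python
import re

body = r" c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]"

inline_flag = re.compile("(?i)" + body)        # flag written inside the pattern
outer_flag = re.compile(body, re.IGNORECASE)   # flag supplied separately, like /.../i
no_flag = re.compile(body)                     # case-sensitive baseline

samples = ["try CIALIS today", "try cialis today", "try icialis today"]
table = [
    (s, bool(inline_flag.search(s)), bool(outer_flag.search(s)), bool(no_flag.search(s)))
    for s in samples
]
for row in table:
    print(row)
```

In this engine the inline and trailing forms give the same results, and neither consumes an extra "i" before the space.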

Anonymous_UserAuthor

Contributor III

Thanks guys... I tweaked mine to look for @ and fixed the case sensitivity check on each of my regexes. Here are some others I use:

/v[i|1][a|4|@][g|r][g|r][a|e|4|@]/i
/pen[i|1|u|!]s/i
/d[i|1]p[l|1][o|0]m[a|4]s?/i
/Mast[e|3][e|3]rMBA/i
/r[e|3]p[l|1|i][l|1|i]c[a|4]s? /i
/ w[a|4]tch[e|3]s/i
/ r[o|0][l|1|!][e|3]x/i
/br[e|3][i|1|!]t[l|1|!][i|1|!]ng/i
/Bach[e|3][e|3]lor/i
/D[o|0]ct[o|0]r[a|4][a|4]te/i

I gave all of these a rating of five with a threshold of 10. As I add more, I think I am going to give each item a lower value to be more certain that I'm not causing false positives. I am also going to change all of the "A"s to check for @ as well.
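Since false positives are the worry here, one way to check a rule list before deploying it is to run it against a few legitimate sentences. A rough sketch in Python (the helper `from_perl_literal` and the sample sentences are made up for illustration, and only the `i` flag is handled):

```python
import re

def from_perl_literal(literal: str) -> re.Pattern:
    """Turn a '/pattern/flags' string into a compiled Python regex."""
    body, _, flags = literal[1:].rpartition("/")
    return re.compile(body, re.IGNORECASE if "i" in flags else 0)

rules = [
    "/d[i|1]p[l|1][o|0]m[a|4]s?/i",
    "/ w[a|4]tch[e|3]s/i",
    "/Bach[e|3][e|3]lor/i",
]
legit = [
    "The diplomat arrives on Monday.",
    "He watches the server logs daily.",
    "She holds a Bachelor of Science degree.",
]

# For each legitimate sentence, list the rules that would fire on it.
hits = {text: [r for r in rules if from_perl_literal(r).search(text)] for text in legit}
for text, fired in hits.items():
    print(text, "->", fired)
```

The first two sentences each trip one rule ("diploma" inside "diplomat", and " watches" as an ordinary verb), while the doubled-letter rule does not fire on the normal spelling "Bachelor".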

Anonymous_UserAuthor

Contributor III

Here is mine:

/\bc[i1\|l\!\¡]+[a4@]+[i1\|l\!]+[i1\|l\!\¡]+[sz5]+/i
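The two ideas in this version are the `\b` word boundary (instead of a leading space) and the `+` after each class, which tolerates repeated letters. A small sketch in Python, using a simplified adaptation of the pattern (the inverted exclamation mark is left out and the sample strings are made up):

```python
import re

# Simplified adaptation of the pattern above, not the exact original.
pattern = re.compile(r"\bc[i1|l!]+[a4@]+[i1|l!]+[i1|l!]+[sz5]+", re.IGNORECASE)

samples = ["cheap c1alis here", "CIIIAAALLLIIISSS", "cialis", "see a specialist"]
matched = {s: bool(pattern.search(s)) for s in samples}
for s, hit in matched.items():
    print(f"{s!r}: {hit}")
```

The `\b` keeps "specialist" from matching because the "c" there is preceded by another word character, while a match at the very start of the text still works, which a leading space would miss.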

noiz

New Member

Guys, how about email from an unknown sender? If you are using a Linux-based email server, you can see log entries for these emails showing from:<> to etc@etc.com. Is there any way to block this from:<>?

Anonymous_UserAuthor

Contributor III

Guys, very useful.

zaskar

New Member

Hi all,

does anyone know how to match Unicode characters with regular expressions in Antispam Banned Words?

For example, say I want to match the registered sign ® followed by "some text" in the subject of incoming mail. I tried the following patterns:

/.*® some text/i
/.*some text/i

but it appears that a mail subject containing the ® character bypasses the antispam filter. If I remove the ® from the test mail, the second pattern blocks it.

A pattern using the Perl escape \u00AE is not accepted by the Fortigate GUI. Any suggestions?

Thanks,
Marco
---------------------------------------------
Fortigate FGT200 2.8 build 489
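One possible explanation, offered only as a guess: non-ASCII characters in a Subject header normally travel as RFC 2047 encoded-words, so if a filter looks at the raw header it never sees the literal ® or even the plain words that follow it. The sketch below uses Python's standard `email.header` module to show the effect; whether the FortiGate decodes headers before matching is not something confirmed here.

```python
import re
from email.header import Header, decode_header, make_header

subject = "® some text"
# How a mail client would typically put this subject on the wire.
raw_header = Header(subject, charset="utf-8").encode()

with_sign = re.compile(r".*® some text", re.IGNORECASE)
without_sign = re.compile(r".*some text", re.IGNORECASE)

# Matching against the raw, still-encoded header.
raw_hits = (bool(with_sign.search(raw_header)), bool(without_sign.search(raw_header)))

# Matching after decoding the encoded-word back to text.
decoded = str(make_header(decode_header(raw_header)))
decoded_hits = (bool(with_sign.search(decoded)), bool(without_sign.search(decoded)))

print(raw_header)      # an '=?utf-8?...?=' encoded-word rather than readable text
print(raw_hits, decoded_hits)
```

Under that assumption, both patterns miss the encoded header and both match once it is decoded, which would be consistent with the second pattern only working when the ® is removed.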

Anonymous_UserAuthor

Contributor III

I need some help here... I used the following regex to filter for links to .cn domains in incoming emails. Or well, I tried to, but it doesn't work.

/\.cn/i

I also used the following to check for the word "unsubscribe" in the same message. Every spam that has been slipping through lately has these two elements in it.

/unsubscribe/i

I have given these two items a high enough score that if they are in the same message it should always be blocked. And yet they are still coming through. I probably should have started a new thread for this... but thought it might be nice to keep all of this stuff together to help someone find it in the future.

Thanks in advance!
Neal
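As a sanity check outside the firewall, the two expressions can be tried against a sample message in an ordinary regex engine. The sketch below uses Python; the message text, the host names, and the `\b` variant are made up for illustration and are not part of the original rules.

```python
import re

dot_cn = re.compile(r"\.cn", re.IGNORECASE)
dot_cn_bounded = re.compile(r"\.cn\b", re.IGNORECASE)  # hypothetical tighter variant
unsubscribe = re.compile(r"unsubscribe", re.IGNORECASE)

spam = "Great deals at http://shop.example.cn/offer - click to Unsubscribe"
legit = "Read the story on www.cnn.com"

spam_hits = (bool(dot_cn.search(spam)), bool(unsubscribe.search(spam)))
legit_hits = (bool(dot_cn.search(legit)), bool(dot_cn_bounded.search(legit)))
print(spam_hits, legit_hits)
```

Both expressions do match the sample spam here, so the syntax itself looks fine in a standard engine. Note that the bare `\.cn` also fires on "www.cnn.com", which the word-boundary variant avoids.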

abelio

SuperUser

Hi, what do the AS logs say? Nowadays spam includes those Chinese URLs embedded in image files, so your regexp will fail. I've tried /https?://.\w+\.\w+\.cn/i (in body) as a banned word with more or less success...

Anonymous_UserAuthor

Contributor III

Well, my AS logs don't say much. I can only see a log entry for when a message is determined to be spam... it doesn't show results of the scanning process or anything. I have examined the messages that I have received and it doesn't appear as though the links are embedded in any images. When I look at the source of the message it has those hyperlinks in it. It is strange because I have used a regex tester and verified that the syntax I use should work... and yet it's not. It seems to be the case sensitivity switch that buggers it up. I have had that problem in the past. Maybe I'll turn that off. What does the .\w+ do? Is that roughly the equivalent of a wild card? Thanks!

abelio

SuperUser

Well, my AS logs don't say much. I can only see a log entry for when a message is determined to be spam... it doesn't show results of the scanning process or anything.

We'll expect to see something like "The email contains banned word(s)." (regexp expression, etc.) under the "Message" column. Re-check your relevant SMTP traffic profile to make sure antispam logging is enabled.

What does the .\w+ do? Is that roughly the equivalent of a wild card?

\w stands for a word character, [A-Za-z0-9_] (alphanumeric characters plus "_"), and + stands for matching the preceding element one or more times.
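A quick way to see those pieces in action is to try them on short strings. The sketch below uses Python's `re` module with the `re.ASCII` flag so that `\w` is limited to `[A-Za-z0-9_]` as described above; the sample strings are arbitrary.

```python
import re

# \w on its own: one word character at a time.
word_chars = re.findall(r"\w", "a_1-b.c", re.ASCII)

# \w+ : runs of one or more word characters.
runs = re.findall(r"\w+", "a_1-b.c", re.ASCII)

# .\w+ : any single character, then one or more word characters.
any_then_word = re.findall(r".\w+", "-abc .cn", re.ASCII)

print(word_chars)     # ['a', '_', '1', 'b', 'c']
print(runs)           # ['a_1', 'b', 'c']
print(any_then_word)  # ['-abc', '.cn']
```

So the unescaped `.` is the true wild card (any one character), and `\w+` then requires at least one letter, digit, or underscore after it.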

Anonymous_UserAuthor

Contributor III

We'll expect to see something like "The email contains banned word(s)." (regexp expression, etc.) under the "Message" column.

I do see those in the log, but not nearly as often as I would expect considering how many of these messages have been getting through. I read your other post in the other thread I created... I shouldn't have double posted this. I have a number of regexes that all have a score of 5 and the threshold is 8. I do have emails getting blocked, so somehow they must be cumulative. My thinking is that it's not cumulative over the number of recurrences of one expression... but that there is a cumulative score across the occurrences of different expressions. So in my messages, if any two regexes occur in the same message it should get blocked. I could be wrong but that's how I figured it.

\w stands for a word character, [A-Za-z0-9_] (alphanumeric characters plus "_"), and + stands for matching the preceding element one or more times.

That's good to know. That will come in handy.
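The two scoring models being weighed here can be written down explicitly. This is only a sketch of the guess described above, not documented FortiGate behaviour: the function name `score_message`, the `count_repeats` switch, and the "score >= threshold" comparison are all assumptions.

```python
import re

def score_message(text, rules, threshold, count_repeats=False):
    """Return (score, is_spam) under one of two hypothetical scoring models.

    count_repeats=False: each rule contributes its points at most once.
    count_repeats=True:  each rule contributes its points per occurrence.
    """
    score = 0
    for pattern, points in rules:
        hits = len(re.findall(pattern, text, re.IGNORECASE))
        if hits:
            score += points * (hits if count_repeats else 1)
    return score, score >= threshold

rules = [(r"\.cn", 5), (r"unsubscribe", 5)]
one_rule_twice = "visit a.cn and b.cn"
two_rules_once = "visit a.cn or unsubscribe"

print(score_message(one_rule_twice, rules, threshold=8))                      # (5, False)
print(score_message(one_rule_twice, rules, threshold=8, count_repeats=True))  # (10, True)
print(score_message(two_rules_once, rules, threshold=8))                      # (10, True)
```

Under the "once per rule" model a single expression repeated twice stays below a threshold of 8, while two different expressions cross it, which is the behaviour guessed at in the post.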

Anonymous_UserAuthor

Contributor III

/https?://.\w+\.\w+\.cn/i

I tried this and it started blocking almost every URL under the sun. Then I realized that I had to use this:

/https?:\/\/\w+\.\w+\.cn/i

Escaping the forward slashes lets them be seen literally and not as pattern delimiters. Plus I think that the first dot might have been a mistake. Not sure on that. I haven't tested this yet but I am going to.
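A plausible reason the unescaped version blocked nearly everything: if the parser ends the pattern at the first unescaped `/`, the expression is cut down to `https?:`. The sketch below imitates that in Python with a hypothetical helper, `split_perl_literal`; how the FortiGate actually parses the delimiters is an assumption.

```python
import re

def split_perl_literal(literal):
    """Split '/pattern/flags' at the first unescaped '/' after the opening one."""
    if not literal.startswith("/"):
        raise ValueError("expected a leading '/'")
    i = 1
    while i < len(literal):
        if literal[i] == "\\":
            i += 2                      # skip the backslash and the escaped character
            continue
        if literal[i] == "/":
            return literal[1:i], literal[i + 1:]
        i += 1
    raise ValueError("no closing delimiter")

unescaped = r"/https?://.\w+\.\w+\.cn/i"
escaped = r"/https?:\/\/\w+\.\w+\.cn/i"

cut_pattern, leftover = split_perl_literal(unescaped)
full_pattern, flags = split_perl_literal(escaped)
print(repr(cut_pattern), repr(leftover))
print(repr(full_pattern), repr(flags))

# The truncated pattern fires on any URL; the escaped one only on .cn hosts.
any_url = "http://www.example.com"
print(bool(re.search(cut_pattern, any_url)), bool(re.search(full_pattern, any_url)))
```

With the slashes escaped, the whole expression survives and the trailing `i` is read as the flag.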

Anonymous_UserAuthor

Contributor III

I just tested this and what ended up working for me was:

/https?:\/\/.\w+\.\w+\.cn/i

Not sure why the first dot was needed... but without it nothing matched. I have tested this somewhat and confirmed that it does not appear to be blocking other root domains.
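For comparison, here is how the two variants behave in Python's `re` module on a few made-up host names. This does not reproduce the "nothing matched" result seen on the FortiGate, so treat it only as a reference point for what the expressions mean in a standard engine.

```python
import re

without_dot = re.compile(r"https?://\w+\.\w+\.cn", re.IGNORECASE)
with_dot = re.compile(r"https?://.\w+\.\w+\.cn", re.IGNORECASE)

urls = [
    "http://www.example.cn",    # three labels
    "http://example.cn",        # two labels
    "http://a.example.cn",      # one-character first label
    "http://www.example.com",   # different top-level domain
]
results = {u: (bool(without_dot.search(u)), bool(with_dot.search(u))) for u in urls}
for u, (a, b) in results.items():
    print(f"{u}: without dot={a}, with dot={b}")
```

In this engine both forms match the usual three-label case and both miss a bare "example.cn"; the only difference is that the extra dot requires the first label to be at least two characters long.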

Anonymous_UserAuthor

Contributor III

Oh, and a good website for Perl RegEx stuff: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm