New Contributor III
June 16, 2008
Question

Regular Expressions Examples

  • 15 replies
  • 32086 views
I thought it might be a good idea to start a thread where we can all post examples of regular expressions that we use to block spam, or good web sites that we use to look up expressions. The aim is to help users who aren't familiar with regex (as I wasn't when I first got my FG) get started, and perhaps to help all of us find better expressions to keep spam to a minimum. If this thread proves useful, perhaps it could be stickied to make it easier to find.

    15 replies

    New Contributor III
    June 16, 2008
    One example I have is:
    (?i) c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]
    This matches many derivatives of "cialis", which is now a common word in spam messages. I use (?i) to disable case sensitivity; I don't know if this is technically correct, but it works. I inserted the white space in front of the word to prevent false matches with words such as "specialist".
    rwpatterson
    New Member
    June 17, 2008
    / c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]/i should do the trick. What you have is looking for one or no 'i's at the beginning... My quick and dirty web site for refreshers is by Rex Swain.
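    For anyone who wants to try these patterns offline, here is a minimal sketch using Python's `re` module, which follows Perl-style syntax but is not the FortiGate engine, so results on the device may differ. In Perl-style syntax, `(?i)` at the start of a pattern is an inline case-insensitive flag, and a `|` inside square brackets is a literal pipe character rather than an "or". The names `posted` and `tidy` are made up for this illustration.

```python
import re

# Pattern as originally posted. In Perl-style syntax (?i) switches on
# case-insensitive matching for the rest of the pattern.
posted = re.compile(r"(?i) c[i|1][a|4][i|l|1|!][i|l|1|!][s|z]")

# Same intent with the '|' removed from the character classes, since a
# class already means "any one of these characters".
tidy = re.compile(r"(?i) c[i1][a4][il1!][il1!][sz]")

samples = ["buy C1AL1S now", "our specialist", "buy c|alis now"]
for text in samples:
    print(f"{text!r}: posted={bool(posted.search(text))} tidy={bool(tidy.search(text))}")
```

    The third sample shows the side effect of the pipes: the posted pattern also accepts a literal `|` in place of a letter, which may or may not be wanted.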
    New Contributor III
    June 17, 2008
    Thanks guys... I tweaked mine to look for @ and fixed the case sensitivity check on each of my regexes. Here are some others I use:
    /v[i|1][a|4|@][g|r][g|r][a|e|4|@]/i
    /pen[i|1|u|!]s/i
    /d[i|1]p[l|1][o|0]m[a|4]s?/i
    /Mast[e|3][e|3]rMBA/i
    /r[e|3]p[l|1|i][l|1|i]c[a|4]s? /i
    / w[a|4]tch[e|3]s/i
    / r[o|0][l|1|!][e|3]x/i
    /br[e|3][i|1|!]t[l|1|!][i|1|!]ng/i
    /Bach[e|3][e|3]lor/i
    /D[o|0]ct[o|0]r[a|4][a|4]te/i
    I gave all of these a rating of five with a threshold of 10. As I add more, I think I am going to give each item a lower value to be more certain that I'm not causing false positives. I am also going to change all of the "A"s to check for @ as well.
    New Contributor III
    June 17, 2008
    Here is mine:
    /\bc[i1\|l\!\¡]+[a4@]+[i1\|l\!]+[i1\|l\!\¡]+[sz5]+/i
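    A quick way to compare the leading-space approach with the `\b` word boundary used above is the sketch below. It assumes Python's `re` module as a stand-in for the FortiGate engine, and the unnecessary backslashes before `!` and `¡` are dropped in this copy; `leading_space` and `word_boundary` are illustrative names only.

```python
import re

# Earlier style: a literal space in front of the word.
leading_space = re.compile(r" c[i1][a4][il1!][il1!][sz]", re.IGNORECASE)

# Word-boundary style: \b matches at the start of the text too, and each
# '+' lets a character repeat (e.g. "ciiaaliiss").
word_boundary = re.compile(
    r"\bc[i1|l!¡]+[a4@]+[i1|l!]+[i1|l!¡]+[sz5]+", re.IGNORECASE
)

samples = ["Cialis on sale", "cheap ciiaaliiss", "our specialist"]
for text in samples:
    print(
        f"{text!r}: leading_space={bool(leading_space.search(text))} "
        f"word_boundary={bool(word_boundary.search(text))}"
    )
```

    The first sample is the interesting one: a subject that starts with the word has no space in front of it, so only the `\b` version catches it, while "specialist" is still left alone.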
    noiz
    New Member
    June 18, 2008
    Guys, how about email from an unknown sender? If you are using a Linux-based email server, you can see log entries for the email showing from:<> to etc@etc.com. Is there any way to block this from:<>?
    New Contributor III
    June 18, 2008
    Guys, very useful.
    zaskar
    New Member
    August 28, 2008
    Hi all,
    Does anyone know how to match Unicode characters with a regular expression in Antispam Banned Words? For example, say I want to match the registered sign ® followed by "some text" in the subject of incoming mail. I tried the following patterns:
    /.*® some text/i
    /.*some text/i
    but a mail subject containing the ® character bypasses the antispam filter. If I remove the ® from the test mail, the second pattern blocks it. A pattern with the Perl escape \u00AE is not accepted by the Fortigate GUI. Any suggestions?
    Thanks,
    Marco
    ---------------------------------------------
    Fortigate FGT200 2.8 build 489
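    One possible explanation, not confirmed for FortiGate: non-ASCII characters in a Subject header are normally sent as an RFC 2047 encoded-word, so a filter that looks at the raw header never sees "®" or even the plain phrase after it. The sketch below only illustrates that effect with Python's standard `email.header` module; whether the FortiGate decodes the subject before matching is an assumption left open here, and the variable names are made up.

```python
import re
from email.header import Header, decode_header, make_header

subject = "\u00ae some text"  # registered sign, then the phrase from the post

# What travels on the wire: an ASCII-only RFC 2047 encoded-word.
raw_header = Header(subject, "utf-8").encode()

# Hypothetical banned-word patterns mirroring the ones in the post.
banned = re.compile(r".*some text", re.IGNORECASE)
by_escape = re.compile(r"\u00ae some text", re.IGNORECASE)

# What a filter would see if it decoded the header first.
decoded = str(make_header(decode_header(raw_header)))

print("raw header:      ", raw_header)
print("match on raw:    ", bool(banned.search(raw_header)))
print("match on decoded:", bool(banned.search(decoded)), bool(by_escape.search(decoded)))
```

    If the raw form is what gets scanned, that would also explain why the second pattern works again as soon as the ® is removed: a pure-ASCII subject is sent unencoded.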
    New Contributor III
    June 25, 2009
    I need some help here... I used the following regex to try to filter links to .cn domains in incoming emails, but it doesn't work:
    /\.cn/i
    I also used the following to check for the word "unsubscribe" in the same message. Every spam that has been slipping through lately contains both elements:
    /unsubscribe/i
    I have given these two items a high enough score that if they are in the same message it should always be blocked, and yet they are still coming through. I probably should have started a new thread for this, but thought it might be nice to keep all of this stuff together to help someone find it in the future. Thanks in advance! Neal
    abelio
    SuperUser
    June 26, 2009
    Hi, what do the AS logs say? Nowadays spam often embeds those Chinese URLs in image files, in which case your regexp will fail. I've tried /https?://.\w+\.\w+\.cn/i (in body) as a banned word with more or less success.
    New Contributor III
    June 26, 2009
    Well, my AS logs don't say much. I can only see a log entry for when a message is determined to be spam... it doesn't show results of the scanning process or anything. I have examined the messages that I have received, and it doesn't appear as though the links are embedded in any images. When I look at the source of the message, it has those hyperlinks in it. It is strange because I have used a regex tester and verified that the syntax I use should work, and yet it doesn't. It seems to be the case sensitivity switch that buggers it up. I have had that problem in the past. Maybe I'll turn that off. What does the .\w+ do? Is that roughly the equivalent of a wild card? Thanks!
    abelio
    SuperUser
    June 26, 2009
    Well my AS logs don't say much. I can only see a log entry for when a message is determined to be spam... it doesn't show results of the scanning process or anything.
    We'd expect to see something like "The email contains banned word(s)." (regexp expression, etc.) under the "Message" column. Re-check the relevant SMTP traffic profile to make sure antispam logging is enabled.
    What does the .\w+ do? Is that roughly the equivalent to a wild card?
    \w stands for a word character, [A-Za-z0-9_] (alphanumeric characters plus "_"), and + stands for matching the preceding element one or more times. The . in front matches any single character.
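    A short sketch of what `\w+` and `.\w+` actually consume, using Python's `re` module as a stand-in (the `re.ASCII` flag makes `\w` mean exactly [A-Za-z0-9_]; the sample strings are made up):

```python
import re

# \w+ grabs runs of letters, digits and underscores; '-' and '.' end a run.
runs = re.findall(r"\w+", "free_offer-2009.cn", re.ASCII)
print(runs)

# .\w+ is "any one character, then one or more word characters",
# so it needs at least two characters in total.
two_chars = re.fullmatch(r".\w+", "-abc", re.ASCII)
one_char = re.fullmatch(r".\w+", "a", re.ASCII)
print(bool(two_chars), bool(one_char))
```

    So `\w+` is narrower than a true wild card: it stops at the first character that is not a letter, digit or underscore.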
    New Contributor III
    June 26, 2009
    We'd expect to see something like "The email contains banned word(s)." (regexp expression, etc.) under the "Message" column.
    I do see those in the log, though not nearly as often as I would expect considering how many of these messages have been getting through. I read your other post in the other thread I created... I shouldn't have double posted this. I have a number of regexes that all have a score of 5, and the threshold is 8. I do have emails getting blocked, so somehow the scores must be cumulative. My thinking is that the score is not cumulative across repeated occurrences of one expression, but that it is cumulative across occurrences of different expressions. So if any two of my regexes occur in the same message, it should get blocked. I could be wrong, but that's how I figured it.
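    The scoring guess in this post can be written down as a small model. This is only a sketch of the poster's hypothesis (each expression counts once per message, scores of different expressions add up), not documented FortiGate behaviour; `spam_score`, `RULES` and `THRESHOLD` are hypothetical names.

```python
import re

# Hypothetical rule table: (pattern, score), as in the posts above.
RULES = [(r"\.cn", 5), (r"unsubscribe", 5)]
THRESHOLD = 8


def spam_score(message: str) -> int:
    """Add each rule's score once if its pattern occurs anywhere in the message."""
    total = 0
    for pattern, score in RULES:
        if re.search(pattern, message, re.IGNORECASE):
            total += score
    return total


both = "Visit http://www.example.cn or click Unsubscribe"
repeated = "unsubscribe here, or unsubscribe there"

for text in (both, repeated):
    score = spam_score(text)
    print(f"{score:>2} {'blocked' if score >= THRESHOLD else 'passed'}: {text}")
```

    Under this model two different hits reach 10 and cross the threshold of 8, while one expression repeated twice stays at 5.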
    \w stands for a word character, [A-Za-z0-9_] (alphanumeric characters plus "_"), and + stands for matching the preceding element one or more times
    That's good to know. That will come in handy.
    New Contributor III
    July 10, 2009
    /https?://.\w+\.\w+\.cn/i
    I tried this and it started blocking almost every URL under the sun. Then I realized that I had to use this: /https?:\/\/\w+\.\w+\.cn/i The backslashes escape the forward slashes so that they are read literally and not as pattern delimiters. Plus I think that the first dot might have been a mistake; I'm not sure about that. I haven't tested this yet, but I am going to.
    New Contributor III
    July 10, 2009
    I just tested this, and what ended up working for me was: /https?:\/\/.\w+\.\w+\.cn/i I'm not sure why the first dot was needed, but without it nothing matched. I have tested this somewhat and confirmed that it does not appear to be blocking other root domains.
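    For comparison, here is how the two variants behave in Python's `re` module, which has no `/.../` delimiters, so the slashes need no escaping there. This is a sketch with made-up URLs; it does not explain why the dot was needed on the FortiGate, it only shows what each pattern accepts in a standard Perl-style engine.

```python
import re

with_dot = re.compile(r"https?://.\w+\.\w+\.cn", re.IGNORECASE)
without_dot = re.compile(r"https?://\w+\.\w+\.cn", re.IGNORECASE)

urls = [
    "http://www.example.cn/page",  # three labels: both patterns match
    "http://a.example.cn",         # one-letter first label: only without_dot
    "http://example.cn",           # two labels: neither matches
    "http://my-shop.example.cn",   # hyphen is not a \w character: neither
    "http://www.example.com",      # different root domain: neither
]
for url in urls:
    print(f"{url}: with_dot={bool(with_dot.search(url))} "
          f"without_dot={bool(without_dot.search(url))}")
```

    Both variants require a host with at least three labels ending in .cn, so a bare "example.cn" link, or a host name containing a hyphen, would slip past either one.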
    New Contributor III
    July 10, 2009
    Oh, and a good website for Perl RegEx stuff: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm