The [\\W]* between each of your spamAlphabet ranges should instead just be \W*. Since you're using the "gi" modifiers, I assume you're running this in JavaScript or somewhere other than ColdFusion. Note that even in JavaScript you only need to escape the backslash itself (e.g., \\W) when passing the pattern as a string to the RegExp constructor, which you're not doing when you build a regex literal like /regex/gi. Inside a literal, [\\W] is a character class that matches a literal backslash or the letter "W", not a non-word character.
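To illustrate the difference, here is a minimal sketch in JavaScript (assuming that is where the regex runs). The two-letter pattern and the variable names are made up for the example; only the escaping behavior is the point.

```javascript
// The same pattern written two ways.
const literal = /t\W*o/gi;                       // regex literal: one backslash
const constructed = new RegExp("t\\W*o", "gi");  // string: the backslash must be escaped

console.log(literal.source === constructed.source); // true

// In a literal, [\\W] is a class holding a literal backslash and "W",
// so it no longer means "any non-word character".
const mistaken = /t[\\W]*o/i;
console.log(mistaken.test("t.o")); // false: "." is not in the class
console.log(mistaken.test("tWo")); // true: "W" is in the class
console.log(/t\W*o/i.test("t.o")); // true: \W matches "."
```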
However, note that \W does not match the underscore character, so underscores between letters would slip through, which I assume you don't want. To avoid that, you could use [\W_]* instead.
Also, you don't need to escape some of the characters you included in your sample list. The only literal characters that need to be escaped when put within [] are "]" and "\" (also "^" if it is the first character within the square brackets, and "-" if it sits between two other characters, where it would otherwise define a range).
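Putting both points together, here is a rough sketch of how the pattern could be assembled in JavaScript. You are building yours in a stored procedure, so treat this only as an illustration: escapeForCharClass, buildPattern and the spamAlphabet object are hypothetical names, and the "t" and "o" entries are simply lifted from the regex in your post.

```javascript
// Escape only what is special inside [...]: "]", "\", "^" and "-".
// (Escaping "^" and "-" everywhere is harmless and keeps the helper simple.)
function escapeForCharClass(chars) {
  return chars.replace(/[\]\\^-]/g, "\\$&");
}

// Build one character class per letter, joined by [\W_]* so that
// punctuation, whitespace and underscores are allowed between letters.
function buildPattern(word, alphabet) {
  const classes = [...word].map(
    (ch) => "[" + escapeForCharClass(alphabet[ch] || ch) + "]"
  );
  return new RegExp(classes.join("[\\W_]*"), "i");
}

// Hypothetical substitution table, taken from the regex in the question.
const spamAlphabet = { t: "t7+", o: "o0óòôøõö()*.", u: "u", r: "r" };
const tourPattern = buildPattern("tour", spamAlphabet);

console.log(tourPattern.test("T.O.U.R"));  // true
console.log(tourPattern.test("t_o_u_r"));  // true
console.log(tourPattern.test("7 0 u r"));  // true

// With \W* alone, the underscores are not bridged.
const withoutUnderscore = /[t7+]\W*[o0óòôøõö()*.]\W*u\W*r/i;
console.log(withoutUnderscore.test("t_o_u_r")); // false
```

I left out the "g" flag on purpose: with "g", test() remembers its position between calls, which gives surprising results when the same regex object is reused on several strings.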
sa_Joshua wrote:
I'm trying to filter out spam on my forum. I added two tables to my db. One is called spamwords and the other spamAlphabet.
The spamAlphabet contains records like:
a aàáâåãäæ\@
b b6
c cç6
d db
e eéèêë3
f f4
g qgp9
h h4
i iìíîï¡1\|l\!
j ji1
which I use to construct a regular expression using a stored procedure in SQL.
Using the above substitution, the word "tour" becomes
/[t7\+][\\W]*[o0óòôøõö(\(\))\*\.][\\W]*[u][\\W]*[r]/gi
Essentially, I would like to find any form or shape of the word "tour", even if there is punctuation or white space between the letters.
Is the specific resulting regex that I'm using in the correct syntax to trap these sorts of occurrences? If not, please write how I should change it.