Detecting *all* emojis

Right now I'm using this piece of code:

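using System.Text.RegularExpressions;

public static class StringExtensions // placeholder name; extension methods must live in a static class
{
    public static bool ContainsEmoji(this string text)
    {
        // \p{Cs} is the Unicode "Surrogate" general category
        return Regex.IsMatch(text, @"\p{Cs}");
    }
}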

It's somewhat helpful: most emojis appear to be detected, but some aren't.

Here's a reference list to help: http://unicode.org/emoji/charts/full-emoji-list.html

All the smiley faces appear to be fine, but these specific emojis are not caught by the regex:

1920 U+2614 ☔ umbrella with rain drops

1921 U+26F1 ⛱ umbrella on ground

1922 U+26A1 ⚡ high voltage

1923 U+2744 ❄ snowflake

On the keyboard these are not close to each other, but in the list they follow one another, so I assumed there was a point in the emoji list past which detection stops working. That doesn't hold up, though: from 1905 (weather-like emojis) downward, some are caught by the regex and some aren't. There does not seem to be any rule.
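To make the behaviour easier to reproduce, here is a minimal sketch that runs the same `\p{Cs}` pattern over one smiley and the four symbols listed above, and prints how each one is stored in a .NET string. The class name `EmojiRepro` and the choice of U+1F600 as the smiley are just assumptions for illustration. Since `\p{Cs}` is the surrogate category, I'd expect it to fire only on characters stored as two UTF-16 code units, which the four symbols above (all below U+FFFF) are not.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EmojiRepro // hypothetical name, for illustration only
{
    public static void Main()
    {
        // One smiley (U+1F600, assumed sample) plus the four symbols from the list above.
        string[] samples = { "\U0001F600", "\u2614", "\u26F1", "\u26A1", "\u2744" };

        foreach (string s in samples)
        {
            // Full code point, even when it is stored as a surrogate pair.
            int codePoint = char.ConvertToUtf32(s, 0);

            // Category of the first UTF-16 code unit, which is what the regex sees.
            UnicodeCategory category = char.GetUnicodeCategory(s[0]);

            bool matched = Regex.IsMatch(s, @"\p{Cs}");

            Console.WriteLine(
                $"U+{codePoint:X4} length={s.Length} firstUnit={category} matched={matched}");
        }
    }
}
```

Comparing the `length` and `matched` columns should show whether the misses line up with characters that fit in a single `char`.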

I can't afford to restrict input to plain ASCII because I need people to be able to enter characters such as Cyrillic, but I specifically can't accept emojis. I have no clue how to go forward from here.

I read the MSDN docs about high/low surrogate pairs, but at this stage they are very confusing to me, and I think a push in the right direction would go a long way.
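For context, this is my current understanding of surrogate pairs as a small sketch: a string is walked one UTF-16 code unit at a time, and a high/low pair is combined into a single code point. `CodePointDemo` and `CodePoints` are names I made up for this example, not framework APIs; only `char.IsSurrogatePair` and `char.ConvertToUtf32` come from .NET.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDemo // hypothetical name, for illustration only
{
    // Hypothetical helper: yields one Unicode code point per character,
    // combining a high/low surrogate pair into a single value.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                yield return char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // Single code unit (a lone, unpaired surrogate is passed through as-is).
                yield return text[i];
            }
        }
    }

    public static void Main()
    {
        // 'a', U+2614 (one code unit), U+1F600 (stored as a surrogate pair).
        foreach (int cp in CodePoints("a\u2614\U0001F600"))
        {
            Console.WriteLine($"U+{cp:X4}");
        }
        // Prints U+0061, U+2614, U+1F600
    }
}
```

If that picture is right, any emoji check probably has to work on code points like these rather than on individual `char` values.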

Thank you very much for your time :)