We have recently been looking at the bot traffic hitting our sites and have noticed that a few bots are sneaking past our WURFL detection. We are using the latest version of the definition file (June 2, 2013) and v1.4.4.0 of the .NET API.
Here are the user agents that are not being correctly identified:
- Mozilla/5.0 (Linux; U; Android 2.2.1; ja-jp; SC-02B Build/FROYO) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.1.5
- Mozilla/5.0 (Linux; U; Android 4.1.1; ja-jp; SC-03E Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.1.5
- Mozilla/5.0 (Linux; U; Android 4.1.2; ja-jp; 201M Build/9.8.2Q-34_SMJ-102) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 YJApp-ANDROID jp.co.yahoo.android.yjtop/1.7.6
- Mozilla/5.0 (Linux; U; Android 4.0.3; ja-jp; F-05D Build/V09R32B) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.1.2
- Mozilla/5.0 (Linux;u;Android 2.3.7;zh-cn;) AppleWebKit/533.1 (KHTML,like Gecko) Version/4.0 Mobile Safari/533.1 (compatible; +http://www.baidu.com/search/spider.html)
- Mozilla/5.0 (YahooYSMcm/3.0.0; http://help.yahoo.com)
- Mozilla/5.0 (Linux; U; Android 4.1.2; ja-jp; SO-02E Build/10.1.D.0.343) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.1.5
- Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)
- Mozilla/5.0 (Linux; U; Android 4.0.4; ja-jp; SC-03D Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.1.5
- Mozilla/5.0 (Linux; U; Android 2.3.4; ja-jp; IS11T Build/FGK400) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 YJApp-ANDROID jp.co.yahoo.android.yjtop/2.0.5
- YahooCacheSystem
- WordPress/3.5.1; http://.* (different version numbers)
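
In the meantime, a stopgap pre-filter run ahead of the WURFL lookup might look roughly like the sketch below. It is written in Python only for brevity; the pattern list and the `looks_like_known_bot` name are my own assumptions based on the strings above, not anything from the WURFL API.

```python
import re

# Hypothetical stopgap, not part of the WURFL API.
# Each pattern is a token taken from the user agents listed above.
BOT_PATTERNS = [
    r"YJApp-ANDROID",
    r"baidu\.com/search/spider",
    r"YahooYSMcm",
    r"Exabot",
    r"YahooCacheSystem",
    r"^WordPress/\d",  # any version number, e.g. WordPress/3.5.1
]
_BOT_RE = re.compile("|".join(BOT_PATTERNS), re.IGNORECASE)


def looks_like_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the tokens above."""
    return bool(_BOT_RE.search(user_agent or ""))
```

Anything this check does not match would still go through the normal WURFL lookup, so it only supplements the existing detection.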
Also, the capabilities doc (http://www.scientiamobile.com/wurflCapability) says that is_bot will return default, is_bot, or is_not_a_bot, but it is currently returning true or false for me. Which behaviour should I expect?
Let me know if you need any help with the above list, and whether you think you will be able to detect these in the future.
Thanks for the help.