Crawler/Bot false positives?

elrichar
Posts: 27
Joined: Fri Mar 23, 2012 1:15 pm


Post by elrichar » Mon Jul 30, 2012 8:28 am

Configuration:
Java API: wurfl-1.4.0.1.jar
wurfl.xml version: 2012-04-29_wurfl.zip
Note: I also reproduced the issue on a local box with the latest library and definitions (wurfl-1.4.0.3.jar and 2012-07-29_wurfl.zip).

Issue:
We've been seeing a lot of requests classified as 'generic_web_crawler' that we suspect are false positives rather than actual crawlers. We have since added logging to record the exact user agent strings that WURFL matches to 'generic_web_crawler'. We'd like your help understanding why so many requests match this way; we're guessing it isn't the intended behavior.

Example:
User Agent string: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.3; BTRS124290; .NET CLR 2.0.50727; .NET CLR 3.5.30729; ibrytetoolbar_browseforchange; .NET CLR 3.0.4506.2152)
Result using http://tools.scientiamobile.com/: generic web browser
Result using Java API: generic_web_crawler

This is one of many user agent strings that are being matched as 'generic_web_crawler'; I can provide more if needed. At first glance, many of them (possibly all) contain some form of 'toolbar', though that may not mean anything.
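To check the 'toolbar' observation against a full log rather than by eye, a small tally like the one below could help. This is only a sketch: it does not call the WURFL API at all, the class and method names (`ToolbarTally`, `countContaining`) are made up for illustration, and apart from the first user agent (the example above) the sample strings are placeholders, not real log data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of WURFL: tallies how many logged
// user agents contain a given token, ignoring case.
public class ToolbarTally {

    /** Returns the number of user agents that contain the token, case-insensitively. */
    public static long countContaining(List<String> userAgents, String token) {
        String needle = token.toLowerCase(Locale.ROOT);
        return userAgents.stream()
                .filter(ua -> ua.toLowerCase(Locale.ROOT).contains(needle))
                .count();
    }

    public static void main(String[] args) {
        // User agents that were logged as 'generic_web_crawler'.
        // Only the first entry comes from this thread; the other two are
        // invented placeholders standing in for real log lines.
        List<String> loggedAsCrawler = Arrays.asList(
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.3; "
                + "BTRS124290; .NET CLR 2.0.50727; .NET CLR 3.5.30729; "
                + "ibrytetoolbar_browseforchange; .NET CLR 3.0.4506.2152)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; ExampleToolbar 1.0)",
            "ExampleBot/1.0 (+http://example.com/bot)"
        );

        long withToolbar = countContaining(loggedAsCrawler, "toolbar");
        System.out.println(withToolbar + " of " + loggedAsCrawler.size()
                + " logged user agents contain 'toolbar'");
    }
}
```

In practice the list would be read from the logging output instead of being hard-coded. A high share would only suggest a correlation with 'toolbar'; it would not show why the matcher picks 'generic_web_crawler'.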

I think WURFL should detect this specific example as IE8, not as a web crawler/bot. Please let us know what you think, and keep in mind that the problem is larger than the one example I'm providing.

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: Crawler/Bot false positives?

Post by kamermans » Mon Jul 30, 2012 8:44 am

Hi,

Thank you for your detailed post. We don't go to great lengths to identify crawlers in the APIs at this point, but you are correct that non-crawlers should not be identified as crawlers. We will take a look at the API and determine the root cause. It would help if you could provide a list of the user agents you've seen detected as generic_web_crawler. Would you mind sending it to us offline at wurfldb@(our domain name)?
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

elrichar
Posts: 27
Joined: Fri Mar 23, 2012 1:15 pm

Re: Crawler/Bot false positives?

Post by elrichar » Mon Jul 30, 2012 8:57 am

Email sent.

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: Crawler/Bot false positives?

Post by kamermans » Mon Jul 30, 2012 9:00 am

Email received. We will continue the conversation by email until we reach a conclusion.
Thanks,

Steve Kamerman
ScientiaMobile


