advice on updating tera-wurlf

The API formerly known as Tera-WURFL
notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

advice on updating tera-wurlf

Postby notalifeform » Wed Feb 08, 2012 2:19 am

Hi,

We are currently using the database API for device detection. Upgrading the wurfl xml file takes about 5 minutes, which is slow, but I read it is expected, if you have a big cache. I'm afraid (didn't do measurements yet though) that the service will have significant reduced performance during upgrades.

Should we counter-measure this? What would be the proper/easiest solution?

regards,

Robert

notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

Re: advice on updating tera-wurlf

Postby notalifeform » Wed Feb 08, 2012 4:57 am

a quick update: this is log of our latest update, the number look worrying to me...

Wed, 08 Feb 2012 08:04:29 +0100 [ip-10-48-46-127 4149][Tera-WURFL] WARNING: Loading patch: web_browsers_patch.xml
Wed, 08 Feb 2012 08:04:37 +0100 [ip-10-48-46-127 4149][Tera-WURFL] WARNING: Loading patch: wurfl_patch.xml
Wed, 08 Feb 2012 08:04:37 +0100 [ip-10-48-46-127 4149][Tera-WURFL] WARNING: Loading patch: wurfl_patch_20111117.xml
Wed, 08 Feb 2012 08:04:37 +0100 [ip-10-48-46-127 4149][Tera-WURFL] WARNING: Loading patch: wurfl.xml
Wed, 08 Feb 2012 10:25:39 +0100 [ip-10-48-46-127 4149][Tera-WURFL] WARNING: Finished loading WURFL db.scientiamobile.com - 2012-02-05 00:00:01 (Sun Feb 05 00:01:53 -0500 2012) in 8474.06 seconds

Our wurfl-stats:
MERGE

TeraWurflMerge Rows: 15893
Actual Devices: 7313
Table Size: 9.14 MB

INDEX
TeraWurflIndex Rows: 15893
Table Size: 435.54 KB
CACHE
TeraWurflCache Rows: 5424
Table Size: 95.55 MB

We currently run tera-wurlf on a amazon large machine using RDS(large).

regards,

Robert

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: advice on updating tera-wurlf

Postby kamermans » Wed Feb 08, 2012 9:52 am

Hi Robert,

The logs you've posted do indeed show that the update process is taking exceptionally long to complete. Even on legacy hardware, like Amazon is using, the cache rebuild process should be able to recache devices at a rate of over 150/sec, so 5500 cached items would take about 35 seconds to complete.

I would recommend two things. First, edit TeraWurflConfig.php and use the Capability Filter option (near the bottom of the file) to specify only the capabilities you're interested in. This will significantly reduce both the memory consumption and the cache rebuild time, since instead of determining all 500+ capabilities for each device, only the ones you've specified will be loaded. I suspect that after you make the first change, the performance will be lower than expected, in which case I would clear your cache before an update. I prefer updating the WURFL from the command line, in which case you can do this in one step:

Code: Select all

php ./admin/cmd_line_admin.php --clearCache --update=remote
I'll do some performance testing on one of our Amazon EC2 Large instances as well - there may be a problem with the hardware your instance is running on. I've run into issues before on Amazon where an instance was considerably slower than it's clones, so I cloned that instance, and the new one was much faster - this may be an option for you as well.
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

Re: advice on updating tera-wurlf

Postby notalifeform » Wed Feb 08, 2012 4:18 pm

Hi Steve,

Thanks for your response - the command-line way is much more convenient indeed. I just did some tests with the database dump of teraWurlf of my production machine on both my mac and a vps (see below) - it seems to show that it is a "hardware" issue indeed. Another difference is that the tera-wurlf instance on my production machine was actually in use (although not in a very high rate, 1-2 detections per second) . The cpu of the Amazon machine was below 10% and RDS was at 50% (interesting.. why?)

I will also try to compare this with an amazon machine with a local mysql and with a machine that uses RDS (= Amazon hosted mysql). I find it hard to believe that amazon RDS is 60 times slower than an VPS with an onboard mysql though .. in our load tests we saw a bit less perfomance 5-10%...

anyway.. thanks for looking into this and I will keep you posted on the progress.

regards,

Robert

== MacBook Pro Core2 Duo 2.4 Ghz

~/tmp/teraWurfl % php ./admin/cmd_line_admin.php --update=remote

done (/Users/robert/tmp/teraWurfl/data/wurfl.xml: 18.52 MB [1.12 MB compressed])
Downloaded in 3.6626901626587 sec @ 2.57 Mbps


Database Update OK
Total Time: 163.92988085747
Parse Time: 5.0391669273376 (TeraWurflXMLParser_XMLReader)
Validate Time: 0.1769711971283
Sort Time: 2.0463089942932
Patch Time: 0.0042898654937744
Database Time: 6.8029239177704
Cache Rebuild Time: 149.86021995544
Number of Queries: 27063
PHP Memory Usage: 110.35 MB
--------------------------------
WURFL Version: db.scientiamobile.com - 2012-02-05 00:00:01 (Sun Feb 05 00:01:53 -0500 2012)
WURFL Devices: 15850
PATCH New Devices: 40
PATCH Merged Devices: 1


================ VPS 2CPU Core2Duo 2.4 Ghz

Downloading WURFL from http://www.scientiamobile.com/wurfl/xxxxxxx/wurfl.zip ...

done (/var/www/play/framework/Tera-Wurfl/data/wurfl.xml: 18.52 MB [1.12 MB compressed])
Downloaded in 2.23243713379 sec @ 4.22 Mbps

Database Update OK
Total Time: 119.99671793
Parse Time: 4.33648180962 (TeraWurflXMLParser_XMLReader)
Validate Time: 0.00949001312256
Sort Time: 1.68331980705
Patch Time: 0.0770862102509
Database Time: 5.27369618416
Cache Rebuild Time: 108.616643906
Number of Queries: 27063
PHP Memory Usage: 107.51 MB
--------------------------------
WURFL Version: db.scientiamobile.com - 2012-02-05 00:00:01 (Sun Feb 05 00:01:53 -0500 2012)
WURFL Devices: 15850
PATCH New Devices: 43
PATCH Merged Devices: 720


=====================

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: advice on updating tera-wurlf

Postby kamermans » Wed Feb 08, 2012 5:03 pm

I didn't realize you were using RDS - that's probably the issue. When Tera-WURFL loads the WURFL, it splits the whole data file into large chunks and inserts them in bulk, so it might only take 10 or 20 queries to load the WURFL data, but when it rebuilds the cache, each cached item may result in 10 or more queries to recache, so the latency between your server and RDS is very important during the rebuild process. It's very likely that a local instance of MySQL has 60x lower latency in my opinion, since there is no network latency at all.

You might even consider using a local MySQL install for WURFL and RDS for your data, or just clear your cache in RDS before a reload.
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

Re: advice on updating tera-wurlf

Postby notalifeform » Thu Feb 09, 2012 5:01 pm

Hi Steve,

I did some more tests; here are the results:

Code: Select all

MacBook Pro Core2 Duo 2.4 Ghz, local mysql 5.1                                  - 163.9
VPS 2CPU Core2Duo 2.4 Ghz, local mysql 5.1                                      - 119.9
1) EC2 VPS Large, local mysql 5.1                                               - 156.6
2) EC2 VPS Large, RDS mysql 5.1, same AV (availability zone)                    - 206.3
3) EC2 VPS Large, RDS mysql 5.5, same AV                                        - 152.4
4) EC2 VPS Large, RDS mysql 5.5, same AV - add 1 req/sec extra, one UA          - 155.4
5) EC2 VPS Large, RDS mysql 5.5, same AV - add 1 req/sec extra, unique UA       - 161.1
6) EC2 VPS Large, RDS mysql 5.5, other AV                                       - 187.6
7) EC2 VPS Large, RDS mysql 5.5, other AV , production mysql, other db(schema)  - 219.1
8) EC2 VPS Large, RDS mysql 5.5, other AV , production mysql, prod db (schema)   - **** stopped manually after 10 mn, RDS load of 50%
at the time of testing the production system ran a bit less than 1 tera-wurfl request per second (through web-service)

So I think I can draw these conclusions:
  • its not RDS that is causing this (1) vs (2)
  • the mysql version is not the cause(2) vs (3)
  • the availablity zone is not the cause (5) vs (6)
  • putting some load on the system doesnt cause it (3) vs (4)/(5)
  • my production RDS is not "fully loaded" (7)
I have to do some thinking what might be causing this.... some locking issue springs to my mind (maybe my test-load was not representative enough?)
Anyway - any ideas where to look next would be appreciated.

thanks,

Robert

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: advice on updating tera-wurlf

Postby kamermans » Thu Feb 09, 2012 5:06 pm

Thanks for providing the test results, Robert. Can you send me a zipped dump of your MySQL DB so I can see what's going on? My email is steve@(this domain).
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: advice on updating tera-wurlf

Postby kamermans » Fri Feb 10, 2012 8:43 pm

Hi Robert,

I’ve solved the problem after much investigation!

When you dumped the database, you didn’t dump the procedures, so when it’s imported into another MySQL instance, the tables are created with the UTF8 charset (this is specified per table in the dump), but since the stored procedures are not there, Tera-WURFL creates them for you with the default charset of PHPs connection to your DB (in my case latin1/latin1_swedish_ci).
Since the default charset and collation of your database tables don’t match the PHP connection’s charset, the stored procedures are compiled to use a different charset and collation (latin1/latin1_swedish_ci).
The problem is the procedure TeraWurfl_RIS, which spends much of its time doing string comparison. For each of these comparisons, MySQL must coerce the incoming UTF8 variables to the procedure’s charset and collection, and then compares it to the user agent in the table, which also needs to be converted. This can happen 20-50 times per lookup (all in memory in MySQL). The coercion process is very expensive so it’s causing the lookups to be very, very slow, and in some cases it can fail completely.

You can see a coercion failure by creating a db with charset ascii, collation ascii_bin and import your DB dump – the update process will fail with:
Error: Illegal mix of collations (ascii_bin,IMPLICIT) and (latin1_swedish_ci,COERCIBLE) for operation '='

In order to reproduce your problem, create a new database with charset utf8, collation utf8_general_ci (or utf8_general_bin) and import your DB dump. The update starts off fast, but around user agent 1000 in your cache, the user agents start getting really long, which means there are lots of character conversions, and the update crawls to a stop, in some cases processing less than one user agent per second. The update process only gets about half way done before my PHP timeout kills it at 1200 seconds.

Now, let’s prove that it’s the PHP connection causing the problem. Delete the database and re-import it as in the previous step. Now, edit DatabaseConnectors/TeraWurflDatabase_MySQL5.php and near line 524, right before the line “$this->connected = true;”, put this:
$this->dbcon->set_charset("utf8");

Now, rerun the update, and on my system, 113 seconds later, the database is updated and the cache rebuilt!

As far as I can tell, this bug only affects environments where the MySQL tables were copied from another environment without the procedures, and the PHP MySQLi Driver’s default charset is different than the MySQL default charset for the given database, in which case PHP forces the procedures to be created with the wrong charset, causing type coercion chaos. Note, that if the update fails, the cache will be empty, and re-updating it will recreate all the tables in PHP’s charset, effectively fixing the problem.

I’ll consider explicitly specifying the charset as described above, assuming there is no noticeable performance impact.
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

Re: advice on updating tera-wurlf

Postby notalifeform » Sat Feb 11, 2012 2:06 pm

Hi Steve,

Thank you for finding out the root cause of the issues we experienced and the thorough explanation. This was a hard one to crack! Next time I will make sure the dump my database including stored procedures.

To get to the original question of this thread: I think the answer is that the actual update takes less than 10-20 seconds and most time is spend by rebuilding the cache, which are basically 'normal' detections. So for now we will just do updates on the 'quiet hours' of our service.

regards,

Robert

kamermans
Posts: 393
Joined: Mon Jun 06, 2011 9:50 am

Re: advice on updating tera-wurlf

Postby kamermans » Sat Feb 11, 2012 3:01 pm

Actually, I don't think dumping the procs with the tables will help since they are always recreated when you update the DB, and the PHP->MySQL connection charset is to blame for the problem.

Changing the default charset in php.ini to the charset you use in your DB should fix the problem:
mysql.connect_charset = utf8
mysqli.connect_charset = utf8
pdo_mysql.connect_charset = utf8

I think you can also change it in a .htaccess if you use the web interface to update WURFL:
php_admin_value mysql.connect_charset "utf8"
php_admin_value mysqli.connect_charset "utf8"
php_admin_value pdo_mysql.connect_charset "utf8"
Thanks,

Steve Kamerman
ScientiaMobile

Make sure you check out our WURFL Cloud, WURFL InSight and WURFL InFuze products!

notalifeform
Posts: 15
Joined: Tue Nov 15, 2011 2:34 am

Re: advice on updating tera-wurlf

Postby notalifeform » Sat Feb 11, 2012 3:16 pm

Hi Steve,

Ok. I will make sure the charset matches.

thanks again,

Robert


Who is online

Users browsing this forum: No registered users and 1 guest