Page 1 of 1

Need tips on how to speed up s9y with 1M+ entries

Posted: Tue Sep 09, 2008 2:06 pm
by gregman
Hi there,

I'm using s9y rather as a CMS not as a blog. For my new project I need s9y to handle a lots of entries (i.e. > 1.000.000). After I had the entries automatically inserted in the database of my localhost (by a script: inserting the data in the tables _entries, _entrycat, _entrytags, _permalink) s9y slowed down enormously. More precisely it couldn't display the startpage any more (for about 20 minutes, I guess, then I closed the window). First I thought, my local mysql server couldn't handle such an amount of data, but testing querys with the shell worked just fine (of course each query limited to 15 entries, as s9y does in standard config).

I'm sure, handling > 1 M entries with s9y is possible, if I can get rid of every db-query which is not necessary for my needs. But I don't know where to start. Any hints?

Thx
Greg

Posted: Tue Sep 09, 2008 2:46 pm
by Bodehase
Hint: you already had a look at the cache-plugin?

BTW: What the hell are you doing with 1 million articles?

Re: Need tips on how to speed up s9y with 1M+ entries

Posted: Tue Sep 09, 2008 2:57 pm
by garvinhicking
Hi!

S9y performs, depending on your list of event and sidebar plugins, quite a lot of SQL queries with several joins.

The joins hugely increase the memory requirements for the MySQL server. If you have 1.000.000 entries, let's say each entry is 4000 bytes long. Let's also assume you have a total of 50 categories (50 bytes per categories), and each entry is approx. assigned in 2 categories. Then we add a total of 3 tags (15 bytes per tag) to each entry, as well as one permalink (approx 60 bytes) for each entry.

This calculates as:

(1.000.000 * 4000) * (2 * 50) * (3 * 15) + (1.000.000*60)
= 18.000.060.000.000 bytes

of data that MySQL has to sifft through and sort through. Through code optimizations and indexes of MySQL, this is a lot less more in terms of RAM usage, but I'd still suggest to reserver about 1GB of RAM to the MySQL process, if the MySQL server is solely working on your s9y database and nothing else.

So, bottom line: s9y performs as fast, as your server and mysql configuration allows it to. You'd surely need optimizations in terms of MySQL RAM usage, variable handling, stack/thread sizes and mysql query cache size.

Best regards,
Garvin

Posted: Thu Sep 11, 2008 11:11 am
by tianyi
1 million entries?!
a mini wiki?

Posted: Thu Sep 11, 2008 10:55 pm
by gregman
Hi,

and thx for your input. A server with 1 GB RAM only for the MYSQL engine is not really achievable for me right now. Maybe I have to patch s9y code to remove the JOINS where not absolutely necessary (for my needs). I experienced that even my localhost (standard WAMP) can handle 250.000 entries quite well. Accessing an entry page performs as good as on my former blog with less 100 entries (I guess there are no JOINS in the mysql query?!?). The front page needs about 10 seconds to load.
tianyi wrote:a mini wiki?
Not really a wiki, but some kind of knowledge base.

Greetings
Greg

Posted: Thu Sep 11, 2008 11:27 pm
by garvinhicking
Hi!
Maybe I have to patch s9y code to remove the JOINS where not absolutely necessary (for my needs).
You can get rid of these joins if you remove the "extedned properties" plugin, and if you disable the usage of read/write permissions for logged-in authors in the frontend. That speeds up a lot.
I experienced that even my localhost (standard WAMP) can handle 250.000 entries quite well. Accessing an entry page performs as good as on my former blog with less 100 entries (I guess there are no JOINS in the mysql query?!?). The front page needs about 10 seconds to load.
This very much depends on the server you use. Even a simple Pentium3 with 1GHz can outpower a Pentium4-Quadcore server that hosts 100 webs instead of just one.

HTH,
Garvin

Posted: Sun Sep 14, 2008 1:28 pm
by gregman
Hello again,
garvinhicking wrote: You can get rid of these joins if you remove the "extedned properties" plugin, and if you disable the usage of read/write permissions for logged-in authors in the frontend.
Well, I already read about this in other posts and made it so, but I needed some more speed tuning. I found the parameters $joincategories and $joinauthors of the function "serendipity_fetchEntries" and set their default to false. In order to have the sql query work properly, I also had to add an if-statement to the select_key:

Code: Select all

". (($joinauthors === true) ? ", a.realname AS author, a.username AS loginname, a.email" : "");
In order to get the category view still to work, I added a new parameter $nojoin with default = false to the serendipity_getMultiCategoriesSQL. Inside the fetch_Entries function, if $joincategories is set to false, the getMultiCategoriesSQL function will be called with the $nojoin parameter set to true:

Code: Select all

if ($joincategories === true) {
            $cat_sql = serendipity_getMultiCategoriesSQL($serendipity['GET']['category']);
} else {
            $cat_sql = serendipity_getMultiCategoriesSQL($serendipity['GET']['category'], false, true);
}
and inside the serendipity_getMultiCategoriesSQL function I made the sql-query dependent of the $nojoin parameter:

Code: Select all

if ($nojoin === false) {
                $cat_sql_array[] = " (c.category_left " . ($invert ? " NOT " : "") . " BETWEEN " . implode(' AND ', serendipity_fetchCategoryRange($categoryid)) . ')';
} else {
                $cat_sql_array[] = " title LIKE '%" . $serendipity[smarty]->_tpl_vars[category_info][category_name] . "%'";
}
Of course, the"nojoin" query only works, if you put the category name in the title of your entry (e.g. category: "Latin", title: "Latin: Pars Pro Toto"). If you can live with this limitation and don't mind to hardcode the author's name in your template files, these changes can speed s9y up a lot.

Greg

Posted: Mon Sep 15, 2008 4:04 pm
by judebert
How much is "a lot"? Is it working as expected on your 1M+ entries?

Posted: Mon Sep 15, 2008 5:46 pm
by gregman
Hi Judebert,

I reduced the amount of entries to 204500 for developing reasons and because of the primary problems. On my local WAMP, a sole entry loads in about 1.5 seconds (I didn't experience any differences between original and modified code here). Requesting the front page with the latest 10 entries requires about 13 seconds without modification. Turning only $joinauthors to false speeds it up about 1.5 seconds, turning only $joincategories to false makes it about 2.5 seconds faster. If I set both parameters to false I get a loading time of about 3.5 seconds, i.e. a saving of 9.5 seconds. So the modification makes my "blog" 3.5 times faster.

Regards
Greg