Clustering LiveJournal Take 2

Differences from Revision 1

The following are the major differences from revision 1 of our clustering plan.

No clustering of web slaves; no backhand redirection

This is the main difference. We're only clustering databases. This means we don't need the backhand redirector machines to look at URIs and redirect requests to the right pool of web slaves. It also means we can still have faster premium servers for paid users.

No RPC between clusters

Each web slave will talk directly to the DB it needs using DBI, rather than going through some HTTP wrapper kludge. The point of the RPC wrapper before was to prevent the cluster master DBs from having five billion connections from a half billion web slaves. But really, MySQL handles insane numbers of connections anyway (if they're mostly idle, as they will be). If we later need to funnel requests through a smaller number of DB connections, we'll just do that, giving each machine a pool of connections to share.

Why mostly idle, you ask? Well, the number of web slaves continues to grow over time, but we limit the number of users/traffic/load per DB cluster. So divide. Each web slave will eventually get a master connection, and the master traffic is fixed, so over time a smaller fraction of those connections will be active. When it gets too extreme, we either cluster web slaves or build the DB connection pool. But we can deal with this later. Both solutions are easy enough, but they're boring to care about now.
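The "so divide" step can be made concrete with a little arithmetic. This is only a sketch: it assumes one master connection per web slave and a fixed number of connections busy at any moment, and the function name and the numbers are made up for illustration.

```python
def active_fraction(web_slaves: int, busy_connections: int) -> float:
    """Fraction of a cluster master's connections that are busy at once.

    Assumes one connection per web slave and a fixed number of
    concurrently busy connections (the cluster's capped load).
    """
    if web_slaves <= 0:
        raise ValueError("need at least one web slave")
    return min(1.0, busy_connections / web_slaves)


# Load per cluster stays fixed while the web slave count grows.
for slaves in (10, 100, 1000):
    print(slaves, active_fraction(slaves, busy_connections=10))
```

As the web slave count grows with the load per cluster held constant, the busy share of connections keeps shrinking, which is why the idle connections are expected to dominate.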

Cluster Tables

Tables that can be found on each cluster are as follows:



talk2
talktext2
talkprop2
log2
logsec2
logtext2
logsubject2
logprop2
syncupdates2
userbio
talkleft

These tables will replace the tables on the original master server with similar names (i.e. without the 2 appended). userbio is the same.

Currently, a user can conceivably be on either the original master database or in one of the myriad clusters available. To detect what the case may be, examine the clusterid element of the user's entry in the user table. If clusterid == 0 then the user is located on the old master database and their data needs to be loaded from the old tables; otherwise, the data is located on cluster #clusterid using the new table names above.
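The lookup described above might be sketched like this. It is an illustration in Python rather than the site's own code; the function name, the "master"/"clusterN" labels and the dict-shaped user row are all hypothetical. The old table names are taken to be the new names without the trailing 2, and any table not in the mapping (such as userbio) is assumed to keep its name.

```python
# Old-master table name -> cluster table name (the "2" tables listed above).
CLUSTER_TABLES = {
    "talk": "talk2",
    "talktext": "talktext2",
    "talkprop": "talkprop2",
    "log": "log2",
    "logsec": "logsec2",
    "logtext": "logtext2",
    "logsubject": "logsubject2",
    "logprop": "logprop2",
    "syncupdates": "syncupdates2",
}


def locate(user_row: dict, table: str) -> tuple:
    """Return (database label, table name) holding this user's data."""
    clusterid = user_row["clusterid"]
    if clusterid == 0:
        # Not yet moved: data lives in the old tables on the old master.
        return ("master", table)
    # Moved: data lives on cluster #clusterid under the new table name.
    # Assumption: tables missing from the mapping keep their old name.
    return ("cluster%d" % clusterid, CLUSTER_TABLES.get(table, table))


print(locate({"clusterid": 0}, "log"))
print(locate({"clusterid": 4}, "log"))
```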

For future expansion, the element dversion is also added to the user table. If dversion == 0 then the user is not on a cluster, i.e. they're on the original master system. dversion == 1 implies that the user is located on a cluster. As more of the user's data is moved to the clusters, dversion will increase. Note that any dversion >= 1 means the user is on a cluster. The plan is for higher dversion numbers to indicate that more per user data is moved from the original setup to the clustering system.
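Since clusterid and dversion both say whether a user has been moved, a sanity check that the two agree could look like the following. The helper names are hypothetical, and the strict "must agree" rule is an inference from the description above, not something the plan spells out.

```python
def is_on_cluster(dversion: int) -> bool:
    """Any dversion >= 1 means the user's data is on a cluster."""
    return dversion >= 1


def check_user_row(clusterid: int, dversion: int) -> None:
    """Raise ValueError if clusterid and dversion disagree.

    Assumption: clusterid == 0 exactly when dversion == 0.
    """
    if is_on_cluster(dversion) != (clusterid != 0):
        raise ValueError(
            "inconsistent user row: clusterid=%d, dversion=%d"
            % (clusterid, dversion)
        )


check_user_row(clusterid=0, dversion=0)  # still on the old master
check_user_row(clusterid=3, dversion=1)  # moved to cluster 3
```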

Conversion from dversion 0 to dversion 1 will be a lazy conversion. This involves the READ_ONLY capability code. Basically, the user's READ_ONLY capability bit will be set and then the code will pause for a minute or two to allow any pending transactions to go through. After this, all data will be copied from the old database system into the appropriate cluster. After everything is copied, the data is deleted from the old system and the user's READ_ONLY capability bit is cleared.
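The conversion steps could be sketched roughly as below, using in-memory SQLite databases as stand-ins for the old master and one cluster. Everything specific here is an assumption for illustration: the three-column log/log2 schema, the read_only column standing in for the READ_ONLY capability bit, the function name, and the choice to update clusterid and dversion in the same step that clears the flag. Only one table is moved; a real conversion would cover every table in the list.

```python
import sqlite3
import time


def migrate_user(old_db, cluster_db, userid, clusterid, settle_seconds=0.0):
    """Move one user's log rows from the old master to a cluster."""
    # 1. Mark the user read-only so no new writes arrive.
    old_db.execute("UPDATE user SET read_only = 1 WHERE userid = ?", (userid,))
    old_db.commit()

    # 2. Let pending writes drain (the plan calls for a minute or two).
    time.sleep(settle_seconds)

    # 3. Copy the user's rows into the cluster's new-style table.
    rows = old_db.execute(
        "SELECT userid, itemid, body FROM log WHERE userid = ?", (userid,)
    ).fetchall()
    cluster_db.executemany(
        "INSERT INTO log2 (userid, itemid, body) VALUES (?, ?, ?)", rows
    )
    cluster_db.commit()

    # 4. Only after the copy is committed: delete the old rows, point the
    #    user at the cluster, and clear the read-only flag.
    old_db.execute("DELETE FROM log WHERE userid = ?", (userid,))
    old_db.execute(
        "UPDATE user SET clusterid = ?, dversion = 1, read_only = 0 "
        "WHERE userid = ?",
        (clusterid, userid),
    )
    old_db.commit()
    return len(rows)


# Hypothetical toy data: one user with two entries on the old master.
old_db = sqlite3.connect(":memory:")
cluster_db = sqlite3.connect(":memory:")
old_db.executescript("""
CREATE TABLE user (userid INTEGER PRIMARY KEY, clusterid INTEGER,
                   dversion INTEGER, read_only INTEGER);
CREATE TABLE log (userid INTEGER, itemid INTEGER, body TEXT);
INSERT INTO user VALUES (1, 0, 0, 0);
INSERT INTO log VALUES (1, 1, 'first post');
INSERT INTO log VALUES (1, 2, 'second post');
""")
cluster_db.execute("CREATE TABLE log2 (userid INTEGER, itemid INTEGER, body TEXT)")

moved = migrate_user(old_db, cluster_db, userid=1, clusterid=3)
print("rows moved:", moved)
```

Deleting from the old system only after the cluster copy has been committed means a failure partway through leaves the user read-only with their data still intact on the old master.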