Scaling SQL Server 2005 & National TV (USA, Canada)

I’ll be on America’s Most Wanted this weekend. With over 10 million viewers, I’m worried that my site won’t stand up to the load. Supposedly it’s a high-impact interview.

I am also expecting an extra 100 to 300 million pageviews a month as we head into the peak season, which will put me over 1 billion monthly pageviews. I don’t think I’ve ever been as busy as I’ve been in the last few days! When it comes to databases, I keep wishing I’d created something as simple as a blog network or social network. I’m sure the database people at other dating sites know exactly what I’m talking about.

I am working on getting my 60-drive, 15,000 RPM SAN up and running tomorrow and adding two eight-core servers with 64 gigs of RAM each in a failover cluster setup. This will give me extremely high redundancy for the master database and then allow scaling out of CPU-intensive queries over slave databases. I hope this will solve my master database issues until I’m in the 6 to 10 billion monthly pageview range.

Now, I’m running SQL Server 2005, which allows partitioning of tables across drives. So I’ll be looking at spreading out things like my messages table, as well as its indexes, over 5 drives/arrays. Currently no single array is fast enough. Maybe someone from Microsoft could post a comment on real-world use / best practices?
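
The rough shape of what I have in mind looks something like this (the filegroup names, boundary values, and columns below are just placeholders, not my real setup):

    -- assumes filegroups MsgFG1..MsgFG5 already exist, each with files on a different drive/array
    CREATE PARTITION FUNCTION pfMessageId (bigint)
        AS RANGE RIGHT FOR VALUES (100000000, 200000000, 300000000, 400000000);

    CREATE PARTITION SCHEME psMessages
        AS PARTITION pfMessageId
        TO (MsgFG1, MsgFG2, MsgFG3, MsgFG4, MsgFG5);

    CREATE TABLE Messages (
        MessageId  bigint        NOT NULL,
        FromUserId int           NOT NULL,
        ToUserId   int           NOT NULL,
        SentDate   datetime      NOT NULL,
        Body       nvarchar(max) NULL,
        CONSTRAINT PK_Messages PRIMARY KEY CLUSTERED (MessageId)
    ) ON psMessages (MessageId);
    -- nonclustered indexes default to the same partition scheme (aligned), or can be placed on their own filegroups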

As for web servers, I’m aiming at one web server (IIS 6) per billion monthly pageviews, so no real issue there; scaling that is a simple matter of round-robin DNS. I’ve also got three web servers that handle my instant messenger polling/syncing, etc. Combined, they handle between 100 and 500 million pageviews a day, depending on what settings I use. They max out at 5 billion pageviews a month per server, but at only $1k apiece, who cares.

31 Responses to “Scaling SQL Server 2005 & National TV (USA, Canada)”

  1. Stuart Padley Says:

    Markus, I have sent your request on to the engineering team that knows. Hopefully someone will respond shortly. Let me know if they don’t and I can follow up.

    Good luck this weekend.

  2. Mike Says:

    “When it comes to databases, I keep wishing I’d created something as simple as a blog network or social network. I’m sure the database people at other dating sites know exactly what I’m talking about.”

    I can’t help but laugh when I read that. Markus, care to fill us in on exactly what you feel is so complex about a dating site? As far as database complexity is concerned, I would rate it as being on the “very simple” end of the scale.

    Also, do you plan on implementing any fail-over redundancy on your web servers? Round-robin DNS works okay for load balancing, but it of course isn’t designed to handle fail-over.

  3. Christian Guirguis Says:

    Hi Markus,
    Just wanted to chime in here. Please take a look at
    http://msdn2.microsoft.com/en-us/library/ms177411.aspx
    and
    http://msdn2.microsoft.com/en-us/library/aa178389(SQL.80).aspx

    for various tips and best practices on how to handle the expected IO bottlenecks.

    I’m sure more people will chime in with their experience as well.

  4. Markus Says:

    Mike, you are right, the database complexity of a dating site can be extremely low. But there is no way in hell you would be able to scale it. All those dating site packages you can buy collapse when you have more than a few thousand users.

    Match.com, Yahoo Personals, AmericanSingles, and others aren’t spending millions a month on tech because it’s “easy”. Creating a dating site that is used by thousands is totally different from a site used by millions.

    If you compare costs between dating and social networking sites, dating sites are spending on average 10 to 50 times more on tech costs per pageview.

  5. Norman Says:

    “I’m sure the database people at other dating sites know exactly what I’m talking about.”

    I realize the border between arrogance and confidence is thin, but come on, Markus.

    Be honest. You’re not solving the Mount Everest of DB problems. Unless you are, in which case you should explain it in technical terms.

    And only two DBs, when each has 8 cores, 64 GB of memory, and a large SAN attached, is not exactly impressive scaling.

    You wouldn’t be hitting disk I/O issues if you were using an all-in-memory MySQL cluster or extremely efficient caching.

    Memcached isn’t in your setup at all? Ouch.

    RR DNS also won’t distribute load as evenly as you might want. Failover also sucks, as Mike pointed out. Scares me to imagine how Windows handles IP takeover (and your switch).

    I recommend a couple of Foundry SLBs to save yourself some serious pain on the RRDNS front.

  6. Mike Says:

    Markus, you still haven’t answered my original question:

    What makes dating sites so complex in the database department?

    You clearly explained that scaling to billions of page views per month is no small feat. I think we all understand that no matter what you are serving. But what makes dating sites (or POF specifically) so much more complex to scale than social networking sites or blog sites?

  7. Anatoly Lubarsky Says:

    Make wide use of filegroups.

    Indexes and data should be on separate filegroups, and filegroups should be spread across physical disks. The same goes for each separate partition of data.

    Logs should also go on separate disk arrays.
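
    A minimal sketch of that layout (database name, filegroup names, and drive letters are placeholders only):

        -- two filegroups whose files live on different physical arrays
        ALTER DATABASE POF ADD FILEGROUP DataFG;
        ALTER DATABASE POF ADD FILEGROUP IndexFG;

        ALTER DATABASE POF ADD FILE
            (NAME = PofData1, FILENAME = 'E:\SQLData\PofData1.ndf', SIZE = 50GB)
            TO FILEGROUP DataFG;
        ALTER DATABASE POF ADD FILE
            (NAME = PofIndex1, FILENAME = 'F:\SQLIndex\PofIndex1.ndf', SIZE = 20GB)
            TO FILEGROUP IndexFG;

        -- nonclustered indexes are then placed on the index filegroup
        CREATE NONCLUSTERED INDEX IX_Profiles_LastLogin
            ON Profiles (LastLoginDate) ON IndexFG;

        -- and the transaction log (.ldf) goes on yet another dedicated array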

  8. Chris Says:

    Mike – complex searches involving heavy queries and scans over millions of rows of data – data that you cannot index effectively. This simply doesn’t happen on a blog.

  9. Markus Says:

    I’ll try my best, but it’s like Google trying to explain why they need a few hundred thousand servers for only a couple of thousand queries per second. It’s hard for people to understand unless they see the code or have done something similar.

    1. A blog network is nothing more than a forum with a different front end. I have a forum with over 6 million posts and a ton of pageviews a day. Everything is a clustered index seek, no real issues.

    2. A social network is a dating site without search functionality. Friendster was a social network that tried to include search, i.e. friend-of-friends, and they got totally killed. Again, social networks are for the most part clustered index seeks. The list of your friends on MySpace/Facebook is a little more intensive, but not much.

    3. A dating site’s complexity is in the search; the resources needed grow exponentially. You do index scans AND order by login date, a field that is constantly updated 10 to 200 times per second. You can’t break up your tables or use things like memcache, and all data has to be concurrent and real time, and is unique on nearly every single pageview and unique PER user. You also search based on the user’s preferences, and on the preferences of the users who may appear in the search. You end up with 7 to 15 predicates per search and index scans over millions of records (roughly the shape of the sketch below).
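
    To make the contrast concrete, here is a made-up sketch (hypothetical tables and columns, nothing like my real schema). The first query is the kind of lookup a forum or friends list does; the second is the shape of a dating search:

        -- a forum/friends-list style lookup: a clustered index seek on (UserId, FriendId)
        SELECT FriendId
        FROM Friends
        WHERE UserId = 123456;

        -- a dating search: many predicates, ordered by a column that is updated constantly
        SELECT TOP 100 ProfileId, UserName, Age, City, LastLoginDate
        FROM Profiles
        WHERE Gender = 'F'
          AND SeekingGender = 'M'
          AND Age BETWEEN 25 AND 35
          AND Country = 'Canada'
          AND Region = 'BC'
          AND HasPhoto = 1
          AND Smokes = 0
          AND WantsChildren = 1
          AND MinAgeSought <= 31 AND MaxAgeSought >= 31  -- the searcher has to match *their* preferences too
        ORDER BY LastLoginDate DESC;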

    As for a SAN being overkill, that is nuts. I’ve got 16 SCSI drives in my current machine, ALL I/O maxed. Should the site continue to grow, how on earth could I replicate data out from my database to other machines AND maintain an up-to-date master copy of the DB? There isn’t a single website this size in the dating/social networking space that doesn’t make use of a SAN; you simply have no other options. MySpace is using those plus solid-state drives!

  10. L Says:

    Hey Markus –

    You actually can break up the tables quite easily. It is a very simple architecture, and it can scale pretty well.

    Might want to look into it.
    🙂

  11. Mike Says:

    Markus, thanks for the explanation.

    “You can’t break up your tables or use things like memcache, and all data has to be concurrent and real time, and is unique on nearly every single pageview and unique PER user.”

    I don’t buy that for a second, though. From what I can see, the hardest part as far as the database is concerned is searching for users within a given proximity, X miles from the person doing the search. The thing is, the data that needs to be searched isn’t that large; if you split the searchable data from the non-searchable data, it could easily be stored entirely in memory.

    Let’s take 5 million profiles and assume there are 20 possible fields that can be searched (POF currently has 21 in its advanced search). If each of the 20 fields is approx. 8 bytes (probably on the high side, considering many fields are just small integers), you have about 160 bytes per profile.

    160 Bytes per profile * 5,000,000 profiles = 762.93 megabytes of total searchable data.
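
    Something like this narrow, search-only table (hypothetical columns, not POF’s actual schema), with the full profile text kept in a separate table:

        CREATE TABLE ProfileSearch (
            ProfileId     int      NOT NULL PRIMARY KEY,  -- join key back to the full profile
            Gender        tinyint  NOT NULL,
            SeekingGender tinyint  NOT NULL,
            Age           tinyint  NOT NULL,
            CountryId     smallint NOT NULL,
            RegionId      smallint NOT NULL,
            Latitude      real     NOT NULL,              -- for the "within X miles" search
            Longitude     real     NOT NULL,
            HasPhoto      bit      NOT NULL,
            Smokes        bit      NOT NULL,
            LastLoginDate datetime NOT NULL
            -- ... the rest of the ~20 searchable attributes, all small integers/bits
        );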

    Add some overhead and room for growth, and let’s call it 2 GB. Considering you can only search for one sex at a time, you could put each sex on a different server/cluster for further scaling. Even though there may be a 2:1 ratio of guys to girls, that works out quite well, since you probably have more searches over the smaller dataset (women). If you limit searches to specific countries, you could again split the data between servers: put all US profiles on one server and every other country’s profiles on another, or however your dataset happens to work out load-wise.

    Once the initial search is done, everything else can be load balanced across servers however you wish. Profiles themselves can be cached, and sending messages to/from users can EASILY be split onto different database servers, e.g. A-M profile messages on one server, N-Z on another.

    Did I overlook anything?

  12. Mike Says:

    What I was trying to get across in an earlier comment, but WordPress cut off, was that in PostgreSQL, at least, you can do queries like this:

    select ( dblink_query(select * from profiles where distance

  13. Mike Says:

    Note: Damn WordPress.

    What I was trying to get across in an earlier comment, but WordPress cut off, was that in PostgreSQL, at least, you can do queries like this:

    select * from dblink('host=server1 dbname=pof', 'select id, distance, login_date from profiles where distance < 100')
        as p1(id int, distance int, login_date timestamp)
    union all
    select * from dblink('host=server2 dbname=pof', 'select id, distance, login_date from profiles where distance < 100')
        as p2(id int, distance int, login_date timestamp)
    union all
    select * from dblink('host=server3 dbname=pof', 'select id, distance, login_date from profiles where distance < 100')
        as p3(id int, distance int, login_date timestamp)
    order by distance, login_date;

    The DBLINK queries can be sent to other servers asynchronously, so besides the time it takes to combine the results you can scale quite well to 5-10 different servers without really increasing the query time.

    I’m not sure, does MSSQL offer anything close to this?
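
    The closest thing I can think of on the SQL Server side is a distributed query over linked servers with OPENQUERY, something along these lines (Server1/2/3 would have to be set up as linked servers first, and I don’t know whether it actually runs the remote branches in parallel):

        select * from openquery(Server1, 'select id, distance, login_date from profiles where distance < 100')
        union all
        select * from openquery(Server2, 'select id, distance, login_date from profiles where distance < 100')
        union all
        select * from openquery(Server3, 'select id, distance, login_date from profiles where distance < 100')
        order by distance, login_date;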

  14. Bisi Says:

    Markus,
    I would like to know what kind of money you spend on bandwidth and how much more you expect to spend in the future. I am guessing that bandwidth is your biggest monthly expense.

  15. Norman Says:

    Markus, I appreciate you explaining further. A lot of people benefit from your detailed posts, since even those of us with large sites don’t have your specific site, or (frequently) the complete autonomy you have.

    Google needs hundreds of thousands of servers because it indexes the entire internet as quickly as possible and lets you query it. That’s a “Mount Everest” type problem.

    MySQL Cluster is all in memory and shared-nothing, meaning no SAN is involved. Just because a lot of spindles are necessary doesn’t necessarily mean a SAN is.

    Why can’t you have separate DBs for each major region, plus some “search further” functionality built into your app code that can do cross-region DB queries?

    memcached should be able to save your DBs a lot of busy work in other places. How about session read-caching, for example, or is that already out of the DB?

    It seems like you’re insisting that everything be absolutely real-time and up-to-date. It’s not unreasonable to have 30-second delays in certain areas that are not particularly user-noticeable, if you can think of those.

    What about caching search results without preferences and then applying preferences as a filter in the app later?

    What about using something like Xapian for searching certain parts of your data?

  16. Ty Says:

    Isn’t America’s Most Wanted the show where they look for fugitives?

  17. Yo Baby Says:

    Hi Markus, great info! Thanks for sharing.

    I see someone’s already suggested this, but how about having two databases: one for boys, one for girls? The nice thing about the m/f one-bit attribute is that it immediately divides your dataset in two, whereas there are many thousands of geographic locations. It is also immutable, and you never search both databases at once. Seems ideal!

    It’s so obvious that you’ve probably already thought about it and rejected it. Why won’t this work?

  18. Mayo Says:

    AMW, that’s about that criminal you mentioned before?

    Congrats!!!!

  19. Mayo Says:

    Markus, be sure to capture the show and air it via YouTube or similar — AND post it here!!!

  20. G Says:

    How LiveJournal scaled from 0-50M pages/day:

    http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf

  21. David Evans Says:

    Interesting statements, Markus. I’m the first to admit I don’t know a lot about large-scale databases, but I’m hearing that your way of doing things is one of several. Not sure where the whole “dating is hard, blogs are easy” thing comes from; you have no idea what you’re talking about, not that that has ever stopped you. Most of the blog network stuff is people-oriented. Having 50 editors all working on a system at the same time may sound easy, but then again, so is herding cats, if you know what you’re doing. My point is that blog networks are a whole different kind of complexity. You stick to the databases and I’ll keep working with humans.

    I find this database and scaling discussion fascinating, thanks for sharing your experience with us. Funny how this has turned into a tech forum.

    PS, you’ll have 50 machines soon enough😉

  22. Mayo Says:

    Markus,
    You have two ways:

    1) I don’t know if you’re using Microsoft Cluster Services; I have just been reading articles about clustering in my local SciComp mag. Have you investigated this path?

    2) OR you can use a MAINFRAME😉 Here is a link to Techworld:

    http://www.techworld.com/features/index.cfm?RSS&FeatureID=3018
    😉

  23. John C. Moore Says:

    Was the traffic from the show what you were expecting? Did your site handle it OK?

    It was like you had an advance “storm warning”….
    Good job on AMW.

  24. Eric Stone Says:

    I will run your whole site as a service — you are nuts to run MS-SQL the way you are. If you want some help let me know.

  25. Lindsey Says:

    Please check out our SQL storage tips: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx

    For partitioned tables, you can put each partition on its own filegroup, and each filegroup on its own drive/LUN, but that is different from adding more spindles to a LUN/drive. For partitioning and performance tips, please see the SQLCAT team blog:
    http://blogs.msdn.com/sqlcat/archive/2005/11/30/498415.aspx

  26. SAN for the common man Says:

    Markus, we were glad to be of service.
    Congrats again!

  27. Rush Says:

    I don’t know if some people just love to hate or whether they are really providing constructive criticism. Either way, I don’t think anyone else here has built a bigger site than yours, so keep up the good work.

  28. joe Says:

    We dare to be diffrent, No more hassle rebillings,
    Bemygal.com delivers service. The way it suppose to be!
    users that never log in are weeded faster than your backyard weedwacker! Meet people,post,chat,upload as you like.
    What are you waiting for?
    check it out.

  29. zxevil160 Says:

    QHRK1T U cool ))

  30. ZAREMA Says:

    Thanks to the author for the article. The main thing is not to forget about users; keep going in the same spirit.

  31. analia Says:

    :{}
