Get rid of those database constraints

Interesting report from John Lam on a presentation by Werner Vogels (of Amazon.com) given at Middleware 2004. 

"They’re also pushing databases beyond what they are capable of. E-Bay does not maintain integrity constraints in the database – they’re maintained at the application layer. They don’t maintain indexes in the databases; instead lookups are done in Berkeley DB indices since lookup speeds are an order of magnitude faster there than on a relational database. Effectively, E-Bay is using their expensive database system as a transactional file system!"

Note: Amazon has been changed to E-Bay per the information from Werner himself in the comments of this post.  Check out his post on How Databases Used at Big Customers for more info...

 

# re: Get rid of those database constraints

Monday, October 25, 2004 1:07 AM by Werner    
Actually, John misquoted me; the example of the databases was from eBay, not Amazon.com. That data was from a talk by James Barese at HPTS last year.

See http://weblogs.cs.cornell.edu/AllThingsDistributed/archives/000280.html

# re: Get rid of those database constraints

Monday, October 25, 2004 2:14 AM by Steve    
Thanks Werner, I've updated the post to reflect this.

# re: Get rid of those database constraints

Monday, October 25, 2004 2:19 AM by Frans Bouma    
I don't see why indexes in another db are faster, as the RDBMS which needs the indexes has to have access to them through its own statistics engine.

Also, re-implementing FK constraint logic in your own code is odd, as the RDBMS engine is properly tested on that in all circumstances, like multi-threaded inserts. Having these in the app layer can be dangerous to the database contents, as the checks have to be 100% bug-free in ALL circumstances.

And who says these checks are faster?
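
To make the concern about application-layer checks concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not how eBay does it; the `seller` and `listing` tables and every function name are hypothetical, and the "concurrent" delete is simulated inline so the outcome is deterministic. It shows a check-then-insert sequence leaving an orphaned row when only the application checks the reference, while a declared foreign key rejects the same insert.

```python
import sqlite3


def make_db(enforce_fk: bool) -> sqlite3.Connection:
    """Create an in-memory database, with or without a declared foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces declared foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE seller (id INTEGER PRIMARY KEY)")
    fk = " REFERENCES seller(id)" if enforce_fk else ""
    conn.execute(
        "CREATE TABLE listing ("
        "id INTEGER PRIMARY KEY, "
        f"seller_id INTEGER NOT NULL{fk})"
    )
    conn.execute("INSERT INTO seller (id) VALUES (1)")
    conn.commit()
    return conn


def seller_exists(conn: sqlite3.Connection, seller_id: int) -> bool:
    """The application-layer stand-in for a foreign key check."""
    row = conn.execute(
        "SELECT 1 FROM seller WHERE id = ?", (seller_id,)
    ).fetchone()
    return row is not None


def count_orphans(conn: sqlite3.Connection) -> int:
    """Count listings whose seller no longer exists."""
    return conn.execute(
        "SELECT COUNT(*) FROM listing l "
        "LEFT JOIN seller s ON s.id = l.seller_id "
        "WHERE s.id IS NULL"
    ).fetchone()[0]


def interleaved_insert(conn: sqlite3.Connection) -> str:
    """Check the parent, lose it to a simulated second session, then insert."""
    if not seller_exists(conn, 1):
        return "rejected by application check"
    # Stands in for another session deleting the seller after our check.
    conn.execute("DELETE FROM seller WHERE id = 1")
    try:
        conn.execute("INSERT INTO listing (id, seller_id) VALUES (10, 1)")
    except sqlite3.IntegrityError:
        return "rejected by database"
    return "inserted"


app_only = make_db(enforce_fk=False)
app_result = interleaved_insert(app_only)
app_orphans = count_orphans(app_only)

db_enforced = make_db(enforce_fk=True)
db_result = interleaved_insert(db_enforced)
db_orphans = count_orphans(db_enforced)

print(app_result, app_orphans)
print(db_result, db_orphans)
```

The application-only variant needs its own locking or retry logic to close that gap, which is the part that has to be bug-free in all circumstances. The sketch says nothing about which approach is faster at scale.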

# re: Get rid of those database constraints

Monday, October 25, 2004 2:26 AM by Frans Bouma    
The only reason I can think of is distributing the load of the logic across systems: if you centralize all data-oriented logic on the database servers, they become a bottleneck. However, if you filter on static sets of data (semi-static, updated every x seconds for example, in memory) on various systems, you have already distributed a great deal of the burden.

But in that light: is it really possible to compare these systems with normal relational databases?

# re: Get rid of those database constraints

Monday, October 25, 2004 3:03 AM by Mark Bonafe    
Hey, let's just go back to FoxPro or Clipper! What a step backwards, on so many levels! FK constraints on the application layer? I don't see - at all - how that can be faster or more reliable. No indexes??? Why not just write your own proprietary flat file system and be done with it?

# re: Get rid of those database constraints

Monday, October 25, 2004 4:56 AM by John Lam    
An interesting part of Werner's talk was about how large customers like eBay and Amazon don't represent "typical customers" to the DB vendors. While you might think that having eBay/Google/Amazon/Yahoo as a major customer would give you major crowing points, solving *their* DB scalability problems doesn't really help your mainstream customers who are running much smaller shops. So effectively those guys are off on their own.

# re: Get rid of those database constraints

Monday, October 25, 2004 5:19 AM by Steve    
That's what's so interesting about all this. They are so different than everyone else that they're forced to use a database in ways which, if any of us recommended them for our "normal clients", we'd probably be shown the door.

# re: Get rid of those database constraints

Tuesday, October 26, 2004 6:07 AM by Brad Wood    
"Effectively, E-Bay is using their expensive database system as a transactional file system!" - How in the heck can that be faster / more efficient than a straight file system? It would seem that in this case the database is just a wasted layer on top of the file system and would almost certainly add to the time necessary to fetch data...?

# re: Get rid of those database constraints

Wednesday, October 27, 2004 6:21 PM by Joe    
You have to think about typical use cases for Amazon and eBay (and any web-sites that sell 'stuff'). There are far more reads than writes.

So optimizing for write isn't your concern (which is where the Oracles of the world tend to shine). While you may have a backing store built on a relational model, dumping into a cache/flat-file will gain much more speed than driving through a DB designed for higher write usage. You can basically segregate your common write mechanisms (shopping carts) from your read mechanisms (products) with correlation with product IDs ... and bingo (a tad simplified).

Compare this to, say, investment/banking shops that need to transact high volumes of writes.

eBay is a bit more interesting since, while reads outweigh writes, users are responsible for the content, so writes aren't as small as on typical commerce sites (ratio-wise). I can certainly understand needing to customize this from what a typical DB offers.

For these kinds of systems you can envision a system that basically writes into the db, which triggers an automatic update to a more read-only version (probably batched as appropriate to balance latency of user experience with 'db' load).

Bottom line: you never know what you really need to do for your system until you have had a million people hammer it into submission and you can see the weak spots.
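
The write-then-batch arrangement described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: plain dictionaries stand in for both the relational database and the read-mostly copy, and `BatchedReadStore`, its methods, and the `batch_size` parameter are hypothetical names, not anything Amazon or eBay is known to use.

```python
from typing import Any, Dict, Optional, Set


class BatchedReadStore:
    """Writes go to a primary store; reads are served from a batched snapshot."""

    def __init__(self, batch_size: int = 3) -> None:
        self._primary: Dict[str, Any] = {}   # stands in for the relational database
        self._snapshot: Dict[str, Any] = {}  # read-mostly copy served to page views
        self._dirty: Set[str] = set()        # keys written since the last flush
        self._batch_size = batch_size

    def write(self, product_id: str, record: Any) -> None:
        """Record a write in the primary store and queue it for the next batch."""
        self._primary[product_id] = record
        self._dirty.add(product_id)
        if len(self._dirty) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Copy every queued change into the read snapshot in one pass."""
        for product_id in self._dirty:
            self._snapshot[product_id] = self._primary[product_id]
        self._dirty.clear()

    def read(self, product_id: str) -> Optional[Any]:
        """Serve reads from the snapshot only; recent writes may not be visible yet."""
        return self._snapshot.get(product_id)


store = BatchedReadStore(batch_size=2)
store.write("p1", {"title": "Lamp"})
before_flush = store.read("p1")   # the batch is not full, so the snapshot is stale
store.write("p2", {"title": "Desk"})
after_flush = store.read("p1")    # the second write filled the batch and flushed it

print(before_flush)
print(after_flush)
```

The batch size is the knob that trades read staleness against load on the primary store; a real system would probably flush on a timer as well as on a count.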
