« Return to Thread: [scala] Lift and Goat Rodeo
Great Manifesto.
Let me skew this towards the enterprise business perspective, which is slightly different from the social network application.
A System Of Record is a system designed to maintain data. Think RDB, form frameworks, tables, data integrity, ACID, O/R frameworks, etc. J2EE and similar frameworks are geared for System Of Record applications: set up a SKU, coupon, contract, price list, campaign. Data maintenance.
It's not too much of a stretch to say that commercial vendors and most frameworks are very System Of Record oriented. Need to build a new app to maintain X? Install an RDB and an app server, select an O/R mapping and a GUI form framework, place warm bodies in front of drag-and-drop IDEs and go for it.
A System Of Service must service 1,000s of requests per second in milliseconds, 24/7/365, with 99.99% reliability, e.g. a pricing service. The business logic has to run in microseconds. Your favorite O/R mapping framework hasn't even initiated a JDBC call, heck, hasn't even allocated a connection from the pool, and it has already exhausted its allotted 1 ms.
An item may undergo a few 10s of price changes a year on the System Of Record, yet that item's price may be served 100,000 times for each change on the System Of Service.
Systems Of Record deal with lots of meta-data associated with maintaining the core data. I might need only 30-50 data elements to determine a price; however, the System Of Record has 100s of data elements in dozens of tables for data associated with SOX, security, versioning, authentication, approvals, etc.
Data on a System Of Record should be in Boyce-Codd normal form. Data on a System Of Service should be structured in whatever way is necessary to handle the workload: think denormalized data structured primarily for the update workload and secondarily for the read-only workload.
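To make the normalized vs. denormalized contrast concrete, here is a minimal sketch. The table names, fields, and the percent-off pricing rule are all hypothetical, invented purely for illustration; a real pricing service would carry the 30-50 elements mentioned above.

```python
# Normalized, System Of Record style: separate "tables" joined at request time.
# All names and values here are hypothetical.
items = {"sku-1": {"list_price_id": 10, "campaign_id": 7}}
list_prices = {10: {"cents": 1000}}
campaigns = {7: {"percent_off": 10}}

def price_normalized(sku):
    # Three lookups (joins) per request.
    item = items[sku]
    base = list_prices[item["list_price_id"]]["cents"]
    pct = campaigns[item["campaign_id"]]["percent_off"]
    return base * (100 - pct) // 100

def denormalize():
    # Flatten everything needed to price a SKU into one record per SKU.
    flat = {}
    for sku, item in items.items():
        flat[sku] = {
            "cents": list_prices[item["list_price_id"]]["cents"],
            "percent_off": campaigns[item["campaign_id"]]["percent_off"],
        }
    return flat

def price_denormalized(flat, sku):
    # One lookup per request; the joins were paid for once, at update time.
    rec = flat[sku]
    return rec["cents"] * (100 - rec["percent_off"]) // 100

flat = denormalize()
print(price_normalized("sku-1"), price_denormalized(flat, "sku-1"))  # 900 900
```

The trade-off is that every update on the System Of Record side has to be pushed into the flat records, which is exactly the data-feed requirement in the list below.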
There are lots of options for building a System Of Record. Only the Amazons, Facebooks, LinkedIns, and Googles have solutions for building Systems Of Service. Your average Joe-Sixpack enterprise has few options for building Systems Of Service. Enterprises need to bring their mashable corporate API out onto the internet. To offer that API they need a System Of Service to implement it. But there are no off-the-shelf solutions.
Systems are rarely both the System Of Record and the System Of Service (it would be nice). Let's define what a System Of Service looks like.
1. A System Of Service shall run on a commodity box cluster.
2. A System Of Service shall support "hot" code changes (business logic).
3. A System Of Service shall be "consistent" in its answer.
4. A System Of Service shall be capable of incremental, near perfect horizontal scale out.
5. A System Of Service shall support failure.
6. A System Of Service shall support maximum performance via local resident data.
7. A System Of Service is session stateless.
8. A System Of Service is fed the data necessary to perform its function from a System Of Record.
9. Any server member of a System Of Service shall handle an update request from a System Of Record.
10. A Client of, or a System Of Record for, a System Of Service shall not observe a distinguished member of the service. Any server shall be able to handle any request.
Clusters
Item #1 is just a given in today's world. Big mid-range boxes just don't make sense. The amount of pure horsepower available on some Intel 64-bit commodity servers boggles the mind.
No Down Time, Micro Deployment And Provisioning
Current J2EE application servers are HUGE, one-size-fits-all monolithic entities. What is needed is a small framework capable of incrementally adding only the functionality the application needs. The next generation application server will be a small, robust, OSGi framework server, which is configured to meet the needs of the application. Need JPA, ESB, BPEL, Messaging, Batch processing, Transactions, Paxos, Key Storage, Servlet, COMET, HTTP, RestLet, SOAP, XMLRPC, EDI, etc.? Just select the needed services for installation in the OSGi framework and create a customized application server for the specific requirements of the application.
If done correctly, one can micro hot deploy new versions or releases of the various modules, including your own OSGi modularized business logic.
Consistent, Scalable, Robust
Items #3,4,5,6,7 are really the key issues.
Computation is relatively easy to scale out. More boxes, more instances of executing code, even stateful applications aren't too bad with simple server affinity capabilities. Data scale out is the problem, specifically data mutation. By definition a System Of Service is primarily a service that operates upon mostly read-only data. The service may serve 10,000 prices for every one price change, but price changes must be supported, and there has to be consensus within the cluster on state changes (data mutations).
Item #5 means data must reside in multiple locations. This is satisfied by a Dynamo/Cassandra/Voldemort KV storage system, but item #6 is stronger: it requires all data to be co-located on all servers. Items #3 and #7 state a client may be serviced by any arbitrary member of the cluster and receives a consistent answer. But items #8 and #9 say state (data) is being mutated by an external agent.
One way to achieve the above set of constraints is to treat the entire cluster as a state machine. The cluster is in some state S and transitions to a new state S' on updating of state. If EACH member of the cluster applies the same globally ordered transformations then each server will provide the same consistent answer modulo latency.
This is the consensus problem. One solution to the consensus problem is the Paxos algorithm. Zookeeper uses a version of Paxos to achieve consistent binding of hierarchical Key-Values across a cluster. See "Paxos For System Builders"; I am pretty darn sure it is the original paper used by the original Yahoo team that implemented Zookeeper. Zookeeper is Paxos without the ability to define Listeners.
If the cluster reaches consensus on which "command" to execute next and then each server in the cluster executes said command, the cluster acts as a single monolithic state machine.
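A minimal sketch of the cluster-as-state-machine idea follows. It assumes the globally ordered command log already exists (how the order gets agreed, e.g. via Paxos, is out of scope here), and the `SetPrice` and `PriceServer` names are hypothetical, made up for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SetPrice:
    """Hypothetical command: set the price of one SKU, in cents."""
    sku: str
    cents: int

class PriceServer:
    """One cluster member holding all price data in local memory."""

    def __init__(self, name):
        self.name = name
        self.prices = {}
        self.applied = 0  # index of the next command in the agreed log

    def apply_log(self, log):
        # Apply, in order, every agreed command this server has not yet seen.
        for cmd in log[self.applied:]:
            self.prices[cmd.sku] = cmd.cents
            self.applied += 1

    def price(self, sku):
        return self.prices.get(sku)

# The globally ordered log that every member must apply in the same order.
log = [SetPrice("sku-1", 999), SetPrice("sku-2", 2500), SetPrice("sku-1", 899)]

a, b = PriceServer("a"), PriceServer("b")
a.apply_log(log)       # a is fully caught up
b.apply_log(log[:2])   # b has only seen the first two commands
print(a.price("sku-1"), b.price("sku-1"))  # 899 999 (b is behind: "modulo latency")

b.apply_log(log)       # b catches up on the remaining command
print(a.prices == b.prices)  # True
```

Because every server applies the same commands in the same order, any two servers that have applied the same prefix of the log give the same answer; they differ only while one of them is still catching up.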
It's safe, in the sense that if consensus cannot be reached the system "fails" in its current state, i.e., it will continue to serve prices, but will not process price changes. In the face of failure, a cluster will make progress if a majority of nodes can reach consensus. A failed node will reconcile and resync its state machine with the cluster upon rejoining.
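The fail-safe behavior can be sketched as a simple majority check. This is a toy model, not Paxos: it assumes a node somehow knows how many members it can currently reach, and the `NodeView` class and its methods are hypothetical names for this illustration only.

```python
class NodeView:
    """Toy model of one node's view of the cluster (not a real protocol)."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.reachable = cluster_size  # members reachable, counting this node
        self.prices = {}

    def has_quorum(self):
        # A strict majority of the full cluster is required to make progress.
        return self.reachable > self.cluster_size // 2

    def read_price(self, sku):
        # Reads are served from local state whether or not there is a quorum.
        return self.prices.get(sku)

    def propose_change(self, sku, cents):
        # Without a majority the change is refused and state stays as it was.
        if not self.has_quorum():
            return False
        self.prices[sku] = cents
        return True

node = NodeView(cluster_size=5)
print(node.propose_change("sku-1", 999))  # True: 5 of 5 reachable

node.reachable = 3
print(node.propose_change("sku-1", 899))  # True: 3 of 5 is still a majority

node.reachable = 2
print(node.propose_change("sku-1", 799))  # False: 2 of 5 is not a majority
print(node.read_price("sku-1"))           # 899: prices are still served
```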
2PC, 3PC and e3PC transactional systems are degenerate versions of Paxos (some simplification of it).
Dynamo-like KV storage systems are substantial improvements over RDBs for Systems Of Service. However, one still has to "fetch" the data for each request (it may have just changed). Depending on the performance needs of a System Of Service, _any_ cross-network data fetch is too slow. Therefore data must be cached, staleness must be dealt with, and complexity explodes.
At this point one just says let's colocate (cache) all the data necessary to execute the service on each server and be done with it. And why not? A 16 or even 32 gig server is nothing out of the ordinary these days and is quite capable of holding the equivalent of 100s of millions of rows of relational data in memory. This raises the consistency question, answerable via distributed cluster node consensus. Paxos.
Under the System Of Service model, Dynamo-like KV storage systems serve as a reliable drop-off zone for data from Systems Of Record, and for State Checkpoints. These data quanta can be as simple as JSON / REST oriented data updates. (See http://project-voldemort.com/blog/2009/06/building-a-1-tb-data-cycle-at-linkedin-with-hadoop-and-project-voldemort/ for a similar approach.) A failed node or a node joining a System Of Service must roll forward from the last checkpoint, executing each globally ordered command issued since that checkpoint.
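Checkpoint-and-roll-forward recovery might look roughly like this. It is a sketch under assumptions: the checkpoint is a JSON blob holding the price map plus the index of the next command, and the command log is a plain list of `(sku, cents)` pairs. In practice both would live in the KV store; `take_checkpoint` and `recover` are hypothetical helper names.

```python
import json

def take_checkpoint(prices, next_index):
    # Serialize the full state plus the log position it corresponds to.
    return json.dumps({"next_index": next_index, "prices": prices})

def recover(checkpoint_blob, log):
    # Restore the checkpointed state, then replay every later command in order.
    snap = json.loads(checkpoint_blob)
    prices = dict(snap["prices"])
    for sku, cents in log[snap["next_index"]:]:
        prices[sku] = cents
    return prices, len(log)

# Globally ordered commands as (sku, cents) pairs; values are made up.
log = [("sku-1", 999), ("sku-2", 2500), ("sku-1", 899), ("sku-3", 100)]

# A checkpoint taken after the first two commands were applied.
blob = take_checkpoint({"sku-1": 999, "sku-2": 2500}, next_index=2)

prices, next_index = recover(blob, log)
print(prices)      # {'sku-1': 899, 'sku-2': 2500, 'sku-3': 100}
print(next_index)  # 4
```

A rejoining node ends up in the same state as a node that applied all four commands from scratch, but only has to replay the two commands issued after the checkpoint.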
Ah, yes, time to get to the point of all this... It's the overlap with your manifesto. Zookeeper <-> Paxos for transactions. JOSH (Jason) Needs, Dynamo/Voldemort/Cassandra KV-Storage. Lots of overlap on some core technologies.
I'll be creating a Git repo shortly with the start of a Scala based implementation of "Paxos for System Builders".
Dave, I think you are on the right path, if for no other reason, I've observed similar trends and reached similar conclusions. :)
A System Of Service app server is the next JBoss, the analogue of what J2EE is to Systems Of Record applications today.
Ray

On Thu, Jun 18, 2009 at 3:19 AM, David Pollak <feeder.of.the.bears@...> wrote:
Folks,
At the end of the Scala Lift Off, after I finished my third beer, Martin Odersky came over to me and asked, "so, what's the future of Lift?"
I gave a hand-waving answer about the features for 1.1. But Martin is not a hand-waving kind of guy and I think I owe him and the other folks in the Scala and Lift communities more.
There's a lot more that's necessary for web app development than Lift, an abstraction to the HTTP request/response cycle, can provide.
Over the last couple of years, I've been noticing trends in web development, in the needs of my various consulting gigs, and in some other projects. It's clear to me that it's time for a unified data and data management model that goes beyond OR mapping and that is scalably transactional. I've put together a model that looks to the developer like STM but is backed with ZooKeeper and Cassandra. I've blogged about it at http://blog.lostlake.org/index.php?/archives/94-Lift,-Goat-Rodeo-and-Such.html
Just as my web framework manifesto was the genesis of what has become Lift, I hope that my notions and ramblings in this blog post will become concrete, usable code over the next few months and a solid platform for building the next generation of web systems over the next few years... all built with Scala at their core.
Thanks,
David
--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp