Distributed Systems

December 18, 2010

Spanner: Google's next Massive Storage and Computation infrastructure

MapReduce, Bigtable and Pregel have their origins in Google, and they all deal with large systems. But all of them may be dwarfed in size and complexity by a new project Google is working on, which was mentioned briefly (maybe unintentionally) at an event last year.

Instead of caching data closer to the user, it looks like Google is trying to bring the data to the user. If you use GMail or a Google Docs service, then with this framework Google could auto-magically move one of the master copies of your data to the nearest Google data center without actually having to cache anything locally. And since they are building one single datastore cluster around the world instead of hundreds of smaller ones for different applications, it looks like they may not need dedicated clusters for specific projects anymore.

Below is the gist of Spanner from a talk by Jeff Dean at a symposium held at Cornell. Take a look at the rest of the slides if you are interested in some impressive statistics on hardware performance and reliability.

Spanner: Storage & computation system that spans all our datacenters

- Single global namespace
    - Names are independent of location(s) of data
    - Similarities to Bigtable: tables, families, locality groups, coprocessors
    - Differences: hierarchical directories instead of rows, fine-grained replication
    - Fine-grained ACLs, replication configuration at the per-directory level
- Support for a mix of strong and weak consistency across datacenters
    - Strong consistency implemented with Paxos across tablet replicas
    - Full support for distributed transactions across directories/machines
- Much more automated operation
    - System automatically moves and adds replicas of data and computation based on constraints and usage patterns
    - Automated allocation of resources across the entire fleet of machines
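To make "replication configuration at the per-directory level" a bit more concrete, here is a minimal sketch in Python. It is not Spanner's API, which the slides do not show. The names `ReplicationConfig`, `DirectoryTree`, `effective_config` and `majority_quorum` are made up for illustration, and the rule that a directory inherits the configuration of its nearest configured ancestor is an assumption. The quorum helper only reflects the general fact that Paxos needs a majority of replicas to agree.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReplicationConfig:
    """Hypothetical per-directory policy: replica count and where replicas live."""
    replicas: int
    regions: Tuple[str, ...]


class DirectoryTree:
    """Toy hierarchical namespace; names say nothing about data location."""

    def __init__(self, root_config: ReplicationConfig) -> None:
        # The root always has a config, so every lookup terminates.
        self._configs: Dict[str, ReplicationConfig] = {"/": root_config}

    @staticmethod
    def _normalize(path: str) -> str:
        return "/" + path.strip("/")

    def set_config(self, path: str, config: ReplicationConfig) -> None:
        self._configs[self._normalize(path)] = config

    def effective_config(self, path: str) -> ReplicationConfig:
        """Walk up from `path` until a directory with an explicit config is found."""
        current = self._normalize(path)
        while current not in self._configs:
            # Drop the last path component; an empty result means the root.
            current = current.rsplit("/", 1)[0] or "/"
        return self._configs[current]


def majority_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority, as Paxos requires."""
    return replicas // 2 + 1


if __name__ == "__main__":
    tree = DirectoryTree(ReplicationConfig(3, ("us", "eu", "asia")))
    tree.set_config("/users/alice", ReplicationConfig(5, ("eu",)))
    cfg = tree.effective_config("/users/alice/mail/inbox")
    print(cfg.replicas, majority_quorum(cfg.replicas))
```

In this toy model, `/users/alice/mail/inbox` picks up the five-replica policy set on `/users/alice`, while any other path falls back to the root policy.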

References

Google: Designs, Lessons and Advice from Building Large Distributed Systems
2010: Google's Traffic is giant, Which is why it should be your ISP
2009: Google Spanner: instamatic redundancy for 10 million servers?
2007: What is Google doing with all that dark fiber
2005: Google wants dark fiber


 http://sancc.com/archives/2293

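Google has put forward a long-term plan named Google Spanner. At its core, whenever traffic surges or hardware becomes overloaded, data is moved automatically between Google's data centers, at a scale of millions of servers.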

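This summer, Google first hinted that it was developing a distributed back-end technology to strengthen its server infrastructure: when a data center becomes overloaded or overheats, its data is redistributed automatically and instantly. More recently, Google engineer and distinguished search expert Jeff Dean confirmed this and provided a PDF presentation.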

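The new platform is called Google Spanner. In English, "spanner" originally means a wrench, and can also refer to the cross brace of a bridge. Dean describes it in his presentation as "a storage and computation system that spans all our datacenters and automatically moves and replicates data". It takes into account bandwidth, packet loss, power and resources, as well as failure modes. He also said that the platform reallocates resources across all of the data centers.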

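Google currently has 36 data centers around the world, with a few more under construction. Google has never disclosed how many servers it runs, but Dean's presentation reveals its plan: to reach 10 million servers in the future.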

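In the document, Dean lays out Google's long-term plan: Google Spanner will run on 1 million to 10 million servers, hold about 10 trillion (10^13) virtual directories, and provide 10^18 bytes of storage, all distributed around the world.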
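A quick back-of-the-envelope calculation shows what these design targets imply on average. The sketch below uses only the three figures quoted above; the even spread of data across machines and directories is a simplifying assumption, and the variable names are illustrative.

```python
# Design targets quoted from the presentation.
TOTAL_BYTES = 10**18       # total storage
DIRECTORIES = 10**13       # about 10 trillion directories
MACHINES_MIN = 10**6       # lower bound on servers
MACHINES_MAX = 10**7       # upper bound on servers

# Assumption: data is spread evenly, which a real system would not guarantee.
bytes_per_directory = TOTAL_BYTES // DIRECTORIES
bytes_per_machine_small_fleet = TOTAL_BYTES // MACHINES_MIN
bytes_per_machine_large_fleet = TOTAL_BYTES // MACHINES_MAX
directories_per_machine_small_fleet = DIRECTORIES // MACHINES_MIN
directories_per_machine_large_fleet = DIRECTORIES // MACHINES_MAX

if __name__ == "__main__":
    print(f"average directory size: {bytes_per_directory / 10**3:.0f} KB")
    print(f"storage per machine: {bytes_per_machine_large_fleet / 10**9:.0f} GB "
          f"to {bytes_per_machine_small_fleet / 10**12:.0f} TB")
    print(f"directories per machine: {directories_per_machine_large_fleet:,} "
          f"to {directories_per_machine_small_fleet:,}")
```

Under these assumptions an average directory holds about 100 KB, and each machine stores between 100 GB and 1 TB spread over one to ten million directories. That is consistent with directories being a fine-grained unit of placement.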

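Google declined to comment, and its PR team offered no further explanation of Google Spanner. However, Google senior engineer and architect Vijay Gill described it briefly at a conference in San Francisco. He called Google's distributed online architecture "warehouse-sized" clusters, which can automatically move data out of data centers that are at risk or overheating. He said: "What we are building is a warehouse-sized computing platform, although for now it exists more on paper. You have to integrate with everything, from the chillers to the CPUs." The system can carry out a large amount of work without human intervention.

When asked whether the new technology was already in use, Gill gave a diplomatic answer: "I'm afraid I can't comment on that."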
