Prerequisites
- Unix shell basics: http://www.amazon.com/Unix-Progr...
- C: http://www.amazon.com/Programmin...
- OS basics: http://www.amazon.com/Andrew-S-T... , http://www.amazon.com/Linux-Kern...
- Unix Programming: http://www.kohala.com/start/
- Networking Basics: http://www.amazon.com/Computer-N... , more advanced: http://web.mit.edu/dimitrib/www/...
- Sockets: http://www.amazon.com/TCP-Socket... , http://www.amazon.com/Foundation... , and network programming: http://www.kohala.com/start/
- Transmission of information: http://www.amazon.com/Informatio... , http://www.inference.phy.cam.ac.... and network coding
- Intro to concurrency: http://www.amazon.com/Practical-...
- Java: What questions are Java Software Engineers seeing the most of on technical interviews?
- Data structures and algorithms: Learning Algorithms: What are the most learner-friendly resources for learning about algorithms?
Courses
- CS525: http://www.cs.uiuc.edu/class/sp1...
- 6.824: http://pdos.csail.mit.edu/6.824/...
- 6.828: http://pdos.csail.mit.edu/6.828/...
- CS264: http://www.cs264.org/lectures/le...
- CS294: http://www.cs.berkeley.edu/~oded...
- CS 707: http://www.cs.gmu.edu/~setia/cs7...
- Advanced Computer Science Courses: http://the-paper-trail.org/blog/...
- CS696: http://www.eli.sdsu.edu/courses/...
- Google Code University: http://code.google.com/edu/paral...
- CS7960: http://www.cs.utah.edu/~jeffp/te...
- 6.829: http://ocw.mit.edu/courses/elect...
- Textbook: http://www.tacc.utexas.edu/~eijk...
Notes
Other:
- BitTorrent: http://en.wikipedia.org/wiki/Bit...
- MapReduce: What's the best way to come up to speed on MapReduce, Hadoop, and Hive?
- Start with Lin, Data-Intensive Text Processing with MapReduce, ISBN 1608453421, http://www.umiacs.umd.edu/~jimmy...
- See Tom White, Hadoop: The Definitive Guide, ISBN 1449389732, http://www.amazon.com/Hadoop-Def... , and Sean Owen et al., Mahout in Action, ISBN 978193518268, http://www.manning.com/owen/
- HDFS under the hood: http://assets.en.oreilly.com/1/e...
- ZooKeeper: http://highscalability.com/zooke...
- Download Hadoop (http://hadoop.apache.org/) and run some MapReduce jobs on your laptop in pseudo-distributed mode.
- See Google Research: What are the most interesting Google Research papers?
- Learn about the Google technology stack (MapReduce, BigTable, Dremel, Pregel, GFS, Chubby, Protobuf, Snappy, Ganeti, Tenzing, Sawzall, BigQuery, F1, Spanner, Jingle, GCM, Google Talk, etc). (See also http://www.columbia.edu/~ak2834/... , http://www.cs.rutgers.edu/~muthu... , http://the-paper-trail.org/blog/...)
- Set up an account with Amazon AWS/EC2/S3/EBS and experiment with running Hadoop on a cluster with large data sets (you can use Cloudera or YDN images, but in my opinion you can better understand the system if you set it up from scratch, using the original distribution). Watch the costs: http://www.networkworld.com/news...
- Try out Hadoop alternatives, specifically the minimalist frameworks such as BashReduce: http://github.com/erikfrey/bashr... and CloudMapReduce: http://code.google.com/p/cloudma... (see What are some promising open-source alternatives to Hadoop MapReduce for map/reduce?)
- See Machine Learning: What are some good class projects for machine learning using MapReduce?
- Run Brian Cooper's Cloud Serving Benchmark on AWS, compare HBase vs Cassandra performance on a small cluster (6-8 nodes): http://wiki.github.com/brianfran... Also see Pete Warden's tests: http://petewarden.typepad.com/se... , HBase book: http://hbase.apache.org/book.html
- Run the LINPACK benchmark: http://www.datawrangling.com/on-...
- Run some experiments with MPI (http://www.mcs.anl.gov/research/...): try to implement a simple clustering algorithm (e.g., http://en.wikipedia.org/wiki/K-m...) with MPI vs Hadoop/MapReduce and compare the performance, fault tolerance, ease of use, etc. Learn the differences between the two approaches, and when it makes sense to use each one.
- Check out Dongarra's papers: http://www.netlib.org/utk/people... , works by Gibbons: http://www.bell-labs.com/org/112... , Lamport: http://research.microsoft.com/en... , Blelloch: http://www.cs.cmu.edu/~guyb/pubs... , also see What are the seminal papers in distributed systems? Why?
- There is a new library called MPI-MapReduce (http://www.sandia.gov/~sjplimp/m...): see how it works and how it compares to other MapReduce implementations.
- Run some tests with ScaLAPACK (http://www.netlib.org/scalapack/), try to port one of the routines to Hadoop, compare the performance and scalability. See how stability of numerical algorithms is evaluated: http://portal.acm.org/citation.c...
- Write your own simplified MapReduce runtime in C or any other programming language.
- Try xargs -P and GNU Parallel, also see What are some lesser known but useful Unix commands?
- Check out http://www.cascading.org/ , http://clojure.org/ , http://www.bloom-lang.net/features/
- Learn about distributed hash tables (http://en.wikipedia.org/wiki/Dis... , http://www.linuxjournal.com/arti...), run some experiments with Paxos (Algorithm) http://the-paper-trail.org/blog/... , Kademlia: http://en.wikipedia.org/wiki/Kad... . See Wolf Garbe's answers on Peer-to-Peer Technology. Also see Petar Maymounkov's answer to Computer Science Research: Have there been any new advances in distributed hash tables?
- Download Nutch (http://nutch.apache.org/) or Solr (http://lucene.apache.org/solr/), run a crawl on Wikipedia. Analyze the collected data with R (see item 2 above) or Python (http://www.nltk.org/).
- Write your own simplified crawler/indexer, test the performance and scalability, look at the Lucene source for ideas, look at http://infolab.stanford.edu/~bac... for inspiration. You can probably build it as a term project in either an Information Retrieval or a Search Engines course.
- Learn about prefix-sum: http://en.wikipedia.org/wiki/Pre... , parallel matrix multiplication: http://www.cs.berkeley.edu/~yeli... , streaming: http://infolab.stanford.edu/stream/ , BSP: http://en.wikipedia.org/wiki/Bul... , DSM: http://www.amazon.com/Distribute... and "The Use of Name Spaces in Plan 9"
- Check out Persistent Linda: http://www.google.com/#sclient=p... ; if you find it interesting see Linda and Friends (an article by Sudhir Ahuja from 1986), also search for "Linda in Context", "tuple space", JavaSpaces, GigaSpaces. Read "How to write parallel programs: a guide to the perplexed" by Nicholas Carriero and David Gelernter: http://portal.acm.org/citation.c...
- Learn about Compute Unified Device Architecture (CUDA): http://www.amazon.com/CUDA-Examp... , Graphics Processing Unit and Field Programmable Gate Arrays accelerators, PlayStation 3 programming: http://en.wikipedia.org/wiki/Cel... , http://www.hotchips.org/hc21/mai...
- Pick one of the PGAS languages (http://en.wikipedia.org/wiki/Par...), e.g. X10 (http://en.wikipedia.org/wiki/X10...), go through the tutorials (http://ppppcourse.ning.com/forum...), run some HPC benchmarks (LU, FFT) and the examples (the streaming example in particular): see how it scales on a cluster/AWS, compare to sequential and Hadoop/MapReduce implementations, see what kind of performance/scalability gains it gives you on multicore boxes.
- Some good references on parallel programming: Herlihy & Shavit, The art of multiprocessor programming: http://www.amazon.com/Art-Multip... , Blelloch, Vector models for data-parallel computing: http://citeseerx.ist.psu.edu/vie... , Valiant, A bridging model for parallel computation: http://portal.acm.org/citation.c... , Hillis & Steele, Data Parallel Algorithms: http://portal.acm.org/citation.c... , Miller & Boxer, Algorithms sequential and parallel: http://www.amazon.com/Algorithms... , Leighton, Introduction to Parallel Algorithms and Architectures: http://www.amazon.com/Introducti... , JaJa, Introduction to Parallel Algorithms: http://www.amazon.com/Introducti...
- You should probably start with Dijkstra, Cooperating Sequential Processes: http://www.cs.utexas.edu/users/E... and Ben-Ari, Principles of Concurrent and Distributed Programming: http://www.amazon.com/Principles... (I have the older edition from 1982, which is an excellent intro)
- Take a course in Parallel Computer Architecture: http://www.eecs.berkeley.edu/~cu... , http://www.amazon.com/Parallel-C... , http://people.engr.ncsu.edu/efg/...
- Check out Cilk: http://software.intel.com/en-us/... and Matlab Parallel Computing Toolbox: http://www.mathworks.com/product...
- For some theoretical background on distributed algorithms, information decomposition and complexity see: Feldman et al., On the Complexity of Processing Massive, Unordered, Distributed Data: http://arxiv.org/abs/cs/0611108 , Traub, An introduction to information based complexity: http://octopus.library.cmu.edu/C... and Is Nancy Lynch's book still the best intro to distributed algorithms?
- Parallel Distributed Processing (PDP) by Rumelhart and the PDP research group: http://www.amazon.com/Parallel-D...
- Look up the computing architectures for Artificial Neural Networks, e.g. http://www.amazon.com/Parallel-A...
- Run some experiments with Weka (http://www.cs.waikato.ac.nz/ml/w...) or RapidMiner (http://rapid-i.com/), pick a simple algorithm and port it to MapReduce, see how it scales on a cluster/AWS
- Experiment with distributed 'NoSQL' data stores (Voldemort, HBase, Redis, Tokyo, Cassandra etc). Figure out what the CAP theorem is all about (http://www.allthingsdistributed.... , http://www.cloudera.com/blog/201... ). Create a simple app with a key-value or column-based store as a back-end. Import several GBs of interesting data into it and run some simple clustering/KNN algos (http://en.wikipedia.org/wiki/Clu... , http://en.wikipedia.org/wiki/Nea...). Optimize your algo to better utilize random access patterns, experiment with various tuning options. Build a front-end visualization for the results (check out Protovis or a similar visualization package: http://vis.stanford.edu/protovis/)
- A good resource on 'NoSQL': Daniel Abadi's publications: http://cs-www.cs.yale.edu/homes/... and Varley, No Relation: The Mixed Blessings of Non-Relational Databases: http://ianvarley.com/UT/MR/Varle...
- Doozer: http://xph.us/2011/04/13/introdu...
- Learn about main-memory databases: http://en.wikipedia.org/wiki/In-memory_database , http://scholar.google.com/schola... , http://monetdb.cwi.nl/ , http://hstore.cs.brown.edu/ , Microsoft Trinity - a graph database over distributed memory cloud: http://research.microsoft.com/en...
- Write a distributed hash table in C, here is a good reference: http://pdos.csail.mit.edu/papers... or use node: https://github.com/stbuehler/nod...
- Networking: http://www.amazon.com/Unix-Netwo... , http://www.amazon.com/TCP-Illust... , Network Programming: What are some good resources for learning about network programming?
- Write a distributed file system in C. See git for inspiration: http://apenwarr.ca/log/?m=200801#31 , Frangipani: http://portal.acm.org/citation.c... . For a good intro see Tanenbaum's series: http://www.amazon.com/Distribute... , http://www.amazon.com/Modern-Ope... and http://www.stanford.edu/class/cs... , The Amoeba Distributed OS: http://www.cs.vu.nl/pub/amoeba/a...
- Graph databases, etc: http://nosql-database.org/ , http://www.graph-database.org/ , GraphLab: http://graphlab.org/
- Facebook Engineering: What is Facebook's architecture? , What are the most interesting Facebook Data papers/projects?
- Hadoop/HBase at Facebook: http://borthakur.com/ftp/Realtim...
- YouTube: What is YouTube's architecture?
- Justin.tv: How does justin.tv work?
- Scalability: How does Heroku work?
- Netflix: What is Netflix's architecture?
- Hulu: What is Hulu's architecture?
- eBay: What is eBay's architecture?
- Dropbox: What is Dropbox's architecture?
- What are the core technologies that Twitter uses for their platform and what is the Twitter Macro architecture?
- LinkedIn SNA: http://sna-projects.com/sna/
- What is LinkedIn's database architecture like?
- Quora Infrastructure: How does LiveNode work?
- Scaling LiveJournal: http://danga.com/words/2007_06_u...
- Content Delivery Networks: How does Akamai CDN work?
- GitHub architecture: https://github.com/blog/530-how-...
- Twitter Rainbird: http://www.slideshare.net/kevinw... , http://www.slideshare.net/nkalle...
- Yahoo! S4: https://github.com/s4/core , http://docs.s4.io/
- IBM InfoSphere Streams/System S: http://www-01.ibm.com/software/d...
- BackType Storm: http://news.ycombinator.com/item...
- Octobot, a distributed task queue worker: http://octobot.taco.cat/
- F* by Microsoft: http://research.microsoft.com/en...
- HN thread on the architecture of backend systems: http://news.ycombinator.com/item...
- The secrets of Node's success: http://radar.oreilly.com/2011/06...
- Druid: A Distributed, In-Memory OLAP Store: http://metamarketsgroup.com/blog... (some dissing here: http://news.ycombinator.com/item...)
- FathomDB response to AWS outage: http://news.ycombinator.com/item...
- Google Ganeti - Cluster-based virtualization management software: http://code.google.com/p/ganeti/ , http://k1024.org/~iusty/papers/i...
- Google Go: http://www.theregister.co.uk/201...
- Erlang/OTP: http://learnyousomeerlang.com/co...
- Cloud Haskell: http://research.microsoft.com/en...
- NASA Nebula: http://nebula.nasa.gov/
- Platform MR: http://www.platform.com/Products...
- FAST 2011: http://www.usenix.org/events/fas...
- The history of consensus: http://betathoughts.blogspot.com... (via http://the-paper-trail.org )
- Distributed Linked List: http://www.google.com/search?scl...
- Gibbons, Synopsis Data Structures for Massive Data Sets: www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/GibbonsM-syn.pdf
- Lock-Free Linked Lists and Skip Lists: http://www.cse.yorku.ca/~ruppert...
- Maekawa's lock: http://www.google.com/#sclient=p...
- Crossbow, searching for SNPs with cloud computing: http://www.biomedcentral.com/con...
- Distributed Caching: Hazelcast, Ehcache, Terracotta (company), Memcached, Oracle Coherence
- Bitcoin, A Peer-to-Peer Electronic Cash System: www.bitcoin.org/bitcoin.pdf
- Distributed computing with JS: BitCoin miner: http://news.ycombinator.com/item... , MapRejuice: https://github.com/ryanmcgrath/m... , http://www.igvita.com/2009/03/03...
- OpenCirrus - Cloud Computing Research Testbed: https://opencirrus.org/content/r...
- Antonio Piccolboni, A Comparison of Eight MapReduce Languages: http://www.dataspora.com/2011/04...
- Caching and processing 2TB in memory with Hazelcast: http://highscalability.com/blog/...
- Dapper: a Large-Scale Distributed Systems Tracing Infrastructure: http://static.googleusercontent....
- On the performance of distributed lock-based synchronization: http://portal.acm.org/citation.c...
- Tonika: social routing with organic security: http://pdos.csail.mit.edu/~petar...
- http://www.linuxvirtualserver.org/
- It's time for low latency: http://www.matt-welsh.blogspot.c...
- OS Research Wanted: http://surriel.com/research_wanted
- FlightPath: Obedience vs. Choice in Cooperative Services: http://www.usenix.org/event/osdi...
- Piccolo: Building Fast, Distributed Programs with Partitioned Tables: http://piccolo.news.cs.nyu.edu/p...
- CIEL: a universal execution engine for distributed data-flow computing: www.usenix.org/event/nsdi11/tech/full_papers/Murray.pdf
- Directed Edge: On building a stupidly fast graph database: http://blog.directededge.com/200...
- What is the best tutorial for Python's Twisted framework?
- Node.js: What are the best resources to learn Node.js?
- Parallelism /= Concurrency: http://ghcmutterings.wordpress.c...
- PyCon 2011: Handling ridiculous amounts of data with probabilistic data structures: http://blip.tv/pycon-us-videos-2...
- Go (programming language) at Heroku: http://blog.golang.org/2011/04/g...
- Meijer & Lamport, Mathematical Reasoning and Distributed Systems: http://channel9.msdn.com/Shows/G...
- Concurrency's Shysters: http://blogs.oracle.com/bmc/entr...
- Horton: Online Query Execution On Large Distributed Graphs: http://www.graph-database.org/20...
- DataDomain: http://www.datadomain.com/
- WebRTC: https://sites.google.com/site/we...
- Bloom: http://www.bloom-lang.net/ and http://boom.cs.berkeley.edu/pape...
- Jini tutorial: http://jan.newmarch.name/java/ji...
- Java distributed cache for low latency, high availability: http://stackoverflow.com/questio...
- Scalable, Distributed Data Structures for Internet Service Construction (2000): http://usenix.org/events/osdi200...
- Cloud Programming: From Doom and Gloom to BOOM and Bloom: http://neilconway.org/talks/boom...
- SEDA: An Architecture for Highly Concurrent Server Applications: http://www.eecs.harvard.edu/~mdw... (http://matt-welsh.blogspot.com/2... )
- Scalable Network programming: http://bulk.fefe.de/scalable-net...
- Protocol Buffers: http://news.ycombinator.com/item...
(This is a live list. Edits and additions welcome)
- What are some current directions in operating system research?
- Distributed Systems: What are the best resources for learning about distributed file systems?
- How do I approach building a distributed queue architecture?
- What are some good resources for learning about data compression? Why?
- Information Retrieval: What are some good resources to get started with Information Retrieval? Why?
- What are good resources to learn about search engine architecture?
- What are the good resources to learn about distributed, scalable, robust software architecture/infrastructure building?
- What are some common approaches to error aggregation, alerting, and analysis in distributed systems?
- What are some good research papers and articles on fault-tolerant systems design?
- Big Data: What are the most influential papers in the world of big data? Why?
- Large Scale Learning: What are some introductory resources for learning about large scale machine learning? Why?
- Computer Science Research: Which CS areas have the most low-hanging fruit for research?
- Big Data: Why the current obsession with "big" data?
- Distributed Systems: Which conferences are the best to follow for Distributed Systems?
- What are the best recommended research topics on databases according to edge technologies and recent research trends?
- Distributed Systems: What are the most interesting research projects related to the management of distributed systems?
- TCP/IP: What are some high performance TCP hacks?
- Pike, Systems Software Research is Irrelevant: http://herpolhode.com/rob/utah20...
- Concurrency in Go (programming language): http://golang.org/doc/effective_...
- Communicating Sequential Processes (CSP): http://www.usingcsp.com/
- Scalable Joins: http://research.microsoft.com/en...
- Kestrel, tiny queue system based on Starling, in Scala: https://github.com/robey/kestrel
- Disruptor - concurrent programming framework: http://code.google.com/p/disruptor/
- DataTurbine streaming engine: http://www.dataturbine.org/
- The Task Parallel Library (TPL) in .NET: http://msdn.microsoft.com/en-us/...
- A crash course on modern hardware: http://www.infoq.com/presentatio...
- Infinispan data grid on top of JGroups: http://www.jboss.org/infinispan
- Memcached distributed cache on top of JGroups: http://www.jgroups.org/memcached...
- Scalable Application Layer Multicast: http://pages.cs.wisc.edu/~suman/...
- Hadapt: Efficient Processing of Data Warehousing Queries in a split execution environment: http://portal.acm.org/citation.c...
- Danga's open source projects: http://danga.com/
- Hadoop on MPI: http://hadoopbi.com/index.php/te...
- Hadoop on Pallet: http://sritchie.github.com/2011/...
- Oracle Grid Engine: http://en.wikipedia.org/wiki/Ora...
- Ejabberd: a scalable XMPP instant messaging server: http://www.ejabberd.im/
- Cheetah - Circuit-switched High-speed End-to-End Transport ArcHitecture: http://www.ece.virginia.edu/chee...
- PVM: http://www.snakebytestudios.com/...
- Systems at ETH Zurich: http://www.systems.ethz.ch/resea...
- OpenCL: http://www.khronos.org/opencl/
- Cloud Haskell: http://research.microsoft.com/en...
- Concurrent programming in Erlang: http://www.erlang.org/erlang_boo...
- Concurrent programming in Occam 2: http://www.amazon.com/Programmin...
- Concurrent and Real-Time Programming in Ada: http://www.amazon.com/Concurrent...
- Distributed Programming in Ruby: http://www.amazon.com/dp/0321638...
- REST: http://rest.elkstein.org/
- Gearman: http://gearman.org
- Drizzle: https://launchpad.net/drizzle
- Distributed logging with syslog: https://wiki.archlinux.org/index...
- Logs are streams, not files: http://adam.heroku.com/past/2011...
- Utilizing Redis in distributed Erlang systems (Heroku): http://erlang-factory.herokuapp....
- MS Command Shell: http://arstechnica.com/business/...
- MS PowerShell: http://en.wikipedia.org/wiki/Win...
- DRb - Distributed Ruby: http://segment7.net/projects/rub...
- God - The Ruby Framework for Process Management: https://github.com/mojombo/god
- Taco Bell programming: http://teddziuba.com/2010/10/tac...
- Rush - the Ruby Shell: http://rush.heroku.com/
- CloudCrowd: https://github.com/documentcloud...
- Coda: http://www.coda.cs.cmu.edu
- Tenzing: http://research.google.com/pubs/...
- GNTP: http://www.growlforwindows.com/g...
- CycleCloud: http://blog.cyclecomputing.com/2...
- GNU Parallel: http://www.gnu.org/software/para...
- Torque: http://www.adaptivecomputing.com...
- Chef: http://www.opscode.com/chef/
- Dempsy - Nokia's Distributed Elastic Message Processing System: http://dempsy.github.com/Dempsy/
- F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business: http://research.google.com/pubs/...
- Galaxy - a distributed in-memory data grid by Parallel Universe: http://blog.paralleluniverse.co/...
- Spanner: Google's Globally-Distributed Database: http://research.google.com/archi...
- Jingle: http://code.google.com/p/libjingle/
- Paxos Made Live: http://www.eecs.harvard.edu/cs26...
- TeleHash: https://github.com/quartzjer/Tel...
- Adobe RTMP: http://en.wikipedia.org/wiki/Rea...
- UPnP: http://en.wikipedia.org/wiki/Uni...
- IP multicast: http://en.wikipedia.org/wiki/IP_...
- Reliable multicast: http://en.wikipedia.org/wiki/Rel...
- JGroups: http://www.jgroups.org/
- LDPC Codes: http://en.wikipedia.org/wiki/Low...
- Erasure codes: see the last chapter in D.J.C. MacKay's http://www.inference.phy.cam.ac.... (note that some of these are heavily patented, e.g. http://www.inference.phy.cam.ac.... )
- Blaze: Next-gen NumPy on which to build out-of-core and distributed algorithms: https://speakerdeck.com/sdiehl/b...
- Spark cluster computing by UC Berkeley AMPLab
- NSQ by bitly: realtime distributed message processing at scale
- Celery: Distributed Task Queue
- Go'Circuit by Tumblr: Paradigm for developing and sustaining Big Data apps
- List of popular backend stacks: ragingwind/backend-architectures.md
- CRAN Task View: High-Performance and Parallel Computing with R
- Apache Accumulo by NSA
- alternative-internet
- OpenFlow for programmable networks: Page on Openflow, switch spec: http://archive.openflow.org/docu...
- On data center scale, OpenFlow, and SDN
- Docker, the Linux container engine, and one use case at Etsy: LXC - Running 14,000 tests per day and beyond! (Part 1)
- Apache Mesos (A Platform for Fine-Grained Resource Sharing in the Data Center, tech report: Page on Berkeley)
- Apache Spark: Lightning-Fast Cluster Computing via https://news.ycombinator.com/ite...
- Google's MillWheel: Fault-Tolerant Stream Processing at Internet Scale: Page on Googleusercontent
- Google's Sibyl, a distributed learning system: www.magicbroom.info/Papers/Ladis10.pdf
- Berkeley AMPLab - SQL Benchmark: Redshift vs Hive vs Impala vs Shark: Big Data Benchmark based on Page on Brown
- The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering by Jay Kreps (also Kafka: a Distributed Messaging System for Log Processing)
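
One exercise in the list above is to write your own simplified MapReduce runtime. A minimal single-process sketch in Python could look like the following. It assumes all records fit in memory, and the names `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical, not taken from Hadoop or any library linked here. It only illustrates the map, shuffle and reduce phases; distribution, fault tolerance and I/O are left out.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, reduce each group."""
    # Map phase: each record may emit any number of (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)

    # Shuffle phase: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the partial counts for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

A natural next step, in the spirit of the exercise, is to split the map phase across worker processes and partition the shuffle by a hash of the key, then compare the result with Hadoop on the same input.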
Tuesday, February 25, 2014
What are some good resources for learning about distributed computing? Why?
Answer by Alex Kamil: