Is Big Data better Data Quality?
Big Data is everywhere. Chances are you've used a big data solution today. However, are big data solutions delivering big data quality?
High Availability versus High Data Quality
Typically, Big Data solutions are designed to ensure high availability. High availability is based on the concept that it is more important to collect and store data transactions than it is to determine the uniqueness or accuracy of the transaction. Some common examples of big data / high availability solutions are Twitter and Facebook.
It is possible to configure a big data solution to validate uniqueness and accuracy. I want to make sure I state that clearly. However, doing so means sacrificing some aspects of high availability. So, in some regard, big data and data quality are at odds.
This is because one of the fundamental aspects of high availability is to write transactions to whichever node is available. In this model, consistency of transactional data is sacrificed in the name of data capture. Most often, consistency is reached only eventually, and it is enforced on data inquiries (reads) rather than on data writes.
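To make the write-anywhere model concrete, here is a minimal sketch in Python. It is a toy, not the API of any particular big data product: the `Node` class, the `write` and `read` functions, and the `resolve` callback are all hypothetical names invented for illustration. It shows two replicas drifting apart because each write lands on whichever node happens to be up, and a read that reconciles them afterwards.

```python
class Node:
    """A toy replica holding key -> value pairs (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.store = {}


def write(nodes, key, value):
    """Store the value on the first available node without consulting the others."""
    for node in nodes:
        if node.available:
            node.store[key] = value
            return node.name
    raise RuntimeError("no node available")


def read(nodes, key, resolve):
    """Gather every available replica's copy, let `resolve` pick one, then repair."""
    copies = [n.store[key] for n in nodes if n.available and key in n.store]
    if not copies:
        return None
    winner = resolve(copies)
    for n in nodes:
        if n.available:
            n.store[key] = winner  # overwrite stale copies with the chosen value
    return winner


nodes = [Node("a"), Node("b")]
write(nodes, "email", "old@example.com")   # lands on node a
nodes[0].available = False                 # node a goes down
write(nodes, "email", "new@example.com")   # lands on node b instead
nodes[0].available = True                  # node a comes back with a stale copy

# Snapshot of what each replica holds before any read has reconciled them.
diverged = {n.name: n.store.get("email") for n in nodes}
```

Until someone reads the key, `diverged` holds two different answers to the same question. Everything then depends on the `resolve` function passed to `read`, which is exactly where data quality is won or lost.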
In other words, at any given point in time, consistency across a big data dataset is not guaranteed. Even more troubling is the fact that most transactional conflicts are resolved based on timestamps, which is to say that the most recently updated transaction is commonly regarded as the most accurate. This approach is, obviously, an issue that requires further examination.
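A small sketch of timestamp-based resolution shows why it deserves that examination. The `Version` record, the `last_write_wins` function, and the sample addresses are assumptions made up for this example. They do not come from any specific system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One conflicting copy of a record (hypothetical structure)."""
    value: str
    timestamp: float  # seconds, as reported by the node that took the write
    source: str


def last_write_wins(versions):
    """Resolve a conflict by keeping the copy with the newest timestamp."""
    return max(versions, key=lambda v: v.timestamp)


# A verified address is written first; a mistyped one arrives five seconds later.
verified = Version("42 Main Street", 1000.0, "verified-entry")
mistyped = Version("42 Mian Stret", 1005.0, "unverified-import")

winner = last_write_wins([verified, mistyped])
```

Here `winner` is the mistyped record: it is newer, so it wins, and the verified value is silently discarded. The rule measures recency, not accuracy, and it also trusts every node's clock to be correct.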
Room for improvement
As we examine big data solutions and learn more about implementing them, it is important to design more robust conflict resolution approaches that ensure that big data includes big data quality.
More on that to come ...
Great post, William
The gung-ho attitudes of "Just give us the data!" and "Big Data is Good Data" are major obstacles to data quality.
What many people and enterprises fail to appreciate is that data in itself has no intrinsic value. It is only valuable when it conveys information, and then only if that information is significant to the enterprise, i.e. it supports the Business Functions.
So, rapidly collecting large amounts of data might be - and all too often is - a means of overwhelming an enterprise with garbage.
Remember the old days of programming when the warning was GIGO - Garbage In, Garbage Out. That is still true today.
It is this failure to follow the fundamentals of business and data quality that is in danger of turning the Data Quality business into a Data Garbage business - in some quarters this has already happened.
Regards
John
John,
I agree. As we move forward into the age of big data solutions, there also needs to be a message that collecting more means analyzing more: analyzing in the sense of verifying that the data is accurate and consistent.
Big Data ... Little Data Quality - http://goo.gl/PBUfY via @dqchronicle
The "biggest" problem with data quality is obesity. We are clogging our system arteries with data plaque and at some point it will result in a massive system coronary. Before becoming too enamored with big data, it is best to determine if this glut of data is of any use to the business. Before embarking on data quality for big data, we still need to make the business case for small data.
Richard,
Again, I agree. Sadly, I think business is bedazzled with Big Data and we'll be cleaning it up before business realizes the quality over quantity advantage.
Thanks for the comment, Richard!
Interesting post http://t.co/jgUCgSN6 by @dqchronicle