What are the traditional 3 Vs of big data?


The 3 Vs of big data are typically described as volume, variety, and velocity: three defining properties, or dimensions, of big data. Volume refers to the amount of data, variety refers to the number of types of data, and velocity refers to the speed of data processing. According to the 3Vs model, the challenges of big data management result from the expansion of all three properties, not just the volume, which is the sheer amount of data to be managed. Big data is not only about large amounts of data; it is also a concept that provides an opportunity to develop new insights into your existing data, as well as guidelines for capturing and analyzing your future data. It allows a business to be more agile and robust, so it can adapt to its users and overcome many business challenges.

Volume is the V most often linked with big data because, well, volume can be large. What we are talking about here is quantities of data that reach almost incomprehensible proportions.

Velocity is essentially the measure of how quickly data is being received. Facebook has to handle a huge number of photographs every day; it has to digest them all, process them, file them, and somehow, later, be able to retrieve them quickly and easily.

Variety: photos, videos, audio recordings, email messages, artifacts, books, presentations, tweet text, and ECG strips are all forms of data, but they are generally ill-structured and incredibly varied; all of this diversity makes up the variety dimension of big data.

Essentially, the three Vs describe the data that is to be analyzed, while analytics is the process of deriving value from that data. Taken together, there is the potential for incredible insight, or worrisome oversight. Like every other great power, big data comes with great promise and a huge responsibility.

WHY COMPANIES LIKE GOOGLE AND AMAZON WERE AMONG THE FIRST TO ADDRESS THE BIG DATA PROBLEM.

It is because these companies sit at the highest tier of data consumption, user counts, and cutting-edge technology. The judicial system still appears to be catching up on how to handle and regulate data consumption and reach at this scale. Take Facebook, for example, which stores an incredible mass of photographs. That statement doesn't begin to boggle the mind until you realize that Facebook has more users than China has people, and each of those users has stored quite a few photographs; in all, Facebook is storing roughly 250 billion images. Holding that much data is a large responsibility and should not be taken lightly. These companies were the first to reach such volumes of data, such high user counts, and such advanced technology; they are the pioneers of big data and the founders of a new frontier of internet data.

An additional theory: during the dot-com bubble, which lasted from about 1997 to 2000, there was enormous growth in the Internet sector and related fields. When that period ended, most of the smaller companies failed and disappeared entirely, leaving only a few survivors. The companies that survived, however, gained the market share previously held by the many small failed companies. Microsoft and Google were among the few big corporations that survived the burst of the dot-com bubble and thereby acquired an enormous market share. As a result they came to handle, and perhaps also inherited, more data than had ever been known before.

HOW ARE THE VALUE COMPONENTS OF A KEY-VALUE DATABASE AND A DOCUMENT DATABASE DIFFERENT?

Key-value databases and document databases are quite similar. Key-value databases are the simplest form of NoSQL database: the basic data structure is a dictionary, or map. You stash a value, such as an integer, a string, a JSON structure, or an array, along with a key used to reference that value. As a brief example, a simple key-value database might hold a value such as "Douglas Adams" assigned to a key such as cust1237. A document store is quite similar to a key-value store; the only difference is that the values it stores (referred to as "documents") provide some structure and encoding of the managed data. Both are NoSQL, but they are different ways of holding the data. A document is a separate entity that collects data and relationships under an identifier, and usually a document is understood to be JSON.
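To make the difference concrete, here is a minimal Python sketch in which plain dictionaries stand in for the two kinds of store; the cust1237 key comes from the example above, while the document fields are invented for illustration:

```python
import json

# Key-value store: the value is opaque to the database. A plain dict
# stands in here for a real store such as Redis; one key, one value,
# and the store never looks inside the value.
kv_store = {}
kv_store["cust1237"] = "Douglas Adams"   # value could also be an int,
kv_store["cust1237:visits"] = 42         # an array, a JSON blob, ...

# Retrieval works only by key; we cannot ask "which customers are
# named Adams?" because the store knows nothing about the value.
print(kv_store["cust1237"])              # -> Douglas Adams

# Document store: the value ("document") is structured and encoded,
# typically as JSON, so the database can index and query its fields.
doc_store = {}
doc_store["cust1237"] = json.dumps({
    "name": "Douglas Adams",
    "visits": 42,
    "orders": [{"id": "ord9", "total": 19.99}],   # nested relationships
})

# Because the document has structure, field-level queries are possible;
# decoding the JSON here simulates what a document database does natively.
doc = json.loads(doc_store["cust1237"])
print(doc["name"], doc["orders"][0]["total"])     # -> Douglas Adams 19.99
```

The design point is in the second half: once the value has a known encoding, the database can do more than hand the blob back, which is exactly the "structure and encoding" difference described above.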

WHAT ARE THE MOST RELEVANT DIFFERENCES BETWEEN OPERATIONAL AND DECISION SUPPORT DATA?

Operational data represents transactions as they happen, live and in real time. Decision support data is a snapshot of the operational data at a given point in time, so decision support data is historic, representing a time slice of the operational data. Operational databases are updated live, in real time: a sales transaction, for example, is inserted into the database as the sale occurs. Data warehouses, in contrast, are typically loaded in batches at regular intervals, such as once per day.
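As a minimal sketch of that difference, the following Python example uses an in-memory SQLite database; the table names and the nightly summary are illustrative, not any particular product's schema:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, "
             "amount REAL, sold_at TEXT)")
conn.execute("CREATE TABLE dw_daily_sales (day TEXT, total REAL)")

# Operational side: each transaction is inserted the moment the sale occurs.
def record_sale(item, amount):
    conn.execute("INSERT INTO sales (item, amount, sold_at) VALUES (?, ?, ?)",
                 (item, amount, date.today().isoformat()))

record_sale("book", 12.50)
record_sale("pen", 1.99)

# Decision-support side: the warehouse table is loaded in a batch, e.g.
# once per day, as a summarized time slice of the operational data.
def nightly_load(day):
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE sold_at = ?",
        (day,)).fetchone()
    conn.execute("INSERT INTO dw_daily_sales (day, total) VALUES (?, ?)",
                 (day, total))

nightly_load(date.today().isoformat())
print(conn.execute("SELECT * FROM dw_daily_sales").fetchall())
```

The operational table changes row by row as business happens; the warehouse table only ever sees the periodic snapshot.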

THREE EXAMPLES OF LIKELY PROBLEMS WHEN OPERATIONAL DATA ARE INTEGRATED INTO THE DATA WAREHOUSE.

One may be tempted to think of the data warehouse as just a big summarized database, but a good data warehouse is much more than that. A complete data warehouse architecture includes support for a decision support data store, a data extraction and integration filter, and a specialized presentation interface, and before a decision support database can be considered a true data warehouse it has to conform to the twelve rules that define one. Integrating operational data into such a warehouse raises several likely problems. First, the operational data must be made to conform to uniform structures and formats; otherwise data conflicts arise and decision making suffers. Second, in high-performance online transaction environments it is inefficient, if not infeasible, to scan the whole operational database every time data needs to be refreshed into the data warehouse environment. In these environments it makes more sense to determine what data actually needs to be loaded into the warehouse by examining the log tape, or journal tape. Third, the log tape was developed for online backup and recovery in the event of a failure during online transaction processing, not for extraction; although it contains all the data that needs to flow into the warehouse, it must be read offline and interpreted before it can be used to gather the data to be loaded.
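As a rough illustration of that log-driven refresh, here is a minimal Python sketch; the change-record layout and table name are invented for illustration, and a real system would read the DBMS journal rather than a Python list:

```python
# The operational DBMS writes a journal of changes for recovery; reading
# that journal lets the warehouse refresh only what changed, instead of
# rescanning the whole operational database.
transaction_log = [
    {"op": "INSERT", "table": "sales", "row": {"id": 1, "amount": 12.50}},
    {"op": "UPDATE", "table": "sales", "row": {"id": 1, "amount": 13.00}},
    {"op": "INSERT", "table": "sales", "row": {"id": 2, "amount": 1.99}},
]

warehouse = {}  # keyed by (table, row id); stands in for the warehouse store

def refresh_from_log(log):
    """Apply each logged change; the source database is never scanned."""
    for entry in log:
        key = (entry["table"], entry["row"]["id"])
        if entry["op"] in ("INSERT", "UPDATE"):
            warehouse[key] = entry["row"]
        elif entry["op"] == "DELETE":
            warehouse.pop(key, None)

refresh_from_log(transaction_log)
print(warehouse)
# -> {('sales', 1): {'id': 1, 'amount': 13.0}, ('sales', 2): {'id': 2, 'amount': 1.99}}
```

Note that the interpretation step is where the third problem above shows up: the journal records recovery information, so the loader has to reconstruct "what the row looks like now" from a sequence of operations.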

