Even a cursory study of the history of the internet, especially after the advent of the World Wide Web, indicates the power of the medium. In terms of users, it grew 588.4% between Dec 2000 and June 2013, from around 369 million to 2.4 billion, though only 34.3% of the world's population had access to the internet in this period [source]. Penetration is expected to grow to 45% by 2016, bringing the digital world to almost 3.4 billion users.
Businesses and governments have taken advantage of the internet, especially of the WWW, to create applications that many of us can't do without. Shopping for most of your needs, reserving airline, train and bus tickets, planning and managing holiday tours, most banking activities, paying bills, buying and renewing insurance, filing tax returns: all of these can be done over the internet. Streaming music and video keep us entertained, social networking applications and blogs fulfil our need to share, and Skype and WhatsApp help us connect. Google Drive, OneDrive and Dropbox store our important documents and let us share them selectively. Almost every sphere of activity has applications dedicated to it (photography, ornithology, skill enhancement, stocks, funding of ideas through to production, you name it), all allowing us to generate and use content.
All of these generate data, huge volumes of it. The infographic in this article gives an indication of how much user-driven data is generated every minute. Cisco estimates that total global IP traffic will reach 120,643 PB per month by 2017, up from 43,570 PB per month in 2012. (What is a Petabyte (PB)? Also see here.)
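To put the two Cisco figures in perspective, here is a minimal Python sketch that works out the growth factor and the implied yearly growth rate between them. The two traffic numbers come from the estimate quoted above; the constant and function names are only illustrative, and the byte conversion assumes the decimal (SI) definition of a petabyte, 10^15 bytes.

```python
# Figures quoted in the text above (PB per month).
TRAFFIC_2012_PB = 43_570
TRAFFIC_2017_PB = 120_643
YEARS = 2017 - 2012

# Assumption: decimal (SI) petabyte, 1 PB = 10**15 bytes.
BYTES_PER_PB = 10**15


def growth_factor(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


factor = growth_factor(TRAFFIC_2012_PB, TRAFFIC_2017_PB)
cagr = compound_annual_growth_rate(TRAFFIC_2012_PB, TRAFFIC_2017_PB, YEARS)
bytes_2017 = TRAFFIC_2017_PB * BYTES_PER_PB

print(f"Growth factor 2012 to 2017: {factor:.2f}x")
print(f"Implied compound annual growth: {cagr:.1%}")
print(f"2017 monthly traffic in bytes: {bytes_2017:.3e}")
```

If you prefer binary units (1 PiB = 2^50 bytes), change `BYTES_PER_PB`; the growth factor and the yearly rate stay the same because they are ratios.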
Not all data is user-generated.
An example is India's 'Aadhaar' initiative, implemented by the Unique Identification Authority of India [UIDAI]. Briefly, it aims to provide a 12-digit unique identification number to every Indian (that's over a billion people), starting August 2009. Its mandate is to provide these numbers to 600 million (60 crore) citizens by 2014. It enrolls a citizen by collecting his/her iris and thumb scans and demographic data. Each enrollment pack is 5 MB of data. Till now it has generated 1500 PB of data [source].
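As a rough back-of-the-envelope check, the sketch below multiplies the pack size by the 2014 enrollment target to see how much raw enrollment data that target implies. It uses only the 5 MB and 600 million figures from the paragraph above and assumes decimal (SI) units. The names are illustrative. It counts nothing except the raw enrollment packs, so it is not an attempt to reproduce the cumulative total quoted from the source.

```python
# Figures quoted in the text above.
PACK_SIZE_MB = 5                  # size of one enrollment pack
TARGET_ENROLLMENTS = 600_000_000  # 60 crore citizens by 2014

# Assumption: decimal (SI) units, so 1 PB = 10**9 MB.
MB_PER_PB = 10**9


def raw_enrollment_volume_pb(enrollments: int, pack_size_mb: float) -> float:
    """Return the size in PB of `enrollments` packs of `pack_size_mb` MB each."""
    return enrollments * pack_size_mb / MB_PER_PB


target_pb = raw_enrollment_volume_pb(TARGET_ENROLLMENTS, PACK_SIZE_MB)
print(f"Raw packs for {TARGET_ENROLLMENTS:,} enrollments: {target_pb:.1f} PB")
```

Calling the same function with a different enrollment count or pack size gives the raw volume for any other scenario you want to estimate.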
Every government department generates an enormous amount of data. Analysis of this data creates more data, usually in the form of reports. Government rules and regulations require businesses to produce their annual reports and publish them in the public domain. Organisations like LexisNexis collate business data from the public domain all over the world and create tools such as their Dossier Suite.
Look around you and everyone is creating data and very large volumes of it: the United Nations, stock exchanges and financial and credit rating organisations, researchers and scientists, even machines and processes where sensors report the working parameters at regular intervals.
The question then is: how do we persist and maintain such large volumes of data? And how do we use it? We’ll look at both questions in subsequent posts.