IDC and EMC have released a new study today - 'The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010'. (press release here).

It's fascinating stuff. The research follows previous work conducted at the University of California, Berkeley (I've blogged this previously here). The methodology of the IDC/EMC study differs from Berkeley's: the Berkeley study examined the creation of original information (not including copies) and estimated how much digital information that would represent if all of it were converted to digital format (think: total amount of information we create). The IDC/EMC study is a forecast for devices that create or capture digital information – PCs, digital cameras, servers, sensors, etc. – and estimates the total number of megabytes they capture or produce in a year (think: actual and forecasted size of the digital universe).

So on to the interesting tidbits from the IDC/EMC study:

  • Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes*, a compound annual growth rate of 57%.
  • While nearly 70% of the digital universe will be generated by individuals by 2010, organizations will be responsible for the security, privacy, reliability and compliance of at least 85% of the information.
  • Images, captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe.
  • The number of images captured on consumer digital still cameras in 2006 exceeded 150 billion worldwide, while the number of images captured on cell phones hit almost 100 billion. IDC is forecasting the capture of more than 500 billion images by 2010.
  • The number of e-mail mailboxes has grown from 253 million in 1998 to nearly 1.6 billion in 2006. During the same period, the number of e-mails sent grew three times faster than the number of people e-mailing; in 2006 just the e-mail traffic from one person to another – i.e., excluding spam – accounted for 6 exabytes.
  • Unstructured Data – Over 95% of the digital universe is unstructured data. In organizations, unstructured data accounts for more than 80% of all information.
    • The report says: "IDC believes that over time it will become easier to deal with unstructured data as (1) more and more metadata is added to unstructured data, (2) structure is added to unstructured data, and (3) access systems provide structured views of both structured and unstructured data."
    • Interestingly, the study refers to the Semantic Web as a research area to follow regarding this topic.
  • Chevron's CIO says his company accumulates data at the rate of 2 terabytes – roughly 17,592,000,000,000 bits – a day.
  • Wal-Mart is reputed to have the largest database of customer transactions in the world. In 2000, that database was reported to be 110 terabytes, recording and storing information on tens of millions of transactions a day. By 2004, it was reported to be half a petabyte.
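
If you want to sanity-check a couple of these numbers yourself, here is a minimal Python sketch. The function name `cagr` and the variable names are my own, not anything from the study, and I'm assuming the Chevron figure uses binary terabytes (2^40 bytes), since that is the reading that matches the bit count quoted above.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.57 means 57%)."""
    return (end / start) ** (1 / years) - 1


# Figures quoted from the study: 161 EB added in 2006, 988 EB in 2010.
growth_factor = 988 / 161
rate = cagr(161, 988, 2010 - 2006)
print(f"growth factor: {growth_factor:.2f}x")
print(f"compound annual growth rate: {rate:.1%}")

# Chevron's 2 terabytes a day, expressed in bits.
# Assumption: binary terabytes (2**40 bytes), 8 bits per byte.
bits_binary = 2 * 2**40 * 8
# For comparison, the same figure with decimal terabytes (10**12 bytes).
bits_decimal = 2 * 10**12 * 8
print(f"2 TB in bits (binary):  {bits_binary:,}")
print(f"2 TB in bits (decimal): {bits_decimal:,}")
```

The growth rate comes out a little above 57%, which agrees with the study, and the binary reading of "2 terabytes" gives a bit count that rounds to the figure in the Chevron bullet.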

So if the digital universe is expanding exponentially - and it looks like 'we' are the ones generating most of it (and consuming it) - how are we going to cope with the ever-increasing amount of information?


*In case you're wondering, an exabyte is 1,000,000,000,000,000,000 bytes, or 10^18 bytes. Counted in binary units it is 2^60 bytes (about 1.15 x 10^18), which is where the figures of 1,024 petabytes or 1,073,741,824 gigabytes per exabyte come from. To give you an idea of what this means, five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections.
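
The decimal and binary definitions are easy to mix up, so here is a small Python sketch of the conversions in the footnote. The constant names are my own, chosen for illustration; the only assumption is the usual convention that decimal prefixes step by 1,000 and binary prefixes by 1,024.

```python
# Decimal (SI-style) and binary definitions of an exabyte, in bytes.
EB_DECIMAL = 10**18
EB_BINARY = 2**60

# Binary petabyte and gigabyte, in bytes.
PB_BINARY = 2**50
GB_BINARY = 2**30

petabytes_per_exabyte = EB_BINARY // PB_BINARY
gigabytes_per_exabyte = EB_BINARY // GB_BINARY
binary_to_decimal_ratio = EB_BINARY / EB_DECIMAL

print(f"petabytes per exabyte (binary): {petabytes_per_exabyte:,}")
print(f"gigabytes per exabyte (binary): {gigabytes_per_exabyte:,}")
print(f"binary exabyte / decimal exabyte: {binary_to_decimal_ratio:.3f}")

# The study's 2010 forecast of 988 exabytes, written out in bytes
# under the decimal definition.
forecast_2010_bytes = 988 * EB_DECIMAL
print(f"988 EB (decimal) in bytes: {forecast_2010_bytes:,}")
```

The gap between the two definitions is about 15% at this scale, so it matters which one a given report is using.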


