A Short History of Big Data

Big Data didn’t arrive overnight — it grew from classic data warehousing to the web’s unstructured flood and today’s mobile/IoT streams. Here’s the short history behind the term and why it matters now.

Published On: August 15, 2025 | Last Updated: September 10, 2025

Where does ‘Big Data’ come from?

The term ‘Big Data’ has been in use since the early 1990s. Although it is not known exactly who first used the term, most people credit John R. Mashey (who at the time worked at Silicon Graphics) with popularizing it.

In essence, Big Data is not something completely new or exclusive to the last two decades. For centuries, people have used data analysis and analytics techniques to support their decision-making. Around 300 BC, the ancient Egyptians already attempted to capture all existing ‘data’ in the Library of Alexandria, and the Roman Empire carefully analyzed statistics on its military to determine the optimal distribution of its armies.

In the last two decades, however, the volume and speed at which data is generated have changed beyond measure. The total amount of data in the world was estimated at 4.4 zettabytes in 2013 and was projected to grow to 44 zettabytes by 2020. To put that in perspective, 44 zettabytes is equivalent to 44 trillion gigabytes. Even with the most advanced technologies available today, it is impossible to analyze all of this data. The need to process these ever-larger (and increasingly unstructured) data sets is how traditional data analysis transformed into ‘Big Data’ over the last decade.

To illustrate this development over time, the evolution of Big Data can roughly be divided into three main phases, each with its own characteristics and capabilities. To understand the context of Big Data today, it is important to understand how each phase contributed to its contemporary meaning.

Phase 1.0 – Databases & Warehouses (foundations)

Data analysis, data analytics, and Big Data originate in the longstanding domain of database management, which relies heavily on the storage, extraction, and optimization techniques common to data stored in Relational Database Management Systems (RDBMS).

Database management and data warehousing are considered the core components of Big Data Phase 1. Together they provide the foundation of modern data analysis as we know it today, using well-known techniques such as database queries, online analytical processing, and standard reporting tools (a short sketch of this style of querying follows the list below).

  • Rooted in relational database management systems (RDBMS), SQL, ETL, and data warehousing.
  • Techniques: queries, OLAP cubes, scheduled reports.
  • Goal: consistent, structured views for finance, operations, and executive reporting.
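
As a minimal, hypothetical sketch of this Phase 1 style of analysis, the snippet below uses Python’s built-in sqlite3 module as a stand-in for a data warehouse and runs a standard reporting query (a GROUP BY aggregation of revenue per region and quarter). The table name and columns are illustrative only, not taken from any particular system.

    import sqlite3

    # In-memory database standing in for a (much larger) data warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("EMEA", "2023-Q1", 120000.0),
            ("EMEA", "2023-Q2", 135500.0),
            ("APAC", "2023-Q1", 98250.0),
            ("APAC", "2023-Q2", 110400.0),
        ],
    )

    # A classic Phase 1 reporting query: structured, aggregated, predictable.
    report = conn.execute(
        """
        SELECT region, quarter, SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY region, quarter
        ORDER BY region, quarter
        """
    ).fetchall()

    for region, quarter, total in report:
        print(f"{region} {quarter}: {total:,.2f}")

    conn.close()

Everything here is structured up front: the schema is fixed, the query is declarative SQL, and the output feeds a standard report or dashboard.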

Phase 2.0 – The Web & Distributed Data

In the early 2000s, the Internet and the Web began to offer unique opportunities for data collection and analysis. With the expansion of web traffic and online stores, companies such as Yahoo, Amazon, and eBay started analyzing customer behavior through click-through rates, IP-based location data, and search logs. This opened up a whole new world of possibilities.

From a data analysis, data analytics, and Big Data point of view, HTTP-based web traffic introduced a massive increase in semi-structured and unstructured data. Beyond the standard structured data types, organizations now needed new approaches and storage solutions to handle these data types and analyze them effectively. The arrival and growth of social media data greatly amplified the need for tools, technologies, and analytics techniques capable of extracting meaningful information from this unstructured data (a minimal MapReduce-style sketch follows the list below).

  • The 2000s web introduced massive semi-structured and unstructured data (clickstreams, search logs, social media).
  • New approaches emerged: distributed compute (e.g., MapReduce paradigms), NoSQL stores, and large-scale log processing.
  • Goal: mine behavior at scale and personalize digital experiences.
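
To make the MapReduce idea referenced above concrete, here is a framework-agnostic sketch in plain Python (the clickstream log format and field layout are made up for illustration): a map step emits (url, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums the counts per URL. In a real system such as Hadoop, these three steps would run in parallel across many machines over far larger logs.

    from collections import defaultdict

    # Hypothetical clickstream log lines: "timestamp user url"
    log_lines = [
        "2009-06-01T10:00:01 u42 /home",
        "2009-06-01T10:00:03 u17 /product/123",
        "2009-06-01T10:00:07 u42 /product/123",
        "2009-06-01T10:00:09 u99 /home",
    ]

    # Map: emit a (key, value) pair for each record.
    def map_phase(line):
        _, _, url = line.split()
        yield (url, 1)

    # Shuffle: group intermediate values by key.
    def shuffle_phase(pairs):
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    # Reduce: aggregate the values for each key.
    def reduce_phase(grouped):
        return {key: sum(values) for key, values in grouped.items()}

    mapped = (pair for line in log_lines for pair in map_phase(line))
    hit_counts = reduce_phase(shuffle_phase(mapped))
    print(hit_counts)  # {'/home': 2, '/product/123': 2}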

Phase 3.0 – Mobile, IoT & Real-Time AI

Although web-based unstructured content is still the main focus for many organizations in data analysis, data analytics, and Big Data, new possibilities for retrieving valuable information are now emerging from mobile devices.

Mobile devices make it possible not only to analyze behavioral data (such as clicks and search queries), but also to store and analyze location-based data (GPS data). As these devices advance, it becomes possible to track movement, analyze physical behavior, and even monitor health-related data such as the number of steps taken per day. This data provides a whole new range of opportunities, from transportation to city design and health care.

At the same time, the rise of sensor-based, internet-enabled devices is accelerating data generation like never before. Under the now-familiar label of the ‘Internet of Things’ (IoT), millions of TVs, thermostats, wearables, and even refrigerators are generating enormous volumes of data every day, and the race to extract meaningful and valuable information from these new data sources has only just begun.

  • Smartphones add continuous behavioral and location signals; IoT devices stream telemetry from “everything.”
  • Cloud data lakes/lakehouses, streaming pipelines, and ML/AI turn fast, messy data into real-time decisions (a short streaming sketch follows below).
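
As a minimal sketch of the streaming idea (framework-agnostic; production pipelines would typically use a dedicated stream processor), the snippet below simulates an IoT temperature feed and maintains a sliding-window average, flagging windows that drift above a hypothetical threshold. The readings and the threshold are invented for illustration.

    from collections import deque

    WINDOW_SIZE = 5         # number of recent readings to average
    ALERT_THRESHOLD = 30.0  # hypothetical temperature limit in degrees Celsius

    def simulated_sensor():
        """Stand-in for a live IoT telemetry stream."""
        for reading in [21.5, 22.0, 29.8, 31.2, 33.8, 34.5, 30.5, 24.1]:
            yield reading

    window = deque(maxlen=WINDOW_SIZE)  # sliding window of recent readings

    for temperature in simulated_sensor():
        window.append(temperature)
        moving_avg = sum(window) / len(window)
        status = "ALERT" if moving_avg > ALERT_THRESHOLD else "ok"
        print(f"reading={temperature:5.1f}  window_avg={moving_avg:5.1f}  {status}")

Unlike the batch-oriented examples above, each reading is processed as it arrives, which is what makes real-time decisions possible.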

A summary of the three phases of Big Data is given in the overview below:

 

Phase 1 – Structured Content (1990-2000)
  • RDBMS
  • Data warehouses
  • Online analytical processing
  • Dashboards and scorecards
  • Data mining

Phase 2 – Unstructured Web Content (2000-2010)
  • Information retrieval and extraction
  • Opinion mining
  • Question answering
  • Web analytics
  • Social media analytics
  • Social network analysis
  • Spatial analysis

Phase 3 – Sensory and Location Content (2010-Present)
  • Location-aware analysis
  • Person-centered analysis
  • Context-relevant analysis
  • Mobile visualization
  • Human-computer interaction

Why This History Matters

Understanding the phases clarifies today’s landscape: we still need the rigor of Phase 1, the scale of Phase 2, and the real-time intelligence of Phase 3. Modern data teams blend all three.

Keep Going With dASCIN

Turn insight into skill with our short, practical, and vendor-neutral certifications.