1 Introduction: Big Data

1.1 What is big data?

Big data refers to datasets that are so large, and possibly so messy, that computers are needed to process and visualize them. Often they are too large for traditional processing methods to manage, although we will not work with data that big in this class. Instead, this class covers the principles needed to understand the issues that arise when working with big data, along with skills that should transfer to many other contexts. The focus is on the data science of big data rather than the mechanics of managing large databases.