4  Complexity of computing

When dealing with big data, we have to consider the following questions when it comes to running code:

These are questions of space complexity and time complexity. Computer scientists use what is called Big-O notation to describe the space and time complexity of different algorithms. However, before we get technical, we should identify what sorts of things take time and space when working with data. I’ll use big-O notation below, but explain it later.

If there are \(n\) rows in a dataset, and each row contains \(d\) (numeric) columns,

4.0.1 What takes space?

  • Storing the data frame with \(n\) rows and \(d\) columns is \(O(nd)\)

  • Saving temporary value(s) that are only updated is \(O(1)\)

  • Creating a new column in a dataframe is \(O(n)\)

  • Computing/storing a paired distance matrix is \(O(n^2)\).

4.0.2 What takes time?

  • Accessing a single entry in a list if we know its location, or accessing a single dictionary item, is \(O(1)\).

  • Iterating through every row in a dataset (e.g. to find the maximum of a single column) is \(O(n)\)

  • Computing the distance between a single (new) datapoint and each of the previous datapoints is is \(O(nd)\)

  • Computing the paired distance matrix is \(O(n^2d)\)

  • Repeating an operation until an algorithm stops (e.g. in k-means) cannot necessarily be expressed in this manner so easily; you’d typically express how much the time complexity is in each step.

4.0.3 How do we use Python to time things?

The timeit module in Python is the standard way to time things. To see how much time a single chunk of code took to run, use the following.

You may notice this does not always print the same amount if you run it again and again with the same code. To get a more consistent value, you might choose to use the timeit.timeit function. However, this should only be used for tiny bits of code, since it runs the same snippet many, many times.

4.0.4 How to access how much memory an object takes?

The amount of space, in bytes, that an object takes up on your computer can be measured with sys.getsizeof(). A byte is the smallest unit of memory in a computer. To see how much space an object take,

4.1 Comparing some of our algorithms

Consider the locate_max function. The way we have written it is, all things considered, quite efficient.

def locate_max(list_of_values):
    temp_max = list_of_values[0]
    temp_max_loc = 0 

    n = len(list_of_values) 
    
    for i in range(1,n): 
        if list_of_values[i]>temp_max: 
            temp_max = list_of_values[i]
            temp_max_loc = i 
        else: 
            pass 
            
    return temp_max, temp_max_loc 

4.2 Examples of algorithms

  • See Class Notes!