def locate_max(list_of_values):
temp_max = list_of_values[0]
temp_max_loc = 0
n = len(list_of_values)
for i in range(1,n):
if list_of_values[i]>temp_max:
temp_max = list_of_values[i]
temp_max_loc = i
else:
pass
return temp_max, temp_max_loc 4 Complexity of computing
When dealing with big data, we have to consider the following questions when it comes to running code:
Will my computer be able to store everything in RAM that it needs to in order to perform the calculation?
Will the calculations finish before my next assignment is due, or at least before the heat death of the universe?
These are questions of space complexity and time complexity. Computer scientists use what is called Big-O notation to describe the space and time complexity of different algorithms. However, before we get technical, we should identify what sorts of things take time and space when working with data. I’ll use big-O notation below, but explain it later.
If there are \(n\) rows in a dataset, and each row contains \(d\) (numeric) columns,
4.0.1 What takes space?
Storing the data frame with \(n\) rows and \(d\) columns is \(O(nd)\)
Saving temporary value(s) that are only updated is \(O(1)\)
Creating a new column in a dataframe is \(O(n)\)
Computing/storing a paired distance matrix is \(O(n^2)\).
4.0.2 What takes time?
Accessing a single entry in a list if we know its location, or accessing a single dictionary item, is \(O(1)\).
Iterating through every row in a dataset (e.g. to find the maximum of a single column) is \(O(n)\)
Computing the distance between a single (new) datapoint and each of the previous datapoints is is \(O(nd)\)
Computing the paired distance matrix is \(O(n^2d)\)
Repeating an operation until an algorithm stops (e.g. in k-means) cannot necessarily be expressed in this manner so easily; you’d typically express how much the time complexity is in each step.
4.0.3 How do we use Python to time things?
The timeit module in Python is the standard way to time things. To see how much time a single chunk of code took to run, use the following.
You may notice this does not always print the same amount if you run it again and again with the same code. To get a more consistent value, you might choose to use the timeit.timeit function. However, this should only be used for tiny bits of code, since it runs the same snippet many, many times.
4.0.4 How to access how much memory an object takes?
The amount of space, in bytes, that an object takes up on your computer can be measured with sys.getsizeof(). A byte is the smallest unit of memory in a computer. To see how much space an object take,
4.1 Comparing some of our algorithms
Consider the locate_max function. The way we have written it is, all things considered, quite efficient.
4.2 Examples of algorithms
- See Class Notes!