Big O notation
5 min read

Big O notation

The big O notation is a way to understand and model complexity in algorithms. Let's first explain time complexity.

Time complexity

Time complexity shows how much time a task will take depending on some parameters.
Let's take an example :

def func(array):
    half = len(array)//2
    return array[half]

This function returns the middle element of an array. Seems like an easy task (because it is), but let's measure the time it takes when we give it arrays of different sizes. We'll introduce a wrapper function to take care of measuring the time:

import time

def timeit(f):
    def run(*args):
        t_start = time.time()
        f(*args)
        t_end = time.time()
        print(f.__name__,":", str(t_end-t_start)+"s")
    return run

@timeit
def func(array):
    half = len(array)//2
    print(array[half])

We can then test our function:

x = [1,2,3]
y = [i for i in range(10000)]
z = [i for i in range(1_000_000)]

for i in [x, y, z]:
    func(i)

Our output is:

2
func : 0.00048613548278808594s
5000
func : 0.0009989738464355469s
500000
func : 0.0003273487091064453s

Even though we might be tempted to say that the runtimes are somewhat different, they are actually pretty similar considering the size difference between the three lists. Let's take more measures and graph them to show the constant nature of this function:

import matplotlib.pyplot as plt  

array_size = [t for t in range(1, 10_000_000, 100_000)]
time_taken = list()

for i in array_size:
    test_list = [x for x in range(i)]
    t_start = time.time()
    func(test_list)
    t_end = time.time()
    time_taken.append(t_end - t_start)
    
plt.plot(array_size, time_taken)
plt.xlabel('array size')
plt.ylabel('time taken to find the middle element')
plt.show()

We get this graph:

Which is more or less constant if we disregard the noise. This means that whatever the length of the array is, the task will be completed in a constant amount of time.
There is however an important detail: the creation of the lists does not have a constant time complexity, we can highlight this by changing the code a little bit if we change:

for i in array_size:
    test_list = [x for x in range(i)]
    t_start = time.time()
    func(test_list)
    t_end = time.time()
    time_taken.append(t_end - t_start)

By:

for i in array_size:
    t_start = time.time()
    test_list = [x for x in range(i)]
    func(test_list)
    t_end = time.time()
    time_taken.append(t_end - t_start)

Then we get this graph:

This graph shows a linear time complexity behavior from our code. This means that the time it takes to create a list will linearly increase depending on the size of the list. Linear complexity however is pretty straightforward and self-explanatory but let's check out other time complexities.

@timeit
def func(array):
    for element in array:
        count = 0
        for i in array:
            if element == i:
                count += 1
        if count > 1:
            return True
    return False

This function checks for duplicates in an array, if we dissect the code a bit, we can observe that for each element in the array it checks every element in the same array to check for a match. Now, if we run this function through the same code to graph the time, we get:

Notice I had to reduce the array size, or the program would never end because it would take too much time. We can observe that the time it takes to complete the task goes up faster than for the linear complexity. The time complexity here is proportional to the length of the array squared, we'd write it O(N2), N being the length of the array.
So let's now talk about this notation.

Big O notation

This notation is used to express the time complexity of an algorithm. We saw three examples: first of all, the constant complexity would be written as O(1).
The linear one would be written as O(N), N being the parameter (generally N can denote any parameter, here it is the length of an array).
The last one would be written as O(N2).

Even though I only illustrated three complexities, there are a lot more, for example, O(log(N)), O(Nlog(N)), O(N!), etc... (which are slightly more complex than simple monomial complexity and are out of this scope of this article but still follow the same rules).

Now, the real question is: how do we figure out the complexity of an algorithm? The basic principle is to count how many operations there are (depending on the parameters), but there are some additional rules. For instance, if we take this algorithm:

def func(N:int):
    s = 0
    for i in range(N):
        s += 1

We start by counting the operations: first, we create a variable (1 operation) and give it the value 0 and then we add 1 to this variable N amount of times (N operations), so we have N+1 operations? Well yes but no, because we say that the one operation is negligible compared to the N operations, so it has a complexity of O(N). To be more general we only keep the highest term complexity in the algorithm (this is somewhat similar to the concept of equivalence in mathematics, for example, in a polynomial, as x approaches infinity, you can say that x2+x~x2).
For the example given in this page, this is the order from least to most "powerfull":/p>

  1. O(1)
  2. O(log(N))
  3. O(N)
  4. O(Nlog(N)
  5. O(N2)
  6. O(N!)

To give another example:

def func(N:int):
    s = 0
    for i in range(N):
        s += 1
    for i in range(N):
        for i in range(N):
            s += 1

We might be tempted to say that this algorithm's complexity is O(N2+N), so we actually just write O(N2)

The other rule to remember is to drop all constants, this means you will never see a complexity written as O(3) or O(2N) or O(12N2). To illustrate how that might be confusing, let's take a look at this algorithm:

def func(N:int):
    s = 0
    for i in range(N):
        s += 1
        s *= 2

If we count the operations, we'd come up with 2N+1 operations, so the complexity is written as O(N), not O(N).
Same idea with an algorithm like this :

def func(N:int):
    s = 0
    for i in range(N):
        for i in range(N):
            s += 1
    for i in range(N):
        for i in range(N):
            s += 4

We'd write its complexity as O(N2)

If you want to train, here are some exercises.
If you read until the end, thank you very much and please consider subscribing to my newsletter, it keeps me motivated !