Mastering the Python Standard Library: itertools.chain
Imagine you need to iterate over N iterables.
For example, you have two lists: l1 and l2.
Here is the easiest way to do so:
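A minimal sketch of that approach (the names l1 and l2 come from the example above, the values are mine):

```python
l1 = [1, 2, 3]
l2 = [4, 5, 6]

# l1 + l2 builds a brand-new concatenated list in memory.
combined = l1 + l2
for item in combined:
    print(item)
```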
However, it may not be the best one. The l1 + l2 expression is a list concatenation: it gives you a brand-new list with len(l1 + l2) == len(l1) + len(l2). If you are positive that both lists are rather small, that's kinda okay.
But let us assume each list takes 1GB of RAM. At peak, your program will consume 4GB: the two input lists plus the 2GB concatenated copy, twice the size of the input. And what if you don't have much RAM? Maybe your code runs in AWS Lambda, etc.
Actually, we want to do something like this:
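Conceptually, a hand-rolled generator like this (a sketch; the name chain_lists is mine):

```python
def chain_lists(*iterables):
    # Yield items from each iterable in turn; no new list is ever built.
    for iterable in iterables:
        for item in iterable:
            yield item

l1 = [1, 2, 3]
l2 = [4, 5, 6]
result = list(chain_lists(l1, l2))
```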
No new lists, no copies, no memory overhead. Just iterate over the first list and then iterate over the second one.
And that generator is already coded for you: it is known as itertools.chain.
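Using it looks like this (list values are mine for illustration):

```python
from itertools import chain

l1 = [1, 2, 3]
l2 = [4, 5, 6]

# chain() returns a lazy iterator; no concatenated list is created.
combined = chain(l1, l2)
items = list(combined)
```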
By the way, there is another form of itertools.chain: itertools.chain.from_iterable. It does absolutely the same thing, except for how the arguments are passed: instead of separate iterables, it takes a single iterable of iterables:
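For example (the nested list is mine):

```python
from itertools import chain

lists = [[1, 2], [3, 4], [5, 6]]

# from_iterable takes one argument: an iterable that yields iterables.
flat = list(chain.from_iterable(lists))
```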
So, in general:
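The two forms are interchangeable up to argument unpacking, which a quick sketch confirms:

```python
from itertools import chain

iterables = [[1, 2], [3, 4]]

a = list(chain(*iterables))               # you unpack the arguments yourself
b = list(chain.from_iterable(iterables))  # unpacking is done for you, lazily
```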
Why are there two chains, with one tiny "*" of difference? I really don't know, but who am I to judge the authors of the itertools module; they are true gods.
But I do know that "entities should not be multiplied beyond necessity". And that thought brings us back to our unnecessary extra-list issue.
So what’s the point?
Well, use chain! Learn the itertools
module. Think about performance. Save memory: in a production environment it is actually limited and not really cheap!
Anything else to read?
Sure.
Whole lotta docs - Master the power of the standard library!
Itertools module docs - chain is not the only one, there are plenty more
Occam’s Razor - really, read it