Mastering the Python Standard Library: itertools.chain
Imagine you need to iterate over N iterables.
For example, you have two lists: l1 and l2.
Here is the easiest way to do so:
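A minimal sketch of that approach (the names l1 and l2 come from the example above, the values are mine):

```python
l1 = [1, 2, 3]
l2 = [4, 5, 6]

# l1 + l2 builds a brand-new concatenated list in memory.
combined = l1 + l2
for item in combined:
    print(item)
```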
However, it may not be the best one. The l1 + l2 expression is a list concatenation: it gives you a brand-new list with len(l1 + l2) == len(l1) + len(l2). If you are positive that both lists are rather small, that's kinda okay.
But let us assume each list takes 1GB of RAM. At peak, your program will consume 4GB: the two input lists plus the 2GB concatenated copy, twice the size of the input. And what if you don't have much RAM? Maybe your code runs in AWS Lambda, etc.
Actually, we want to do something like this:
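Conceptually, a hand-rolled generator like this (a sketch; the name chain_lists is mine):

```python
def chain_lists(*iterables):
    # Yield items from each iterable in turn; no new list is ever built.
    for iterable in iterables:
        for item in iterable:
            yield item

l1 = [1, 2, 3]
l2 = [4, 5, 6]
result = list(chain_lists(l1, l2))
```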
No new lists, no copies, no memory overhead. Just iterate over the first list and then iterate over the second one.
And that generator is already coded for you: it is known as itertools.chain.
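Using it looks like this (list values are mine for illustration):

```python
from itertools import chain

l1 = [1, 2, 3]
l2 = [4, 5, 6]

# chain() returns a lazy iterator; no concatenated list is created.
combined = chain(l1, l2)
items = list(combined)
```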
By the way, there is another form of itertools.chain: itertools.chain.from_iterable. It does absolutely the same thing, except for how the arguments are passed: instead of separate iterables, it takes a single iterable of iterables:
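For example (the nested list is mine):

```python
from itertools import chain

lists = [[1, 2], [3, 4], [5, 6]]

# from_iterable takes one argument: an iterable that yields iterables.
flat = list(chain.from_iterable(lists))
```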
So, in general:
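The two forms are interchangeable up to argument unpacking, which a quick sketch confirms:

```python
from itertools import chain

iterables = [[1, 2], [3, 4]]

a = list(chain(*iterables))               # you unpack the arguments yourself
b = list(chain.from_iterable(iterables))  # unpacking is done for you, lazily
```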
Why are there two chains, with one tiny "*" of difference? I really don't know, but who am I to judge the authors of the itertools module; they are true gods.
But I do know that "entities should not be multiplied beyond necessity". And that thought brings us back to our unnecessary extra-list issue.
So what’s the point?
Well, use chain! Learn the itertools
module. Think about performance. Save memory: in a production environment it is actually limited and not really cheap!
Anything else to read?
Sure.
Whole lotta docs - Master the power of the standard library!
Itertools module docs - chain is not the only one, there are plenty more
Occam’s Razor - really, read it