Mastering Python Standard Library: infinite iterators of itertools

Let’s continue our little exploration of the itertools module.

Today we’ll have a look at 3 infinite iterator constructors:

from itertools import count, cycle, repeat

itertools.count

itertools.count is like range without a stop value: lazy and endless.

By the way, if you have never heard of laziness (well, I’m sure we have all heard of it, and moreover, practice it every day), then you really should check it out, at least briefly. Someday we will walk the path of David Beazley and his legendary “Generator Tricks For Systems Programmers” in 147 pages, but not today. Today is for the basics.

Well, count is super easy: it just counts up to infinity. Or down to minus infinity, if step is negative.

def my_count(start=0, step=1):
    x = start
    while True:
        yield x
        x += step

That’s it.
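A quick sanity check, as a minimal sketch: the snippet below compares my_count with the real itertools.count on a short prefix. It uses itertools.islice (not covered in this post yet) to cut off a finite piece, so nothing here runs forever.

```python
from itertools import count, islice


def my_count(start=0, step=1):
    x = start
    while True:
        yield x
        x += step


# islice takes a finite prefix, so neither iterator is ever fully consumed
first_builtin = list(islice(count(10, -3), 5))
first_mine = list(islice(my_count(10, -3), 5))
print(first_builtin)                # [10, 7, 4, 1, -2]
print(first_mine == first_builtin)  # True

# start and step do not have to be integers
halves = list(islice(count(0.5, 0.25), 4))
print(halves)                       # [0.5, 0.75, 1.0, 1.25]
```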

But there is a caveat. It never stops, so you can’t “consume” it.

To consume an iterable is to read all of it at once, for example, to store it in a list.

Well, actually, you can try, but this line will keep eating memory until the machine freezes. And yeah, hammering Ctrl+C may not help, because the whole loop runs inside C code. You may end up with a hard reset, I did warn you ;)

list(count())  # do NOT run this

Then, how am I supposed to work with it, if I can’t call list/set/sum/etc. on it?

First of all, you can iterate over it (and break out when the time comes):

for i in count(start=10, step=-1):
    print(i, end=", ")
    if i <= 0:
        break

# 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,

Second, some programs never break out of an endless loop, waiting for something to happen: workers waiting for incoming tasks, HTTP servers waiting for incoming requests, etc. But we shall skip this case. For now.

Finally, you can combine an infinite iterator with other lazy iterators: map, zip, islice, accumulate, etc.

When iterators like zip or map iterate over multiple iterables at once, they finish as soon as the shortest iterable finishes. This gives us an exit from the infinite iterator.

Here is an example from itertools.repeat docs:

list(map(pow, range(10), repeat(2)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Our machine stays alive, although technically we “consume an infinite repeat with list”. Well, range is finite, and map finishes together with it.

The infinite iterator gives up its infinity just to finish together with some finite collection…

Wow! Some serious Highlander & Queen vibe around here …
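The same exit works with other lazy tools. Here is a small sketch with zip, islice and takewhile (the letters list is made up for illustration); in each case the finite partner, the slice size or the predicate decides when the endless count stops.

```python
from itertools import count, islice, takewhile

letters = ["a", "b", "c"]  # made-up sample data

# zip stops with the shortest input, so count() works like an endless enumerate
numbered = list(zip(count(1), letters))
print(numbered)   # [(1, 'a'), (2, 'b'), (3, 'c')]

# islice cuts a finite window out of an endless stream
evens = list(islice(count(0, 2), 5))
print(evens)      # [0, 2, 4, 6, 8]

# takewhile stops at the first value that fails the predicate
countdown = list(takewhile(lambda i: i >= 0, count(10, -1)))
print(countdown)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```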

itertools.repeat

itertools.repeat is even easier than itertools.count. It doesn’t even count, but simply repeats the same value infinitely. There is also a form with a fixed number of repeats.

According to the itertools docs, itertools.repeat is roughly equivalent to:

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

For the “fixed form”, and since Python generator expressions are also lazy, itertools.repeat(42, 10) can be rewritten as:

( 42 for _ in range(10) )

For the “infinite form”, we can’t rewrite it with range, but one can notice that, for numeric values, itertools.repeat(x) yields the same values as itertools.count(x, step=0).
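As a quick check of that observation, here is a minimal sketch. Note one difference it also demonstrates: repeat accepts any object, while count insists on a number.

```python
from itertools import count, islice, repeat

# the first few values of both endless iterators are identical
same_values = list(islice(repeat(42), 5)) == list(islice(count(42, 0), 5))
print(same_values)  # True

# repeat is happy with any object ...
spam = list(islice(repeat("spam"), 3))
print(spam)         # ['spam', 'spam', 'spam']

# ... but count only accepts numbers
try:
    count("spam", 0)
    count_accepts_str = True
except TypeError:
    count_accepts_str = False
print(count_accepts_str)  # False
```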

I guess repeat and count add a little bit of readability to your code, and they might also be noticeably faster than Python generator expressions. However, it is not that easy to test the performance of iterators (especially infinite ones :) ), since they get exhausted, and a performance test means running the same thing many times and comparing.

Still, let us try:

In [49]: i1 = lambda: ( 42 for _ in range(100000) )

In [50]: i2 = lambda: repeat(42, 100000)

In [51]: %timeit sum(i1())
3.49 ms ± 36.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [52]: %timeit sum(i2())
333 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In this measurement, itertools.repeat seems to be about 10 times faster!

By the way, do you think that a performance test with a “lambda-style factory” is valid and the comparison is fair?
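For those without IPython at hand, here is a rough sketch of the same experiment with the standard timeit module. The helper names gen_expr and repeat_iter are mine, and the numbers will differ from machine to machine. The idea is the same as with the lambdas: every timed call builds a fresh iterator, so exhaustion doesn’t spoil the later runs. The price is that each measurement also includes creating the iterator and one extra function call, which should be small next to 100000 items, but treat that as an assumption, not a proof.

```python
import timeit
from itertools import repeat

N = 100_000


def gen_expr():
    # a fresh generator expression on every call
    return (42 for _ in range(N))


def repeat_iter():
    # a fresh repeat object on every call
    return repeat(42, N)


# each timed call creates a new iterator and sums it completely
t_gen = timeit.timeit(lambda: sum(gen_expr()), number=20)
t_rep = timeit.timeit(lambda: sum(repeat_iter()), number=20)

print(f"generator expression: {t_gen:.4f} s for 20 runs")
print(f"itertools.repeat:     {t_rep:.4f} s for 20 runs")
```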

Wait, what do you mean by “exhaust”?

If you are confused by “exhaust” in the previous section, then I’ll show you only this …

In [3]: i = ( x for x in range(10) )

In [4]: sum(i)
Out[4]: 45

In [5]: sum(i)
Out[5]: 0

… and strongly encourage you to dive into the Python Functional Programming HOWTO

itertools.cycle

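An endless cycle over an iterable. As simple as that:

# cycle('ABCD') --> A B C D A B C D ...

def my_cycle(iterable):
    saved = []
    for element in iterable:  # first pass: yield each item and remember it
        saved.append(element)
        yield element
    while saved:  # replay the saved copy; an empty input simply stops
        yield from saved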

Despite its simplicity, it is very convenient.
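One detail worth seeing in action: itertools.cycle keeps its own copy of the items it has seen, so it can loop even over a one-shot iterator that could never be replayed by itself. A small sketch, again with islice to keep things finite:

```python
from itertools import cycle, islice

# a plain sequence: the obvious case
from_string = list(islice(cycle("ABCD"), 10))
print(from_string)    # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B']

# a one-shot iterator: still works, because cycle saved the items on the first pass
one_shot = iter(["x", "y"])
from_iterator = list(islice(cycle(one_shot), 5))
print(from_iterator)  # ['x', 'y', 'x', 'y', 'x']

# the original iterator itself is exhausted by now
print(list(one_shot))  # []
```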

I really love rotating proxies, user agents, etc. with itertools.cycle for regular parsing/scraping of web pages.

For instance, you can define some “global” iterators:

PROXY_CYCLE = itertools.cycle(proxy_list)
UA_CYCLE = itertools.cycle(ua_list)

And each time you need to make a new request, you just ask the “global” iterators for new proxy/ua values with next:

proxy = next(PROXY_CYCLE)
ua = next(UA_CYCLE)

This turns out to be a distributed iteration: different places of the program pull values at different times, but the iterator is shared. Iterator as a service, huh.

It’s like we defined a class ProxyManager with a method ProxyManager.get, which handles proxy rotation and selection. But instead of a class we have itertools.cycle, instead of get we have next, and instead of 10 lines of code, only 1. So do we really need to define a class? :)
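To make the idea concrete, here is a minimal, self-contained sketch. The proxy and user agent strings are placeholders I made up, and next_request_settings is a hypothetical helper, not part of any library. No real requests are sent; the point is only to show how the shared iterators hand out values in rotation.

```python
from itertools import cycle

# placeholder values, for illustration only
proxy_list = ["proxy-a.example:8080", "proxy-b.example:8080"]
ua_list = ["agent-1", "agent-2", "agent-3"]

PROXY_CYCLE = cycle(proxy_list)
UA_CYCLE = cycle(ua_list)


def next_request_settings():
    """Hypothetical helper: pick the next proxy and user agent in rotation."""
    return {"proxy": next(PROXY_CYCLE), "user_agent": next(UA_CYCLE)}


# every call advances the shared iterators, wherever it is made from
settings = [next_request_settings() for _ in range(4)]
for s in settings:
    print(s["proxy"], s["user_agent"])
# proxy-a.example:8080 agent-1
# proxy-b.example:8080 agent-2
# proxy-a.example:8080 agent-3
# proxy-b.example:8080 agent-1
```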

That’s all, folks!

Thank you for reading, hope you enjoyed! Consider subscribing on DEV - we shall go deeper :)

Anything else to read?

Always.

Python Functional Programming HowTo

For bravehearts

Of course, itertools module docs