neutralrobotboy (neutralrobotboy) wrote,

some results for stackless python vs. normal python

below, i discuss some of my results coding-wise and profiling-wise for 'normal' python code vs. stackless.

exhibit a (stackless dir crawling):
    import stackless
    def dir_crawl(d):
        print d
        dirlist = eval('dir(' + d + ')') #string passed as argument
        if ('__doc__' in dirlist):
            print '--------------------'
            print d, 'DOCSTRING:\n'
            print eval(d).__doc__
            print '--------------------\n\n\n'
        for x in dirlist:
            if (not '__' in x) and (not '-' in x) and (x != d):
                stackless.tasklet(dir_crawl)(d + '.' + x)

i wrote the above code because stackless doesn't come with documentation other than the normal python documentation. so this code will crawl through a module or class and give you a full listing of its attributes and their docstrings if they have one.

one amusing thing that came up in writing this is that the stackless module is self-referential, so trying to crawl through it gives an endless list of "stackless.stackless.stackless.stackless.stackless"...etc. which would have crashed a normal recursive algorithm, but stackless just kept going until i did a ctrl+break and changed the code to include that handy little 'and (x !=d)' you see there.

this code did one thing that kinda bugged me, though: it didn't give me everything in the order i expected (and would have gotten from a normal recursive python implementation). so i decided to play around with stackless, figuring out what it would take to get everything in the 'right' (expected) order. hence

exhibit b (stackful list crawling):

    listy = [[1,[2,3]],[[4,5],6],[[7],8,9]]
    end_list = []
    def stack_crawl(l):
        for x in l:
            if isinstance(x, list):
    print end_list

i decided to write a 'normal' list-crawling algorithm that would print the above numbers in the expected order (1-9), so that i'd have something to compare stackless to. to my mind, the above code is readable, concise, and perfectly logical. but i guess i'm used to doing things the 'normal' way.

when you run this, you get the expected output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

so what happens if we just implement this in stackless? from my little dir_crawl function, we should expect things to be out of order, but it's worth testing at least, right? enter

exhibit c (basic stackless re-working):
    import stackless
    listy = [[1,[2,3]],[[4,5],6],[[7],8,9]]
    end_list = []
    def crawl(l):
        for x in l:
            if isinstance(x, list):
    print end_list

ok, so this is the first thing to try. it's basically the previous algorithm broken into tasklets. doing this means that instead of traditional recursion, each new call to the function will simply add a tasklet to stackless's scheduler. this means no stack in the traditional sense. this affords some benefits, but for our purposes, there is the expected drawback. here's the output:

[1, 6, 8, 9, 2, 3, 4, 5, 7]

suddenly the algorithm is breadth-first, and we want depth-first. it works out this way because of stackless's scheduler. in this case, each tasklet completes in turn, rather than the first waiting on the second to return, etc.

the benefits will be discussed a little bit later. first, let's take a stab at getting it right using tasklets and channels in

exhibit d (getting it right in stackless):
    import stackless
    def crawl(l, n):
        for x in l:
            if isinstance(x, list):
                stackless.tasklet(crawl)(x, n+1)
                while chan.receive() > n:
        chan.send_sequence(n-1 for i in xrange(n))
    listy = [[1,[2,3]],[[4,5],6],[[7],8,9]]
    end_list = []
    chan =
    statement = t = stackless.tasklet(crawl)(listy, 0)
    print end_list

now this is starting to look convoluted to me, but it gives the expected output of a list containing 1-9 in order. i'm not totally convinced this is the right way to do it, but the principle is similar to avoiding recursion in C/C++ using 'for' statements with a value of 'i' that gets incremented with each level of depth and decremented at the end of each iteration (a technique some savvy people use to avoid potential stack overflow problems).

instead of using a 'for' statement, however, we've used channels. the inherent problem here is that the receive()er that needs to hear when the last level of depth returned is not guaranteed to be the next to receive() on the scheduler, and only one receive()er can get the message from a send().

my solution (and i hope to god it's not the best one for reasons i'll mention below) was to use the channel.send_sequence() function to send the same message to all listening tasklets. i arrived at this after some frustration, since i'm not entirely familiar with all of the ins and outs of stackless yet, but thankfully i had the handy output from my docstring crawler (exhibit a) to help me out. i also tried the creation of a new channel for each iteration, but after a little time spent doing that, it seemed that this method would be simpler to write.

ok, so now we have two algorithms we can compare at least, to give us some idea of what's going on. i did some basic tests using python's 'timeit' module. here's what i got:

#4.66379394371 for 10,000 passes

#4.65859911321 for 10,000 passes

this is using the list "listy" above. as you can see, the stackless algorithm is slightly slower, but the difference isn't really too horrible.

the next step, though, was to look at a list with much greater depth and see how the two compared in processing that. that's why we have

exhibit e (quick algorithm to make a list with much greater depth):
    big_list = []
    for x in xrange(5000):
        temp = big_list
        for y in xrange(x):
            temp = temp[-1]

i dunno if there's a 'better' way to do this or not, but i don't think it really matters for now, it was the first thing that popped into my head. it creates a list with a depth given by the value passed to xrange(). so, for 5,000 (as above), it makes a list with 5,000 lists nested inside one another. to be clear, given a value of 5, big_list becomes:

[[0, [1, [2, [3, [4]]]]]]

with this, we can start to put recursion to the test. i don't know exactly where the cutoff is, but certainly for any value over 1,000 when creating big_list, the stackful recursive function of ours fails. we get some ridiculous repetetive traceback followed by: 'RuntimeError: maximum recursion depth exceeded'.

so, stack_crawl() fails entirely for high values. at a depth of 5,000 (as above), crawl() yields (when put through timeit):

#stackless (depth: 5000)
#5.6066390925 for 1 pass

over 5 seconds? jeez, man... it must be because of the way we have to cycle through all those listeners with each level of depth. the algorithm above scales horribly. by the end, we have xrange(5000) being used to send_sequence() to everyone. ouch. and it really shows with the difference in performance when they can be directly compared:

#stackless (depth: 500)
#58.6125148907 for 1,000 passes


#stackful (depth: 500)
#0.65360185753 for 1,000 passes

almost a minute? compared to 0.65 of a second? holy hell. there's got to be a better way!

as a point of interest, 'exhibit c', when given these same conditions yields:

#exhibit c (depth: 500)
#0.989843840955 for 1,000 passes

still losing out to the normal recursive algorithm.

so, i'm sure there is a better way to do this, but as a starting point, these are my personal results with stackless vs. normal python for this particular task. i thought this was interesting, but it turns out i was wrong. there went your life, sucker!

hot off the presses!

exhibit f(probably the right way to do depth-first in stackless):
    def crawl(l):
        for x in l:
            if isinstance(x, list):
    end_list = []

this yields the desired result, is readable and concise, and doesn't have an outrageous amount of overhead. timeit results:

#exhibit f (depth: 500)
#1.2240577246 for 1,000 passes

this is still roughly twice the time needed for the normal recursive version, but considering the fact that this is probably a bit more than half of what the stackful version can handle AT ALL, this doesn't seem really outrageous.

other results:

#exhibit f (depth: 5000)
#0.0123416283771 for 1 pass

clearly, this is a HUGE improvement over exhibit d, which was obviously a result of inexperience with stackless.

of course, the real thing to do now would be to reimplement this stacklessly using more traditional means (probably within a while loop, as a python for loop would likely be terribly inefficient due to the necessity of an iterable) and compare those results. in all likelihood, this is not really a circumstance where the functionality of the stackless module is strictly needed or useful anyhow. still! it's one way of getting a comparison of sorts, for whatever that's worth.
Tags: coding, stackless
  • Post a new comment


    default userpic