Thursday, October 16, 2008

Reasons for loving FLOSS, #<fairly large number>

As I've mentioned before, I've been poking at a gmail backup application based off libgmail. It's been making slow progress, partly because I've been distracted by other things (such as Project Euler), but it hasn't completely stalled.

Recently, while fiddling with it, though, I hit a snag - libgmail would throw an exception when trying to access certain threads. After much head scratching, and testing, I eventually twigged that the problematic threads all included messages sent via google chat. A little focussed debugging showed that gmail creates pseudo mail messages for these when accessed via the API libgmail uses that don't quite match the messages libgmail expects. A little crude patch later, and the problem was fixed for me.

Thus armed, I submitted a bug to the Debian bug tracker, at around 19:00 local time. At 21:00, I received a acknowledgement from the maintainer, who had forwarded upstream, and at 21:10, I received a note that a patch had been applied upstream. Just over 2 hours from bug report to upstream fix strikes me as pretty good going.

Saturday, October 11, 2008

Adding periodicity

Still generating complete sequences, so linear growth is expected.


Found period 32, difference 126
time to generate 200000 numbers (U (2, 5)) is 0.37129
time to generate 2000000 numbers (U (2, 5)) is 3.66487
time to generate 20000000 numbers (U (2, 5)) is 38.38641
Found period 26, difference 126
time to generate 200000 numbers (U (2, 7)) is 0.36446
time to generate 2000000 numbers (U (2, 7)) is 4.06793
time to generate 20000000 numbers (U (2, 7)) is 44.24620
Found period 444, difference 1778
time to generate 200000 numbers (U (2, 9)) is 0.40023
time to generate 2000000 numbers (U (2, 9)) is 5.73586
time to generate 20000000 numbers (U (2, 9)) is 38.33744
Found period 1628, difference 6510
time to generate 200000 numbers (U (2, 11)) is 0.41977
time to generate 2000000 numbers (U (2, 11)) is 3.65161
time to generate 20000000 numbers (U (2, 11)) is 37.85133
Found period 5906, difference 23622
time to generate 200000 numbers (U (2, 13)) is 0.65191
time to generate 2000000 numbers (U (2, 13)) is 3.69802
time to generate 20000000 numbers (U (2, 13)) is 36.82868
Found period 80, difference 510
time to generate 200000 numbers (U (2, 15)) is 0.36584
time to generate 2000000 numbers (U (2, 15)) is 3.61351
time to generate 20000000 numbers (U (2, 15)) is 37.08529
time to generate 200000 numbers (U (2, 17)) is 4.52874
Found period 126960, difference 507842
time to generate 2000000 numbers (U (2, 17)) is 7.54118
time to generate 20000000 numbers (U (2, 17)) is 39.30235
time to generate 200000 numbers (U (2, 19)) is 4.54097
Found period 380882, difference 1523526
time to generate 2000000 numbers (U (2, 19)) is 35.07058
time to generate 20000000 numbers (U (2, 19)) is 36.92766
time to generate 200000 numbers (U (2, 21)) is 4.10050
time to generate 2000000 numbers (U (2, 21)) is 43.80197
Found period 2097152, difference 8388606
time to generate 20000000 numbers (U (2, 21)) is 456.09959


There's a bunch of improvements that can be made to the code - especially the periodicity check is not that efficient, and tends to repeat work which could be avoided. This bites for the U(2, 21) sequence, for which the direct approach presented earlier can calculate 20000000 elements in around 290 seconds, but the overall problem is now something that can be solved in "reasonable" time periods

Presented without further comment.


# time python ./solve_167.py
Answer = <answer redacted, but confirmed correct>

real 7m15.052s
user 6m41.213s
sys 0m15.365s

Friday, October 10, 2008

This may actually be feasible now

With CTPUG 14 in the near future, and two failed attempts to tackle the Ulam seuqence problem from the previous meetings, I decided to give the problem some thought today.

So, from today's bright idea,:

$ python ./Ulam.py
time to generate 100 numbers (U (2, 5)) is 0.00426
time to generate 1000 numbers (U (2, 5)) is 0.03167
time to generate 10000 numbers (U (2, 5)) is 0.22796
time to generate 100000 numbers (U (2, 5)) is 1.44548
time to generate 1000000 numbers (U (2, 5)) is 14.71493
time to generate 100 numbers (U (2, 7)) is 0.00214
time to generate 1000 numbers (U (2, 7)) is 0.01558
time to generate 10000 numbers (U (2, 7)) is 0.15068
time to generate 100000 numbers (U (2, 7)) is 1.43910
time to generate 1000000 numbers (U (2, 7)) is 14.62419
time to generate 100 numbers (U (2, 9)) is 0.00245
time to generate 1000 numbers (U (2, 9)) is 0.01770
time to generate 10000 numbers (U (2, 9)) is 0.15190
time to generate 100000 numbers (U (2, 9)) is 1.45868
time to generate 1000000 numbers (U (2, 9)) is 14.93112
time to generate 100 numbers (U (2, 11)) is 0.00211
time to generate 1000 numbers (U (2, 11)) is 0.01602
time to generate 10000 numbers (U (2, 11)) is 0.15409
time to generate 100000 numbers (U (2, 11)) is 1.49055
time to generate 1000000 numbers (U (2, 11)) is 15.08445
...
time to generate 100 numbers (U (2, 21)) is 0.00486
time to generate 1000 numbers (U (2, 21)) is 0.03741
time to generate 10000 numbers (U (2, 21)) is 0.25073
time to generate 100000 numbers (U (2, 21)) is 1.78683
time to generate 1000000 numbers (U (2, 21)) is 17.9669

(Sequences U(2,5) and U(2,7) verified against the On-Line Encyclopedia of Integer Sequences (to 1000 elements))

The period detection and related footwork is still missing, but with a linear time growth, the problem looks much more doable.

So, the bright idea:

Looking at various articles, it's mentioned several times that Scherml and Speigel proved that "The sequence (2, v) for odd v >= 5 had precisely 2 even terms".
It can also be shown that the second even number occurs fairly early in the sequence.

Since we're dealing with sequences of that form, after the initial terms to reach the second even term, the candidate numbers we're dealing with must be odd. Generating the new candidates is thus trivial, and, with a little housekeeping, the list of candidates to consider can be kept short on every loop.

I'm not sure whether to feel chuffed about spotting this, or annoyed that I didn't spot this earlier.

Wednesday, October 8, 2008

Paging Captain "Ya know, that's really pretty obvious"

Occasionally, it's useful to be able to write a one-line script (it's especially handy for pasting into IRC channels). Python's one-line capabilities are somewhat limited though. Statement lists (see the offical grammer) are always terminated by a newline. Thus, it's impossible to directly express the following snippet as a single line:

z=0
for x in range(5):
   z += x
print z


Of course, there are a vast number of ways to achieve the same effect using the constructs that can be placed on a single line, and, other than showing off, there's usually very little need to coerce something to be a single line.

Today, however, it occurred to me that it's also possible to abuse eval for this (I don't know why this only struck me today), leading to:

python -c 'eval(compile("z=0\nfor x in range(5):\n z+=x\nprint z", "test", "exec"))'


While not exactly readable, I think combination of eval, compile + a one-liner is just too amusing to ignore. I'm sure that eventually, somewhere, I'll be able to convince myself that this trick will be just what I need.