Documenting, Documenting, Documenting

. . . Turns out to be really boring. And because I was too busy getting working software to document as I went, I’m stuck doing it after the fact and taking crazy amounts of time to get it done. So let the lesson be learned by someone else: document as you go to the extent possible.

Some Forward Progress, a Quandary

As for forward progress: another couple of functions got written up over the last couple of days.

As for a quandary, I’m still trying to get a feel for how detailed my writeups on the more elementary functions need to be. I could cheat and rewrite the package to make those user-inaccessible, but I prefer to put them out in the open . . . at least until the next major code revision. My big issue is with the Monte Carlo integration scheme I implemented. It’s not really a hard thing for anyone who’s worked in this area before. So unless I get feedback that everything needs a very detailed writeup, I’m going to leave it at a lower level of detail.
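For anyone who hasn't seen it, here is a minimal sketch of plain Monte Carlo integration, written in Python rather than the package's R. It is not the scheme from the package, just the textbook version: average the integrand at uniformly drawn points and scale by the interval width. The name mc_integrate and its parameters are made up for illustration.

```python
import math
import random


def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by uniform sampling.

    Returns (estimate, standard_error). Hypothetical helper for
    illustration only; not a function from the package.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    # Sample variance of f(U); clamp tiny negative rounding errors to zero.
    var = max(total_sq / n - mean * mean, 0.0)
    width = b - a
    return width * mean, width * math.sqrt(var / n)


# The exact integral of sin over [0, pi] is 2.
estimate, std_error = mc_integrate(math.sin, 0.0, math.pi)
print(estimate, std_error)
```

The standard error shrinks like one over the square root of the number of draws, which is why this kind of thing eats run time when you need tight estimates.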

Progress Update

Still cranking through on the intro paper . . . one function at a time in no particular order. Turns out it’s been easiest just to start with each of the script files in alphabetical order and knock out each user-accessible function therein. I can always organize them later into a more intelligent order.

Each function gets a brief description followed by the syntax and finally a discussion of design considerations and any particular algorithm I used therein. Only a few of them really require that level of detail, but IMA, BCIMA, and ABCIMA will each get very detailed discussions since that’s where the meat of the project is. On the other hand, things like computing distances, intersections of circles, etc., are quite light on details of implementation because I expect people to understand Euclidean distance, how to find the intersections of circles, and other basic stuff to the point where all I have to say is something like “this function computes the Euclidean distance between two points in the plane” and everyone gets it. Why it makes sense to manipulate eigenvalues of the variance-covariance matrix of the candidate distribution to maintain a proper minimum amount of variation in the candidate points is a different subject entirely.

On a side note, that and other neat little statistics and R hacks and tricks might make a good page for this blog . . .
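To show just how basic the "basic stuff" is, here is a minimal sketch of the two geometry helpers mentioned above, in Python rather than the package's R. The names euclidean_distance and circle_intersections are made up for illustration and are not the package's functions.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def circle_intersections(c0, r0, c1, r1):
    """Intersection points of two circles given as (center, radius).

    Returns a list of 0, 1, or 2 (x, y) tuples. Concentric circles
    return an empty list, including the degenerate identical case.
    """
    d = euclidean_distance(c0, c1)
    # No intersection: concentric, too far apart, or one inside the other.
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    # Distance from c0 to the foot of the chord along the center line.
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)
    # Half-length of the chord; clamp rounding error at tangency.
    h = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))
    mx = c0[0] + a * (c1[0] - c0[0]) / d
    my = c0[1] + a * (c1[1] - c0[1]) / d
    if h == 0:
        return [(mx, my)]  # tangent circles touch at a single point
    # Offset perpendicular to the center line.
    ox = h * (c1[1] - c0[1]) / d
    oy = h * (c1[0] - c0[0]) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]


points = circle_intersections((0.0, 0.0), 5.0, (8.0, 0.0), 5.0)
print(points)
```

Nothing here needs more than a sentence of documentation, which is the point.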

How the Writing Gets Done

Sporadically.

Today’s Progress on the Paper

It’s pretty simple:

  1. I redid a chunk of the section about modeling SPECT.
  2. I figured out how to lay out all the explanation of how the various functions work.
  3. I realized I’ll need to lay out all the data classes used in the package, since a user might generate one based on a real-world dataset rather than something pre-generated, or otherwise generate a test set of some sort.
  4. I wrote up the explanation for how abcMetropolis works and set up to write up findArcVertices.

Writing the Intro Paper

I’ve started writing the package intro paper. Some of it (just a touch) I was able to reuse from a previous status report. However, much of it is going to need a complete rewrite in order to get it right.

Goals:

1) Review the explanation of all the modeling and make sure it’s completely clear. Insert the odd image as necessary to get this done.

2) Start top to bottom with documenting all the functions in the package. This will probably have to get done from scratch (since I’ve changed a bunch of things since the last time I tried to write this kind of paper). However, it’s more important to get each of them done so that I can then paste the work into the .Rd files in the package. Better to overkill now and cut down for the package’s documentation files.

Documentation

Now that ABCIMA seems to work, I’m starting to work on all the .Rd files required to give the package to someone else and have them be able to figure out what’s going on. This will doubtless take many hours as there are many functions documented in the code but not elsewhere.

Once that’s done (or perhaps as I get bored doing it) I’ll start working on an introductory paper/guide to go along with it. More than a vignette, but less than a full-blown manual, it’s going to need to walk someone through all the options and how they interact.

All that and I need to script the experimental runs in order to automate the data collection and analysis. So much to do.

Documentation and CVS

Two new goals have emerged now that I have successfully (as far as I can tell . . . ) gotten the ABCIMA code working. These are writing all the documentation and running the simulations I’ll need in order both to refine the inflater used in the Adaptive parts of the ABCIMA algorithm and to compare IMA to BCIMA to ABCIMA. So far I’d say that ABCIMA might be the best in theory, but I’m having a hard time justifying the longer run times if 20+ minutes turns out to be normal.

A Small Update: ABCIMA Seems to Work

So other than it taking forever on my lappy to get the runs done, it looks like the changes suggested last time work well when coupled with a slightly larger reserve variance. I upped it by a factor of ten because some runs had adaptations which collapsed too fast because the chain would get stuck for 50 points and thus everything would go haywire.

All in all things look better now.

Run Time Issues

Okay, I should say I mean this not quite in the normal programmer sort of way. I am worried that running a reasonable number of simulations is going to take weeks of computing time on my poor three-year-old 2.2 GHz AMD Turion 64 laptop.

Part of me (the part which knows the software is single-threaded mathematics) wonders if I should just go find some 3.x GHz Pentium 4 box with just enough hard drive and RAM to install Xubuntu and R and then simply run it on there, expecting (yes, with massive power consumption) the kind of speed increase which would justify it.

On the other hand, it would be even better to run it on about ten machines in parallel and then move all the individual datasets to one file. Too bad I’m not in college anymore and ergo don’t have free access to ten desktops for a weekend at a time.
