Author Archives: Mike Sokolov

Multithreaded testing with JUnit

This post demonstrates an easy way to test the effect of multiple concurrent threads running your software, using your existing unit tests.

Have you ever wondered whether your software is really thread safe? Whether it will stand up to the punishment of thousands of concurrent users when your site (or app) goes ballistic? Multi-threaded programming is notoriously difficult, and running software in a environment (like a web application server) that spawns multiple threads can often expose architectural problems that don’t arise in typical test scenarios. Running a set of tests in parallel is a good way to gain confidence that your code is thread safe.

Another benefit of running multiple tests at once is: they run faster! All modern computers have multiple cores, and many have multiple CPUs: when we run our tests single-threaded, we aren’t making use of all of that latent power we have just lying around.

Let’s say you have a test class called MyTestClass, and it defines a number of tests. Using the test runner we provide, you can run all of its tests in parallel by adding a single annotation (the standard org.junit.runner.RunWith annotation that comes with JUnit) to your class:

@RunWith (MultiThreadedRunner.class)

This runner plugs in to the JUnit framework by subclassing BlockJUnit4ClassRunner; this is the class that usually runs all tests from a test class. The runChild() method is called for each test that is run; we take that over and arrange for each test to run in its own thread. We also want to ensure that not too many threads run at once: each thread consumes memory, and, depending on the size of the test class, we may end up running hundreds of threads at once if we’re not careful, and run out of memory. Here is the code for runChild, which simply waits until there are fewer than maxThreads tests running, and then creates a Runnable called Test which actually runs the test:

    protected void runChild(final FrameworkMethod method, final RunNotifier notifier) {
        while (numThreads.get() > maxThreads) {
            try {
                Thread.sleep(25);
            } catch (InterruptedException e) {
                System.err.println ("Interrupted: " + method.getName());
                e.printStackTrace();
                return; // The user may have interrupted us; this won't happen normally
            }
        }
        new Thread (new Test(method, notifier)).start();
    }
 

Note that we keep track of the number of threads in a variable called numThreads. That is an AtomicInteger, which is a thread-safe primitive built into the standard JRE. We use it to ensure that the thread count isn’t updated simultaneously in two threads. Here is the core of the code for the Test class:

        public void run () {
            numThreads.incrementAndGet();
            MultiThreadedRunner.super.runChild(method, notifier);
            numThreads.decrementAndGet();
        }

All this does is keep track of the number of running threads. I haven’t shown the constructor and members used to track the test method and notifier, but as you can imagine, that is just straightforward copying of variables.

The only slight complication with using the code as shown so far is that JUnit will finish before all the tests do. It’s necessary to make the “outer loop” in the test runner wait until the last test has completed before it exits. Usually this happens implicitly, because tests are all run in the same thread, but now, when the runner starts a test, it returns immediately, while the test is still running. To solve this problem, we need to override the childrenInvoker method.

protected Statement childrenInvoker(final RunNotifier notifier) {
        return new Statement() {
            public void evaluate() throws Throwable {
                MultiThreadedRunner.super.childrenInvoker(notifier).evaluate();
                // wait for all child threads (tests) to complete
                while (numThreads.get() > 0) {
                    Thread.sleep(25);
                }
            }
        };
    }

Couldn’t be simpler: just call the super method to do all the real work, and then wait until there are no more test threads running before returning. Does anybody see the potential race condition here? It’s possible that when the last test is run, childrenInvoker will return and test numThreads before that last test has a chance to increment it. In practice this doesn’t seem to happen since there will generally be several threads running already when the last test is started, but just to be safe, it is better to increment the threadCount in the main runner thread, just before calling Thread.start(), and then to decrement it in the child thread, just before exiting. The attached file has that change.

Download the source code here:
MultiThreadedRunner

Note that this code is distributed under the Mozilla Public License (2.0), which basically says you can use this code freely, embed it in your software, and even redistribute it, as long as it retains its license and attribution (includes the comments it has now saying who wrote it), and as long as any changes you make to the software are distributed under the same terms: also I encourage you to post any changes you make here so I can incorporate them.

linux firefox ipv6 snafu: FIXED!

I post this in the hope that it will save someone the headaches I suffered:

I found that certain web sites were unusable (super slow – minutes to load) in firefox on linux, although retrieving pages from the command line w/say wget worked just fine – python.org, for example, and the firefox addons site. It turns out that DNS records for these sites include ipv6 addresses, and I guess we can’t route ipv6 packets properly? I was able to fix the problem by disabling ipv6 on my box:

in /etc/modprobe.d/blacklist I added
blacklist ipv6

a foot of water in the well

Like many of my neighbors, I came home to a basement full of water last night. First thoughts – all the stuff that would be ruined; we quickly abandoned the pile of childcraft books, the old rugs, some boxes of breakfast cereal that hadn’t been put away yet. But my tools, some lumber, the leaf for the dining room table: it would be a shame if they were destroyed. We lifted them up out of the drink onto plastic tubs, tables, whatever. Later we’d assess the damage.

Then, a bucket brigade. My girls chipped in mightily: one in the basement, filling buckets with a bailer from the boat; one carrying up the stairs, and I was the anchor leg, carrying and heaving out into the street, while it continued to rain and sleet.

Finally the water was down to the well: a little-used foot-deep pit in the basement where the sewer pipe can just be seen poking up from the surrounding dirt. “Just five more minutes, girls!” I encouraged them. We got the water down about six inches. OK, time to eat some hot stew.

Sadly, after dinner, we found the water had risen almost to the top of the well again. Calls to all the late-night, big-box, home improvement stores were met with “Hello, this is Home Depot. We are all out of sump pumps, in Massachusetts, Rhode Island and New Hampshire.” I guess we weren’t the only ones. I pressed a little electric pump from the dehumidifier into service; it produced a thin flow, but seemed to be just holding its own: the water level held.

Update: this morning the weather is fine: sunny and dry. The water is receding. Time to begin the cleanup!

Balisage 2009 trip report

Balisage (once Extreme markup) seems to be the Summer conference of choice for schema architects, developers, visionaries and markup philosophers: in short, those laboring on the underpinnings of XML and related technologies. Conference literature calls us “markup geeks.” Mere users of these technologies might find this conference heavy sledding (it is quite technical), and marketers should be aware that this is not a trade show: there are few opportunities for promoting goods and services. However, idea-pushers will find a welcoming audience at Balisage.

This year, I attended Balisage for the first time, and I found it a highly rewarding experience. I came away full of ideas for new ways to work, new tools to use, and new directions in which to push our XML-based publishing practice. In addition I made a passel of new friends: The community that has grown up around this conference is very welcoming, and surprisingly so, given how close-knit it seems to be. The conference is well-organized and run by folks from Mulberry Technologies and friends, with a wide array of reviewers vetting papers and presentations. My one complaint is that I was unable to remain for the full schedule and missed some very exciting-looking presentations on Thursday and Friday.

Disclaimer: I was unable to see every talk, so this report represents merely some highlights, as I saw them. This report is biased and unfair, but honest and without the intention of brutality.

Day 1

On Monday, Michael Kay of Saxonica chaired the Symposium on Processing XML Efficiently, a separate but related series of presentations that shared the same logistical backbone with the main Balisage conference.

The surprise hit of this session was Rob Cameron’s presentation on Parallel Bit Stream Technology. At first, I had no clue what was going on in this talk. The basic idea seemed clever: rotate your bytes sideways, packing all the first bits into a wide (64-bit or wider) word, all the second bits into the next word, and so on. But what was the point? When Cameron demonstrated that some simple addition and shifting operations led to a bit pattern that encoded the positions of all the valid XML numeric entities, the “aah” of dawning comprehension rippled through the audience. He went on to show how this could be used to jump-start a tokenizer and accelerate XML parsing, resulting in impressive speedups in a core technology. There are some remaining challenges, though: to make these results available via Java will require some tricky bridgework.

I was curious to hear about the internals of our competitor, Highwire’s, much-ballyhooed H2O platform, presented by James Robinson. Given the context of the conference, it was no big surprise that the workhorse of H2O (Firenze) is an XSLT-based rendering pipeline with built-in caching features. What was most interesting was what it’s not built on top of: a database. The core storage mechanism of the public-facing content delivery system is essentially a hierarchical file-system store. Search features are provided by FAST, (acquired last year by Microsoft). Some interesting observations:

  1. The use of XML data and processing models in every step introduced some overhead (for example, performing HEAD requests that check the cache went through an inefficient XSLT and consumed 30% of overall execution time). Resolving these performance bottlenecks requires some special attention and custom optimizations. However, Robinson felt such costs to be outweighed by the benefit of the more convenient programming model.
  2. The LRU cache was distorted by search engine crawlers, which tend to perform broader-spectrum requests than typical users. Robinson plans to optimize for human users by introducing an ARC cache, which will take into account the frequency of requests for a given resource when aging cache entries.

David Lee presented a performance comparison of various scripting techniques for running multi-step XML pipelines. The resulting comparison of xmlsh and xproc, which run xml processes internally, with shell scripts which execute xslt and xquery externally, was largely a cautionary tale about the cost of repeatedly starting and stopping the Java VM. Although this wasn’t news to many attendees, it is the kind of implementation mistake that is easy to make even if one’s theoretically aware of the issue. The talk was also valuable as an introduction to xmlsh and xproc, two scripting technologies that ought to be in every XML-hacker’s toolbelt. Lee developed xmlsh, an XML scripting language with syntax based on Unix shell; XProc is a nascent W3C standard with a few available implementations, including Norm Walsh’s Calabash, which was tested as part of this work.

Balisage, Day 1

After some introductions, Tommie Usdin launched the conference with a dose of tart wisdom. She issued cautions about blind adherence to standards, urging a deeper understanding of requirements. The rest of the conference was peppered with mild disagreement with her plenary point: Standards considered harmful.

Alex Milowski reminded us that Netscape has had sophisticated XML support in the browser since 1999, although the lack of comparable support in Internet Explorer was treated as a minor, if unfortunate, side note. Milowski made a plea for a core set of XML vocabularies including HTML, SVG and MathML. He then demonstrated some pioneering work with audio-enabled eBooks (in the DAISY format) using Firefox extensions, and indicated a similar model is also available in the mobile space (using iPhone and Android). What I learned: you can associate MIME types with Firefox plugins so as to trigger custom behavior for your content type.

Somehow I managed to miss David Birnbaum’s talk on writing optimal XPath and XQuery for eXist’s evaluation engine, which I had intended to see. I guess I’ll have to read his paper when it’s posted. I also sadly missed seeing Michael Sperberg-McQueen’s talks, which, to judge from the questions with which he prodded the speakers, would have been fascinating.

Uche Ogbuji presented his application development platform, Akara, which integrates XML-native operations with Python in clever ways. Python developers working with XML will want to stay abreast of this work from the former developer of the 4Suite toolset, currently available in Beta.

To wrap up the first day, we had to choose between a fascinating-sounding flight into abstraction from Wendell Piez, and a survey of XML visualization tools from Mohammed Zergaoui. In the end, my practical orientation won out, and I was rewarded with a number of links to follow. In particular I am intrigued by Xopus, an in-browser WYSIWYG XML editor that looks very promising. I got a demo at the conference (from Betty Harvey), and will definitely follow up since this is something we need to embed in our solutions. I still wonder what a “nomic game” is, though.

Day 2

Fabio Vitali presented a new approach to handling overlapping markup. I learned that this is a major area of theoretical interest in markup that has been addressed by numerous proposed schemes. XML’s element tree can’t represent overlap directly in a natural way, and markup such as bold, italic is inherently ambiguous (which tag should be the outer one?). Although new to me, this seemed to be well-trodden ground for many at the conference, and the new contribution was the idea of using RDF and OWL as representation meta-languages.

Norman Walsh shared some data gathered from the “phone home” feature of XML Calabash, his XProc implementation. Every time Calabash runs, it sends some basic profiling information to xmlcalabash.org, unless the user explicitly opts out. Some insights: it appears that many (most) pipelines are streamable, at least from the viewpoint of XProc, although it was less clear what the benefit of streaming might be in these cases. One odd feature of Walsh’s data was that a surpriing number of pipelines have only a single step in them; he attributed this to the convenience of executing other XML processes via XProc, or possibly to a pattern of tire-kicking behavior.

The final regularly-scheduled presentation was Michael Kay’s discussion of the polarity of streaming processes. This included a fascinating discursion on the merits of co-routines for programming inherently multi-processing programs; their performance benefits relative to multi-threading; the difficulty of implementation in single-stack languages such as Java. Remarkably, there was a dearth of questions after Kay’s presentation. This was not a case of lack of interest; rather it seemed as if all the bases had been covered, and in the end there was just nothing more to be said once the master had spoken.

After a deeper dive into eBook formats with Alex Milowski, and a presentation on REST and XML that (according to him) came to Kurt Cagle during a sleepless night in the hotel room, it was time for me to drive home. Perhaps next year I’ll be able to stay for the full week, and not miss the XML games that were scheduled.

(note: cross-posted at the ifactory blog)

25 selfish things in 2^10-1 words

1. I live in one of the wealthiest parts of the only inhabited planet I can think of at the moment, but I still sometimes feel jealous of other people who seem to have it better than me.
2. I like to make things, especially things I’ve never made before; here’s a partial list:: a guitar, a virtual paper-folding simulation, a cryptic crossword puzzle, the coffee table my feet are resting on, a couple of sweaters, both long gone, a bavarian creme, also gone. Most of these things are misbegotten in one way or another, because perfecting a craft doesn’t happen if you keep skipping around like that.
3. I’d still like to write a novel, but the one I started kept growing and growing until I became afraid of getting lost in it with no way out, ever, and feared spending my whole life there. I saw that I could end up having written some all-consuming banality like Arcadia. Some time I’ll go back there and try again, though.
4. I am sometimes homesick for dirty old grumpy New York, but now everything seems so nice there. That city is too big for anyone to hold on to for more than a moment or two: it keeps getting polished with money and never remains the same.
5. 25 things is a lot to come up with
6. I never forward chain letters, or participate in group writing projects
7. I am not a team player, but people seem to tolerate me anyway. Maybe I am a team player. Or maybe people are actually finding me intolerable, like the woman on the train this evening who didn’t seem to like the way I blew my nose. OK, it was kind of loud, but she was really offended.
8. I like garlic. A lot.
9. I like to destroy things, especially other people’s preconceptions. This has probably tended to tie in to the whole “not a team player” thing.
10. I haven’t killed anybody, or written a symphony, or been to the moon, or made more money than Croesus, or taken over a small country in a bloodless coup.
11. My nose runs a lot, especially in the winter. I am starting to get some aches and pains that I am told will only increase. I have a small black spot on the sole of my foot that makes me feel stupid when I think about cancer and that I chose not to bother to have it cut out of my flesh with a sharp knife. Did I say it’s in the bottom of my foot?
12. The closest I have come to dying is when I drove a car off a cliff into some trees, but that was a long time ago, so I guess with hindsight we can see that I am closer to death now, in fact, as I sit in my cozy chair writing to you with my feet up on the coffee table that I made.
13. I used to think I was messy, but then I started a family and had dogs for a while. Now I am just a lazy neat freak, but the house looks about the same.
14. I don’t believe in God right now, but when I do, God exists. Sorry if that seems solipsistic, but this is 25 things about me, after all, and that’s just how I feel. I’m sure your God is always there for you – don’t get all defensive.
15. I can’t write anything about Ellen here because all the good stuff is secret.
16. I talk too much sometimes and too loudly; apparently it’s a family trait. I am trying to be a better listener.
17. I am very good at giving clear instruction, but terrible at enforcing anything. I am very good at following instructions, but can’t stand being told what to do.
23. Actually, about #8 – garlic is alright, but it’s definitely not one of the 25 most important things about me. I just liked the mood-enhancing short quality of the sentence more than anything else. Sorry, can I take it back?
18. Something about me you might not know is that my heart rate is really low.
19. Being competent has always been very important to me, and knowing the answer, too, but the more I think about that, the less important it seems.
20. I keep thinking there must be more to life than this, and then there is. But maybe I’m not unique in that regard. I think actually, that personality is overrated. Luck has a lot more to do with it. I’ve been very lucky, having made so incredibly many blunders, and yet I keep getting a second chance.
21. I like music that’s so weird that I don’t even like it, but every now and then I just want to sing along to the Beatles.
22. Virtue seems tedious to me. Of all the seven deadly sins, I think Pride is the best, followed closely by Envy. I’d like to invent a new one. Can Frivolity be a sin? Maybe not a deadly one, but it suits me. Humor is the best tonic for a bruised soul, and really the only attitude towards Death I can condone since the alternatives are so dreary. Maybe when someone I love dies, I’ll feel differently, but I think I’ll owe it to them to keep on laughing anyway.
23. When I die and people start rooting around in my stuff, they’ll find a lot of files called “TODO.” I hope someone will bother, but I wouldn’t wish that task on anyone I like.
24. 25 things is not enough to say what I want to
25. I’m right-handed, married, have three children, live in a house, drive a car, take the train to my job where I work with occasional complaints 240 days a year, eat red meat, green vegetables, white rice and drink coffee and beer and wine and water, go sailing and play soccer in the summer, get fat in the winter, stay up too late, get up too early, love my wife, fight with my wife, spoil my children, indulge myself far too much and go on far too long. Basically, I would say, to sum up, that I am just one helluva regular guy!

Puzzle Solver

I’ve posted my first working puzzle solver here. It is based on a flexible framework with a high degree of abstraction, with the idea of adding new puzzle types easily; so performance is not the main goal. So far, it can solve Sudoku and Kakuro type puzzles; it will be fairly straightforward to build brute solvers (guess-and-check) for puzzles of the type where you fill a grid with one of a small number of tokens and there is only a single solution.

At one point I was considering creating a constraint language in which declarative rules could be expressed in a shorthand notation, but this is beginning to seem like a lot of work with a dubious payoff. For now, all the constraints are expressed as Java programs. My current plan is to add more puzzle types (perhaps including some simple word puzzles at some point), and then to evaluate whether there is any benefit in further abstraction.

Scrapple

Play the scrapple word game against the computer, or play a solo game and try to achieve the perfect score. You will need to have installed a Java runtime environment (at least version 1.5) to use this applet-based program. Also don’t forget about the sudoku solver and the snowflake creator, all on the toys page.

This is the so-called official s c r a b b l e players dictionary (3) word list: ospd3, used by scrapple to determine allowable plays. It is missing some pretty real words, but apparently is used in tournament play nonetheless. What do they do when someone challenges zen, straddles, or irker?

WORD LIST

This list of words is pretty big (1.7M); it is all lower-case, includes various word forms and excludes proper names.  I wish I could remember where it came from (I think it was called WORDS, if that helps you identify it??)

words