Using fuzz testing in physics research
First, thanks to those who talked to me after the last post. Your help is appreciated! I’ve received tips on using Julia’s environments and Manifest.toml to get things working on other machines with little fuss; I’ll definitely check that out, although for now what I want is to make a library, not a standalone program. It should be easy to call from Python and C, easy to vendor, and so on. I was also pointed to Julia’s Discourse forum; I’ll ask there when I find the time.
Now, on to the topic proper: fuzz testing / property-based testing, Supposition.jl, and how they’re helping me do research.
If you don’t know what those are, here’s a quick recap:
Writing unit tests for your code takes a long time, is boring, and mostly catches the bugs you anticipated anyway, so it’s not a great investment of your time. Fuzz testing means that instead you have the computer randomly generate thousands of tests, exploring many combinations of inputs that you would never have thought of. Traditionally, a fuzzer just checks whether your program crashes, but you can also ask it other kinds of questions. Is the output of this function always a valid input to that function? Is FooError the only exception this method can raise? Is this equation always satisfied?
Asking those questions is checking properties of your code, and that’s where the name “property-based testing” comes from. In practice, to do this, you have to do three things (sketched in code right after this list):
- Tell the computer how to generate inputs to your code
- Tell it what properties the code should satisfy
- Let it do its thing. It will find counterexamples and automatically shrink them to give you the smallest, simplest one that reproduces the bug.
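Here is a minimal sketch of those three steps with Supposition.jl. The function clamp01 is a made-up toy with a deliberate bug, not code from my project:

```julia
using Supposition

# A deliberately buggy toy function (hypothetical, not from my project):
# it forgets to enforce the lower bound.
clamp01(x) = x > 1.0 ? 1.0 : x

# 1. Tell the computer how to generate inputs:
floats = Data.Floats{Float64}()

# 2. Tell it what property the code should satisfy:
@check function output_in_range(x = floats)
    0.0 <= clamp01(x) <= 1.0
end

# 3. Let it do its thing: it finds a failing input (any x < 0, or NaN)
#    and shrinks it to a simple counterexample.
```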
Finally, if you write the tests first and the code later, this is called property-driven development (like TDD, test-driven development, but smarter). And Supposition.jl is a Julia package designed for property-based testing.
I started using Supposition.jl this Tuesday (4 days ago).
I’m having a great time with it! It didn’t take long to learn, and the API is clean. And it has found so many useful counterexamples that when a test passes, I’m really confident that the property in question is satisfied.
It has found trivial, shallow bugs where I just didn’t understand Julia well enough (for example, type instabilities). It has found important, subtle bugs in my code where the logic was wrong but only failed under just the right combination of inputs. And it has found bugs in the math itself: I’m implementing results from a paper, and it seems that some of its mathematical claims are wrong, because Supposition.jl found counterexamples.
This is great because I would never have thought of those counterexamples in so little time, or at all. It led me down a rabbit hole of math that was ultimately very productive, because now I have a much deeper understanding of what I’m doing: I know what the authors of the paper missed, how to fix it, and even how to make it more practical. Overall, a great investment of my time as a researcher.
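To give a flavor of that last kind of bug, here is a hypothetical stand-in (not the actual property from the paper): an identity that holds on paper but fails in floating-point arithmetic, which the fuzzer will happily dig up.

```julia
using Supposition

floats = Data.Floats{Float64}()

# On paper, addition is associative. In Float64 arithmetic it is not:
# the fuzzer finds failing triples (rounding breaks finite cases, and
# NaN inputs fail trivially) and shrinks them.
@check function addition_associates(a = floats, b = floats, c = floats)
    (a + b) + c == a + (b + c)
end
```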
And to emphasize: I didn’t spend this week playing with Supposition.jl! I learned all I needed from about half an hour, in total, of looking things up in the documentation, and that was it. All my brainpower went into fixing bugs and understanding my math, while the fuzzer did the work of hunting for clever counterexamples. I’m honestly impressed by how well it worked and how easy it was. It finds bugs faster than you can fix them.
It looks like fuzz testing is useful not just for software but for research. I’m definitely going to put any general claims I make in a future publication through a fuzzer before submitting. It’s a cheaper, faster version of having your local mathematician comb your work for counterexamples; it’s not as rigorous, but damn does it catch edge cases.
A good physicist, like a good programmer, always takes the time to check that their reasoning is correct, at least in the common case. But it’s rare to have the time to rigorously verify that every step is completely correct in every possible case. Given that we are not going to use formal mathematics to prove our stuff correct anyway, fuzz testing is the cheapest, most effective option to find those mistakes. And I’m glad I’m using it.
I need help making good scientific software
Part of my job as a PhD student is to write data analysis software for a future space mission. Because the launch is about 10 years in the future, there is still a lot of R&D being done, so everything I am writing now is really a prototype. It may or may not be used in the final pipeline.
Now, I really want my contribution to this effort to survive the next decade. I want it to be there, even if only in essence, in the pipeline that will run the first analysis of the mission data. But whether my code will make the cut depends on many things. Some of them fall under the umbrella of “software quality”.
Software of good quality, in this context, is something that will happily work on other machines, years or decades from now, with no hassle. That won’t crash, won’t require a special environment to be set up for it, won’t yell at the user about dependency version incompatibilities. That will scale up and down to the size of the problem at hand and the computing resources available. Something that will just do its job efficiently, silently, and above all correctly.
In practice that’s just an ideal. We physicists are not very good at making robust, interoperable, pleasant-to-use software. We don’t have much training in software; we mostly learn to write for loops that compute big formulas. But I want to try.
The plea for help
Now, given that time is finite and my thesis won’t write itself, I need practical advice on doing this (if you have some experience and are interested, please give me a hand!). Some specific constraints are the following:
- I am pushing for the use of Julia in an environment dominated by Python and C, but I am not actually experienced in Julia. I just think it is a wiser engineering choice.
- I am writing signal processing and Monte Carlo routines.
- This has to run on standard Linux x86_64 clusters (Slurm, Kubernetes).
- My routines should be easy to call from Python and C for interoperability.
- We might want to throw GPUs at the problem.
What I’ve thought of so far
From my little experience I know that writing unit tests by hand takes time and only catches the bugs you anticipated anyway; inline snapshot testing at least makes writing them easier. Random test generation seems promising, and I will try out Supposition.jl very soon. Formal methods like Alloy and TLA+ look fun, but I have far too little time to learn them, and besides, they seem most useful for concurrent algorithms, which is not what I am doing.
On the robustness and ease of installation side of things, I want to avoid dependencies like the plague, unless they are very stable.
Fortunately, for now I depend only on (other than Julia itself) the C library FFTW and its Julia bindings. Those look stable enough to me that I shouldn’t worry about using them. I am wondering if I should vendor them to make things easy for my users and myself. Other Julia packages that I use (Revise, PyPlot) are only for development, so I don’t count them. Supposition.jl is an interesting case: if I do end up using it, I will have test suites that depend on it. That is still just a development dependency, but unlike Revise and PyPlot, its usage goes into the source tree. I have no idea what I should do then.
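As far as I can tell, the closest fit is Pkg’s test-only dependency mechanism: an [extras] section plus a test target in Project.toml (or a separate test/Project.toml), so that Supposition.jl gets installed by Pkg.test() but is never pulled in by users. A sketch of what that might look like (the Supposition UUID below is a placeholder; the real one lives in the General registry):

```toml
[extras]
Supposition = "00000000-0000-0000-0000-000000000000" # placeholder UUID
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"        # the Test stdlib

[targets]
test = ["Supposition", "Test"]
```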
For interoperability with C, I know that compiling Julia code into C-callable libraries is possible, but I have never done it. The other direction should be easy.
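The other direction really is easy: Julia’s built-in @ccall calls into C with no glue code at all. A minimal sketch, using the classic strlen example from the Julia manual:

```julia
# Call libc's strlen directly; no wrapper code or build step needed.
s = "signal processing"
len = @ccall strlen(s::Cstring)::Csize_t
println(Int(len))  # prints 17
```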
For interoperability with Python, there is PythonCall.jl, which I hope won’t break too much. But if we are ambitious, in the long term this is not needed: all the current Python code could be replaced by Julia. Not within the timeframe of my PhD, though.
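From the Julia side, using PythonCall.jl looks like this (a minimal sketch; the same project also ships juliacall, a Python module for the Python-calls-Julia direction):

```julia
using PythonCall

np = pyimport("numpy")             # load a Python module
x = np.linspace(0, 1, 5)           # call into Python; returns a Py object
v = pyconvert(Vector{Float64}, x)  # convert back to a native Julia array
```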
Where that leaves us
This week I will try out this fancy property-driven development thing. It looks like it can be very effective, and fun. Hopefully it won’t distract me too much from the actual thesis goals.
Anyway, cheers!
tags: software