Bluetooth LE Woes

A large part of the difference between a senior engineer and a more junior one has less to do with the kind of work they *can* do, and more to do with how they do it. When I was starting out, once I understood how to program, there wasn’t really a whole lot of code that I would have had trouble writing. You could have dropped me in at just about any company and I would have been able to produce the code that was required. The code may not have been pretty, but in the end it would have gotten the job done. This is a point that people outside of programming can have difficulty grasping. All they see in the end is the output, so from their perspective, junior programmers are the same as senior programmers, just cheaper.
Once you’ve worked with a number of different engineers and code bases, you’ll start to notice that some code bases are easier to work with, and things behave as expected, while others are a constant struggle. At first this is one of those feelings that is difficult to explain or quantify, but as you gain more experience, you begin to be able to predict which kind of code base you are dealing with.
One of the reasons that some code bases are easier to work with than others is predictability. Remember one of the first applications you wrote? Have you ever worked with a code base where every change seems to result in 100 bugs that you have to work through? Where you finish a feature, and everything seems to be running just right, and then someone asks for “one more change” and it all falls apart again? That’s a lack of predictability. Some of you may have had the opposite experience, where you work with a code base and every feature that is requested seems to just fall into place. It doesn’t mean that the work itself was any easier, but you spent less time trying to figure out how to make the code do what you expect, and more time writing the code people expected of you. That’s the difference between good code and bad code.
Recently I’ve been working with Android and the Bluetooth Low Energy (BLE) libraries, and they are a perfect example of this. At first glance, it’s not that difficult (the BLE spec is a little dense, but that’s a different problem). You take the example code, play around with it, and it seems to work. When you use the BLE library on Android, all of the calls are asynchronous. This means that I make a request of the device (like turning on an LED) and give it a callback function, and some time later the light turns on (or doesn’t) and it calls my function to tell me the action is complete.
I wrote a number of different tests, like connecting to the device or turning on a light, and everything seemed to work. Then I started working on my actual implementation, where I connected to the device, activated some features, and turned on a light. Sometimes it would connect, sometimes it wouldn’t. Sometimes the light would turn on, sometimes not. I couldn’t figure it out. This was the same code! Why was it not working? So I looked at the documentation, and it said every function will tell you if it fails. So I checked the return value of every function. They all succeeded. And still, my light wouldn’t turn on consistently.
I was now several days into this mess, and I wanted to throw my computer at the wall. I finally found a comment that someone had made deep in some forum thread, and voila, it had the answer. It turns out that on Android, when you make a request of BLE, it calls you back after some period of time. However, if you make a *second* request before the first one has completed, it will tell you that everything worked, but you will only get one of the callbacks. On some level, this makes sense. This is true for *every* request though. If you connect to the device, you have to wait for the callback before making *any* other requests. If you write a value, you have to wait until that write is complete before making *any* other requests.
This requirement isn’t documented, the function won’t tell you it failed (even though it claims to tell you whether a request succeeded or not), and it isn’t enforced by the API. You aren’t even required to give a callback function if you don’t want to be notified; you just can’t make any other requests until you would have been notified.
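For what it’s worth, the usual way around this kind of rule is to funnel every request through a queue that only starts the next one once the previous callback has arrived. Here is a minimal sketch of that idea in plain Java, with no Android dependencies. The names `SerialOperationQueue`, `Operation`, and `SerialQueueDemo` are my own inventions for illustration, not part of any Android API, and the "callbacks" in the demo are faked by hand. On a real device I would expect `start` to make the actual BLE call and `done` to be run from the matching callback (for example `BluetoothGattCallback.onCharacteristicWrite`), but treat that wiring as an assumption rather than tested code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical helper: runs asynchronous operations strictly one at a time. */
final class SerialOperationQueue {
    /** Starts some asynchronous work and runs done once its callback has arrived. */
    interface Operation {
        void start(Runnable done);
    }

    private final Queue<Operation> pending = new ArrayDeque<>();
    private boolean busy = false; // true while an operation is awaiting its callback

    /** Queues an operation; it starts right away only if nothing is in flight. */
    synchronized void enqueue(Operation op) {
        pending.add(op);
        if (!busy) {
            startNext();
        }
    }

    // Called with the lock held. Starts the next queued operation, if any.
    private void startNext() {
        Operation next = pending.poll();
        if (next == null) {
            busy = false;
            return;
        }
        busy = true;
        // Assumption: each operation runs done exactly once.
        next.start(this::onDone);
    }

    private synchronized void onDone() {
        startNext();
    }
}

/** Demo with hand-fired callbacks standing in for the BLE stack. */
final class SerialQueueDemo {
    public static void main(String[] args) {
        SerialOperationQueue queue = new SerialOperationQueue();
        List<Runnable> fakeCallbacks = new ArrayList<>();

        queue.enqueue(done -> {
            System.out.println("connect requested");
            fakeCallbacks.add(done);
        });
        queue.enqueue(done -> {
            System.out.println("write requested");
            fakeCallbacks.add(done);
        });

        // Only the connect has been requested so far; the write is held back.
        fakeCallbacks.get(0).run(); // simulated connect callback, which releases the write
        fakeCallbacks.get(1).run(); // simulated write callback, queue is now idle
    }
}
```

The point of the design is that the "wait for the callback" rule lives in one place instead of being something every caller has to remember. A real version would also want a timeout, since a callback that never arrives would stall this queue forever.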
This problem is particularly insidious because it gives you the impression that it works. If you make only single requests, or your requests are delayed (say, by waiting for user input), then it works. Make two requests in a row, and boom! Your application stops working. The end result? I spent a week working on my app, getting frustrated, trying to figure out why it wasn’t doing what it was supposed to. This was with new hardware, so it wasn’t clear if I was calling out to the hardware wrong, if the hardware was buggy, or if maybe my code was broken. The one thing I didn’t expect was that the platform I was running on would lie to me. This was on a store-bought production device, with nothing weird or out of the ordinary running.
That is the difference between code written by a good engineer and a bad one. Both will let you make calls to the API, and both will work. On both of them, all the examples work, and the documentation is identical. In one world, I would call the API, and it would blow up if I did something wrong. Or tell me it failed. It would do *something*! And in the other, it fails silently and wastes roughly a week of my time. Multiply this by everyone I’m blocking, and you can easily waste a man-month in a matter of days.
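To make "it would do *something*" concrete, here is a rough sketch of the behavior I would have wanted: a tiny guard that throws the moment a request is issued while another one is still waiting on its callback. `FailFastGate` and its methods are hypothetical names of my own, written in plain Java. This is not how the Android BLE stack is implemented; it only illustrates failing loudly instead of silently.

```java
/** Hypothetical guard: refuses a new request while another awaits its callback. */
final class FailFastGate {
    private String inFlight = null; // name of the pending request, or null if idle

    /** Call before issuing a request; throws if one is already pending. */
    synchronized void begin(String requestName) {
        if (inFlight != null) {
            throw new IllegalStateException(
                "'" + requestName + "' issued while '" + inFlight + "' is still pending");
        }
        inFlight = requestName;
    }

    /** Call from the request's callback to allow the next request. */
    synchronized void complete() {
        inFlight = null;
    }
}
```

With something like this in front of the calls, issuing a write before the connect callback has fired produces an exception naming both requests, which is a five-minute fix rather than a week of guessing.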