A poison message, for the uninitiated, is a “queued” message that can’t be processed for one technical reason or another. That doesn’t sound like much of a bother, but because of the way queues operate, poison messages can muck up the works if left unhandled.
Azure WebJobs does a decent job of handling poison messages out of the box. Basically, you’re given five tries (by default) to process a given message. If, for instance, the database you’re trying to write to is unreachable, or the instructions embodied in a message are somehow malformed, the message is automagically moved to a “poison” queue for further processing, but only after five failed attempts. Any further handling is up to you.
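
To make the default scheme concrete, here is a minimal, language-neutral simulation in Python. It is not the WebJobs SDK itself: `drain`, `handler` and the in-memory queues are hypothetical stand-ins, and only the five-try default and the idea of a separate poison queue come from the description above. The `-poison` suffix mirrors how the SDK names its poison queues.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 5  # the default number of tries described above


def drain(queue_name, messages, handler, max_dequeue_count=MAX_DEQUEUE_COUNT):
    """Process every message, retrying failures up to max_dequeue_count times.

    Returns (attempts, poison): how often each message body was tried, and the
    bodies that ended up in the poison queue. Bodies are assumed to be unique
    and hashable so they can serve as dictionary keys in this sketch.
    """
    poison_name = f"{queue_name}-poison"
    pending = deque((body, 0) for body in messages)
    attempts = {}
    poison = {poison_name: []}
    while pending:
        body, dequeue_count = pending.popleft()
        dequeue_count += 1
        attempts[body] = dequeue_count
        try:
            handler(body)
        except Exception:
            if dequeue_count >= max_dequeue_count:
                # Out of tries: park the message instead of retrying forever.
                poison[poison_name].append(body)
            else:
                # Simulates the message becoming visible on the queue again.
                pending.append((body, dequeue_count))
    return attempts, poison
```

A handler that always fails for a given message burns through all five attempts before that message is parked, which is exactly the cost the rest of this post tries to avoid for messages that can never succeed.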
There are a few problems with this scheme, though. For a start, there are entire classes of messages that can never be processed. The most obvious of these are “unparsable” messages (e.g. a random stream of bytes when you’re expecting a JSON document, or a JSON document that is correctly shaped for your receiving POCO but missing data). In such cases there’s no point in retrying since, by definition, every attempt will fail. More generally, a little bit of context would go a long way in helping to decide the best way to handle specific errors, and that is something the default scheme doesn’t provide.
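
One way to act on that distinction is to classify failures before deciding whether to retry. The following Python sketch is only an illustration of the idea, not code from the demo: `UnprocessableMessage`, `parse_work`, `should_retry` and the field names in `REQUIRED_FIELDS` are all hypothetical.

```python
import json


class UnprocessableMessage(Exception):
    """Hypothetical marker for failures that no number of retries can fix."""


# Illustrative field names only; a real work item defines its own.
REQUIRED_FIELDS = ("source", "target", "width", "height")


def parse_work(raw):
    """Turn a raw queue message into a dict, or raise UnprocessableMessage."""
    try:
        doc = json.loads(raw)
    except ValueError as exc:
        # Covers both invalid JSON and undecodable bytes.
        raise UnprocessableMessage(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise UnprocessableMessage("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if doc.get(name) in (None, "")]
    if missing:
        raise UnprocessableMessage("missing data: " + ", ".join(missing))
    return doc


def should_retry(error):
    """Retry anything that might be transient; never retry unparsable input."""
    return not isinstance(error, UnprocessableMessage)
```

With a check like this, a timeout talking to the database still gets its retries, while a garbled message can be set aside on the first attempt together with the reason it was rejected.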
To help in this regard, I crafted a WebJobsPoisonDemo.CloudHelpers library that simplifies message parsing, validation, and processing, as well as error detection, handling and alerting. It also simplifies WebJob development significantly. The pattern should be pretty easy to adopt, as the WebJobsPoisonDemo (see GitHub to download the source) will make clear.
To implement a queue-triggered WebJob method of your own, do something along the lines of the following:
- Create a “work” class derived from WebJobsPoisonDemo.Helpers.WorkBase.
  - This is the class that supplies all of the parameters needed to do some sort of work. For instance, the ResizeImageWork class has properties for the Source and Target blobs as well as the Format, Width and Height of the resized image.
  - Be sure to decorate the public properties with validation attributes: standard ones like “Required” and “Range,” as well as custom ones like “BlobName,” as appropriate. These are used to validate the enqueued work before it is submitted to a worker.
- Create a “worker” class derived from WebJobsPoisonDemo.Helpers.WorkerBase.
  - This is the class that does the actual work, within its HandleWork method.
  - If the work completes without error, the message returned from GetSuccessLogMessage will be logged.
  - Should an exception occur while processing the message, the original message will be moved to a “poisoninfos” queue, along with contextual info such as the error stack, detection date/time, work kind, etc.
- Add a queue-triggered method to Functions.cs that invokes the worker and tries to do the work via the worker’s ParseAndDoWork method.
- The first stage of poison-message handling (i.e. the detection and persistence of poison messages to the “poisoninfos” queue) requires no additional coding.
- If you want to perform an additional stage of poison-message handling, implement a method in Functions.cs similar to HandlePoisonInfo.
  - This method, in turn, could call the PoisonHandler’s SaveAsBlobThenEmailAnAlert method to save the PoisonInfo to blob storage and then email an alert to one or more recipients via SendGrid.
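
The shape of the work/worker pattern in the steps above can be sketched as follows. This is a rough Python analogue rather than the library's actual C# API: the method names echo HandleWork, GetSuccessLogMessage and ParseAndDoWork, but their signatures, the 1 to 4096 size limits, the PoisonInfo field names, and the plain lists standing in for the log and the “poisoninfos” queue are all assumptions made for illustration.

```python
import json
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ResizeImageWork:
    """Parameters for one unit of work (mirrors the properties listed above)."""
    source: str
    target: str
    format: str
    width: int
    height: int

    def validate(self):
        # Stand-ins for "Required" and "Range"; the limits are invented.
        errors = []
        for name in ("source", "target", "format"):
            if not getattr(self, name):
                errors.append(f"{name} is required")
        for name in ("width", "height"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 1 <= value <= 4096:
                errors.append(f"{name} must be between 1 and 4096")
        return errors


class ResizeImageWorker:
    def __init__(self, poison_infos, log):
        self.poison_infos = poison_infos  # stand-in for the "poisoninfos" queue
        self.log = log                    # stand-in for the WebJob log

    def handle_work(self, work):
        pass  # the actual resizing would happen here

    def get_success_log_message(self, work):
        return f"Resized {work.source} to {work.width}x{work.height}"

    def parse_and_do_work(self, raw):
        """Parse, validate and perform the work; record context on any failure."""
        try:
            work = ResizeImageWork(**json.loads(raw))
            errors = work.validate()
            if errors:
                raise ValueError("; ".join(errors))
            self.handle_work(work)
        except Exception as exc:
            # Keep the original message together with the context needed later.
            self.poison_infos.append({
                "work_kind": "ResizeImage",
                "message": raw,
                "error": "".join(traceback.format_exception(
                    type(exc), exc, exc.__traceback__)),
                "detected_on": datetime.now(timezone.utc).isoformat(),
            })
            return False
        self.log.append(self.get_success_log_message(work))
        return True
```

The point of the design is that the queue-triggered method stays a one-liner: it hands the raw message to the worker, and every failure, whether in parsing, validation or the work itself, lands in one place with enough context for a second-stage handler to act on.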
There’s a bit more to it than that, and you’d do well to perform some initialization tasks, like I do in Program.cs, so that the job fails early in the case of misconfiguration. Regardless, I trust you will find the program fairly self-explanatory. If not, feel free to reach out to me and I’ll be happy to answer your questions. Enjoy…