Supporting the Safety of Software in the Machine Learning Stack

It is common to hear from colleagues that they always have a ‘draft paper’ they’re working on.  I’m beginning to think they’re understating things…at the last count I had 5!  All in splendidly different stages of maturity, and all being written collegiately for differing publications and/or conferences.

’Twould sadden me immensely if you suspected me remiss in my writing and publications, so I thought I would provide an occasional update on some of the problems we are researching (it isn’t just typing words you know – a LOT of effort continues at an alarming pace in the background).

The first research area I want to give an update on concerns the problem of how we can start to argue over the safe contribution of software in systems that deploy Machine Learning (ML) applications – or how we support the safety of software in the ML stack.  Those who know me well will confirm my reluctance to decry the inadequacies of Open and Defence Standards for functional safety (I hope you can smell, nay, taste the irony [1], dear reader)…but, despite any misgivings some unnamed person may have over the guidance found in such luminary publications, they do assert a number of objectives to be fulfilled by requirements levied against software development.

Such requirements are argued to have been met through demonstrating compliance/conformance with rules associated with software attributes (the distinguishing characteristic of a software item), features (a defined property of an entity/object), architectures, and testing regimes, to name but a few.  For example, a functional safety standard may require the use of a defined subset of the programming language, or prohibit the use of dynamic variables.  The same standard may require data to be protected against modification, or require the architecture to control complexity through modularity or abstraction.

Our first (loosely bounded) research question is “Are such rules still valid for safety-related systems that implement/include ML functionality?”  As such, our research starts with Open and Defence Standards to establish what their objectives and requirements currently are.  I acknowledge there exists a glut of papers ‘out there’ that at face value appear to have already embarked on researching software engineering for ML, but they don’t really do what they say on the tin.

Many papers purport to discuss issues relating to Software Engineering for Artificial Intelligence (oh so coolly abbreviated to SE4AI) – with ML being a specific instance of AI.  However, what they ACTUALLY discuss is AI as a specific instance of software.  We’re researching how software engineering can SUPPORT ML applications in a safe manner, and not how ML functionality is developed.

We’re embarking on this research for many reasons.  The foremost is that whilst much-needed attention is paid to ML, and how one can argue over its safe use, there is scant attention paid to how good the software in the ML stack needs to be (from either a safety perspective or ‘just’ a software engineering perspective – from what I have found to date).

Secondly, and by virtue of our research plan, we will set out to establish whether the objectives, requirements, and rules of Open Standards remain valid.  Ostensibly we will determine whether they are valid for ML applications, but by the very nature of our questioning analysis, we may reveal that the opinions of international/national committees are no longer valid for the software engineering practices of today – regardless of whether inherent complexity is introduced through ML specifically.

Having completed the analysis of the existing state of required practice (which, owing to the nature of how standards are published, is already at least a decade old), we will question what further rules, attributes, features, and testing regimens should be required of software engineering for the ML stack.  This has not been asked by committees to date (or if it has, it hasn’t been published to my knowledge) – and I wonder whether coverage criteria such as Modified Condition/Decision Coverage (MC/DC) remain important when considering software used in ML stacks; and whether worst case execution time needs a new definition.  Perhaps we should instead be focussing exclusively on data (another instance of software, I accept), data processing capabilities, and how the software in the ML stack can deliver and assure this?
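For readers less familiar with MC/DC (Modified Condition/Decision Coverage), the idea is that testing must show each condition in a decision independently changing that decision’s outcome.  The sketch below is purely illustrative and is not drawn from any standard or from our research: the helper `independence_pairs` and the example decision `interlock` are hypothetical names I have invented.  It simply enumerates, for a small Boolean decision, the pairs of test vectors that differ in exactly one condition and flip the result (the ‘unique cause’ flavour of MC/DC).

```python
from itertools import product


def independence_pairs(decision, n_conditions):
    """Map each condition index to the pairs of input vectors that differ
    only in that condition and give different decision outcomes."""
    pairs = {i: [] for i in range(n_conditions)}
    for vector in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if vector[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vector[:i] + (True,) + vector[i + 1:]
            if decision(*vector) != decision(*flipped):
                pairs[i].append((vector, flipped))
    return pairs


def interlock(a, b, c):
    # Hypothetical decision, invented for illustration only.
    return a and (b or c)


pairs = independence_pairs(interlock, 3)
for index, found in pairs.items():
    print(f"condition {index}: {found}")
```

A test set achieves (unique cause) MC/DC for the decision if it contains at least one such pair for every condition.  Whether that kind of structural evidence tells us anything useful about software in an ML stack is precisely the open question.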

I just don’t know…yet…but watch this space.

Have you encountered any literature on supporting the safety assurance of software in the ML Stack?  Please do let us know if you have.

[1] Yes, Alanis Morissette and I share an incorrect understanding of irony.  The meaning of irony in this case being ‘a downright lie’, I grant you…
