How to Protect Your Machine Learning Models

Thales | Security for What Matters Most

Contributors:
Dr. Werner Dondl and Michael Zunke

Introduction

In computer technology, few fields have garnered as much attention as artificial intelligence (AI) and machine learning (ML). This discipline, sitting at the intersection of computer science and data analysis, has become integral to mobile applications, voice assistants, fraudulent transaction detection, image recognition, autonomous driving, and even medical diagnostics.

Because machine learning models require significant investment in time and financial resources, and because they are becoming pervasive across so many industries, it’s extremely important to secure your AI model from hacking attacks and intellectual property theft. If you are a vendor using an ML model as part of your software, you need to take extra care in planning your AI model security.

Let’s look at where the vulnerabilities lie and how they can be addressed.

AI Model Security: A Look at the Vulnerabilities

Model Theft

If the application that uses your ML model is deployed in an environment your customer controls, there are multiple ways the model could be stolen or used without an official license. One attack is a classic software copy attack. In this case, the real risk is not the copy itself but the ability to run the software without a license. Without protection against unlicensed use, nothing prevents an adversary from running an unsanctioned copy of your application.

A second, even less detectable form of theft is the extraction of your ML model from your application for use in the hacker’s own application. If this is a direct competitor’s application, it could result in significant lost revenue. A large-scale analysis of mobile applications that use machine learning shows a high reuse factor of ML models, which indicates inadequate protection against ML model extraction.

SOLUTION:
To prevent this, it is advisable to use a strong licensing platform that protects against model extraction and ensures full flexibility for you and for your customer.

In 2021, about two thirds of the trending applications in China used machine learning and were protected against copying. (Source: Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps, USENIX)

The Open Web Application Security Project (OWASP) lists model theft among its top ten attacks on machine learning models (OWASP, 2023).

USE CASE: Industrial Automation
Computer vision is a good example of a field where it is imperative to feed your ML model with an extensive, meticulously curated dataset. Computer vision is built into robots to recognize obstacles while navigating a shop floor, and into pick-and-place machines to identify incorrectly positioned components during PCB assembly. Failing to tightly control access to the model and its tuning risks allowing a competitor to extract and replicate the model. They can then fine-tune it for their specific needs and seamlessly integrate it into their own applications. The more accurately they can discern the structure of your model through reverse engineering, the better they can conceal its origin. Consequently, not only would they catch up with the quality of recognition you provide, but it would also become exceedingly difficult for you to substantiate a claim that your intellectual property was repurposed.

Model Modification

It is crucial that your ML models function as intended. Without the right AI model security, you risk the integrity of the model becoming compromised by those who wish you harm. This can happen in any deployment phase—during the application delivery, model update, or after installation. Among the OWASP top 10 attacks are Model Poisoning and Transfer Learning Attack, both of which replace the authentic model with a modified version or a completely different model.

This type of attack requires knowledge of the interface between the ML model and the application, which is attainable through reverse engineering. By understanding the structure, an attacker can produce a fake model that exposes the right interface and use it to replace the original model. An attacker aiming for a transfer learning attack would likely tune the model to act maliciously only in very specific cases that work in the attacker’s favor.

SOLUTION:
An often-used countermeasure against both of these breaches of AI model security is to encrypt the model and allow only the correct application to decrypt and use it. AI model encryption makes the model all but useless without the correct decryption key. Keeping the decryption logic and the secret decryption key protected shields the model from analysis and prevents it from being replaced with a different one, because the replacement’s encryption would not match. This way, you not only prevent replacement, but you also keep the attacker from analyzing your model structure.

Combining AI model encryption with a licensing system provides greater flexibility and protection because a licensing system that issues license-specific cryptography creates a secure binding between licensing and protection.
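The binding between a license and a protected model can be illustrated with a minimal Python sketch. It is not the method of any particular licensing product. Because the Python standard library has no authenticated encryption, the sketch covers only the "replacement would not match" half of the idea, using an HMAC-SHA256 tag keyed by a license-specific key; a real deployment would typically use authenticated encryption (for example AES-GCM) so the model is also hidden from analysis. The names `derive_license_key`, `seal_model`, `open_model`, the vendor secret, and the license ID are all hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def derive_license_key(vendor_secret: bytes, license_id: str) -> bytes:
    """Derive a per-license key so a model package is bound to one license."""
    return hmac.new(vendor_secret, license_id.encode("utf-8"), hashlib.sha256).digest()


def seal_model(model_bytes: bytes, key: bytes) -> bytes:
    """Vendor side: prepend an HMAC-SHA256 tag to the serialized model."""
    tag = hmac.new(key, model_bytes, hashlib.sha256).digest()
    return tag + model_bytes


def open_model(package: bytes, key: bytes) -> bytes:
    """Application side: return the model bytes only if the tag verifies."""
    tag, model_bytes = package[:TAG_LEN], package[TAG_LEN:]
    expected = hmac.new(key, model_bytes, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("model package failed integrity check")
    return model_bytes


# Hypothetical values, for illustration only.
key = derive_license_key(b"vendor-secret", "LICENSE-001")
package = seal_model(b"serialized model weights", key)
model = open_model(package, key)  # succeeds only with the matching license key
```

A package sealed for one license fails verification under a key derived for another, and so does a package whose model bytes were swapped. In this sketch the key lives in ordinary process memory, so it only has value when the application itself is hardened against reverse engineering.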

USE CASE: Autonomous driving
A model poisoning attack on a car’s machine learning model could cause the model to misbehave in specific situations, with dire results. For example, a hacker could re-train the model to instruct the car to accelerate through a red light if the optical sensor registers a particular bumper sticker on the car in front of it.

Attack on the ML Application

The ML model can also be compromised by attacking the behavior of the surrounding application rather than mounting a “frontal assault” on the model itself. Every ML application has parts of its code that are executed on the main CPU. The code that receives and prepares the input data for the ML model, and the code that post-processes the model’s output, are exposed to Input Manipulation Attacks and Output Integrity Attacks (OWASP top 10). Applications that are not protected from reverse engineering and/or modification are vulnerable to these threats.

SOLUTION:
Sophisticated software protection tools harden the application against reverse engineering and modification to prevent these threats to your AI model security. These tools are found in advanced copy protection and licensing systems.

USE CASE: Network Security
For machine learning models to accurately identify network intrusion and data leakage, it is crucial that the input data remains unaltered and that the alert flagging mechanism functions correctly. When the input handling (e.g. the operations that read the input) or the output logic (e.g. the alert flags) is tampered with, there’s a risk of malicious activities going unnoticed. Attackers can evade detection by suppressing the alerts that would otherwise be triggered at specified dates and times.

Attacker Leapfrogging Your Model Training

Training a model requires significant investment in time and expense. In addition to collecting a sufficient training dataset, you also need to curate it by correctly labeling the samples. An adversary trying to leapfrog your progress would typically use your model to label their own unlabeled training dataset, saving them the extensive time and effort required to produce correct labels. In this way, your competitor could negate your advantage by quickly creating a comparable model from large training sets that match yours.
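To make the mechanics concrete, the labeling step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: `vendor_classify` is a hypothetical stand-in for the deployed model (a simple threshold rule), and `label_with_deployed_model` is a hypothetical helper. The point is that the adversary needs one query per sample, so the number of classifications they can run determines how much labeled data they obtain.

```python
from typing import Callable, List, Tuple


def vendor_classify(sample: float) -> str:
    """Hypothetical stand-in for the vendor's deployed model."""
    return "defect" if sample >= 0.5 else "ok"


def label_with_deployed_model(
    unlabeled: List[float], classify: Callable[[float], str]
) -> Tuple[List[Tuple[float, str]], int]:
    """Label every sample by querying the deployed model once per sample.

    Returns the labeled dataset and the number of queries that were spent.
    """
    labeled = [(sample, classify(sample)) for sample in unlabeled]
    return labeled, len(unlabeled)


# The adversary's unlabeled data becomes a labeled training set "for free".
dataset, queries = label_with_deployed_model([0.1, 0.7, 0.5, 0.3], vendor_classify)
```

Since the cost to the adversary scales with the number of queries, limiting how many classifications an application may perform directly limits the size of the dataset that can be labeled this way.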

SOLUTION:
Since the attacker must use the application to run their datasets, you can use a combination of the protections listed earlier to tightly control application use through licensing. This is achieved by defining how many classifications are possible in each timeframe, limiting the total number of classifications, and restricting the number of concurrently running instances of the application. By adding custom controls that detect and limit abnormal usage, and covering them with integrity protection, you further secure the application by preventing those controls from being removed from inside it.
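The three limits named above can be sketched as a small Python class. This is a minimal illustration, not the API of any licensing product: `ClassificationQuota` and its parameters are hypothetical, the concurrency limit counts in-flight classifications within one process as a simplified stand-in for concurrently running application instances, and the class is not thread-safe. In practice the limits would come from license properties and the check itself would need integrity protection.

```python
import time
from collections import deque
from typing import Callable, Deque


class ClassificationQuota:
    """Enforces a per-window rate limit, a lifetime cap, and a concurrency cap."""

    def __init__(self, per_window: int, window_seconds: float, total: int,
                 max_concurrent: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_window = per_window
        self.window_seconds = window_seconds
        self.remaining_total = total
        self.max_concurrent = max_concurrent
        self.clock = clock
        self.recent: Deque[float] = deque()  # start times inside the window
        self.running = 0                     # classifications in flight

    def try_start(self) -> bool:
        """Return True and record the call if a classification is allowed now."""
        now = self.clock()
        # Drop start times that have fallen out of the sliding window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if (self.remaining_total <= 0
                or len(self.recent) >= self.per_window
                or self.running >= self.max_concurrent):
            return False
        self.recent.append(now)
        self.remaining_total -= 1
        self.running += 1
        return True

    def finish(self) -> None:
        """Mark one in-flight classification as completed."""
        self.running = max(0, self.running - 1)


# Example: at most 100 classifications per hour, 10,000 in total, 2 at a time.
quota = ClassificationQuota(per_window=100, window_seconds=3600.0,
                            total=10_000, max_concurrent=2)
if quota.try_start():
    try:
        pass  # run the model on one input here
    finally:
        quota.finish()
```

Injecting the clock keeps the sketch testable. A real implementation would also persist the counters where the user cannot reset them.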

USE CASE: Medical Devices
The model in your medical MRI machine is trained to classify images against specific illnesses. Your competitor wants to use your application to label their training dataset. Fortunately, you have protected your application so that the competitor can only run very few images at a time, making it impractical to use your technology to leapfrog your training to their own advantage. By controlling the detection parameters via licensing properties, you can even safely modify them in the field for the customer’s specific use case.

AI Model Security Summary

In this era, AI and ML play pivotal roles across various industries. If you are commercializing a product that relies on machine learning models, proactive measures are essential to ensure the integrity of your model, secure your investment and intellectual property, and maintain your competitive edge.

The key vulnerabilities outlined here require a multi-faceted approach to model protection, combining robust licensing, encryption, and sophisticated software protection tools.

For thirty years, Thales has been the address for enterprise companies looking for a partner and a platform to safeguard and monetize their software. Count on our expertise and our Sentinel solution for the freedom to innovate with the confidence of security.


About the authors

Dr. Werner Dondl works as a software architect and advisory engineer in the Chief Technology Office for Software Monetization at Thales in Munich, Germany. Werner joined Thales as a specialist for cryptographic libraries after finishing his PhD in semiconductor physics at the Technical University of Munich. Over the last 28 years he has held various positions in development, as a team leader, and as a software architect. In his current position in the CTO office, he focuses on cutting-edge projects related to software security and monetization.

Michael “MiZu” Zunke is a software security and monetization expert who currently serves as the Chief Technology Officer for Software Monetization at Thales, where he is responsible for driving security and innovation. Today, he is most focused on the challenges of intellectual property protection in machine learning. A graduate in physics from TU Munich, he holds several valuable patents in software protection and reverse engineering.

With over 30 years of experience in software protection and licensing technology and research, MiZu has served as Technical Advisor to the Board at the former Gemalto and as program chair for the SPRO workshop at ACM CCS in London.