Cyber Attacks on Devices in the AI-ML Era
Any offensive advance in a conventional war requires taking down ground defenses with air power before launching a ground invasion. In the early days of building cybersecurity fortresses, strategic multi-layer defense initiatives comprised meticulously designed, heterogeneous, and redundant security countermeasures for intrusion detection and prevention. The subsequent evolution was guided by the paradigm of proactive rather than reactive security controls based on reliable real-time threat intelligence feeds, regular expressions, and context-sensitive grammars for deep-packet inspection and deep-code analysis.
Artificial Intelligence (AI), Data Science (DS), and Machine Learning (ML) undeniably offer innovative methods to analyze big data at high velocity with elastic compute and AI processors for deep learning, training, and inference engines. In the hyper-digital age of AI, transitioning to pre-trained and learning models for cyber threat detection will require a net assessment of the tools and methods of the adversaries. One of Murphy’s laws of combat is that “Tracers work both ways”. Adversaries can gain visibility into pre-trained models and stage invasive maneuvers that exploit the inherent algorithmic and data biases. While ML in a tightly coupled and controlled environment may offer an effective means to monitor and analyze the functional model of field devices, the forensic model for risk and vulnerability assessments in a loosely coupled and open environment requires a greater degree of introspection to safeguard against predatory exploits.
The CISO and product security challenge
The status quo is undeniably changing and so must the resistance to change. The future risks posed by DS/ML-powered cyber-attacks, whether direct or indirect through supply chain infiltration, go beyond evading edge and network-level detection and prevention logic to stage intrusions and data theft. The multi-layer defense model will be seriously challenged by sophisticated kill chains powered by DS/ML. ML encompasses techniques and algorithms that go beyond neural networks and deep learning. The operational technology (OT) devices in the soft core are vulnerable targets, and therefore hardening OT devices will become the call to action. Sadly, the current state of network-based security is analogous to prescription anti-anxiety medication and painkillers. They may provide short-term relief but are addictive and escalate to higher dosages, and withdrawal to reduce long-term dependency may be extremely painful. Identifying the root cause of the condition and building a natural immune system should be the preferred solution. Offsetting the quantum computing staging infrastructure available to global adversaries will require equivalent investments in resources and a significantly higher information technology (IT) budget to modernize network-level security appliances, through major upgrades or rip-and-replace, for the post-quantum era. A well-thought-out strategy to harden brownfield and greenfield devices with software-level upgrades would also be a pragmatic approach to defuse future risks. The current state of cyber-physical systems is insecure. In the emerging era of asymmetric cyber warfare orchestrated with attack models pre-trained to penetrate threat models, cyber-physical systems require a high degree of cyber resilience for protection-in-depth with risk models.
The real cybersecurity challenges
The cybersecurity quadrant of AI/ML faces a unique set of challenges. AI/ML-powered cyber-attacks may have a higher probability of evasion and attain a greater penetration rate (for comparison, only one in millions of intrusion attempts may evade detection and penetrate network defenses today). Real-time cybersecurity threat assessment models require a high-dimensional feature vector for effective event correlation based on threat intelligence, network observations, content inspections, and device monitoring. In an unsupervised problem, structure may be inferred from the data without specific objectives, for general goals (e.g., clustering, data interrelationships, or learning from experience). Cybersecurity is fundamentally a supervised problem that requires decision making, training data with labelled instances, and learning from episodes (security incidents). Traditional IT systems require a high degree of visibility and control for configuration and patch management by network/security operations center (NOC/SOC) operators. IT systems and devices with slack resources benefit from third-party after-market security plugins. In contrast, OT devices are vulnerable to data tampering and data-based attacks through subversion of trust, access control, and control over their functionality. Therefore, OT devices require built-in protections provided by the original equipment manufacturer (OEM) for cyber resilience. The initiatives OEMs embark on to plug security gaps with technology innovations will determine the trajectory toward ubiquitous digital trust in controlled, open, and distributed networks.
Algorithm selection and beyond
As illustrative examples, for spam detection and filters the naive Bayes method may use histograms and multi-variate Bernoulli or multinomial event models to classify embedded words in emails. The classification of large and complicated images may require a convolutional neural network, whereas sentiment analysis, language translation, and other tasks over vector sequences or sequential data may require recurrent neural networks, long short-term memory networks, or transformers. Decision trees exhibit low bias but high variance, and a single tree may offer low predictive accuracy. Neural networks comprise layers: an input layer, one or more intermediate (hidden) layers, and an output layer for classification. In simple terms, a neuron comprises a linear part (weights and a bias) and an activation part (a non-linear activation function such as the sigmoid). A neural network model requires an architecture of one or more neurons and parameters (the weights and biases). The output layer must have as many neurons as the number of classes required for classification. The diagnosis of a cybersecurity event may be classified as true positive, true negative, false positive, or false negative. The false negatives are the misses. The strength of the hypothesis depends on the probability (based on the data in the training set) and the likelihood (based on the parameters of the model). The cost function is the error between the predicted and expected values, which is a measure of the performance of a learning model. Cybersecurity problems are typically multi-variate (multiple variables) and non-linear (i.e., the output is not a simple dot product of weights and features). Support vector machines find non-linear decision boundaries by mapping the feature vector to a high-dimensional feature space but may be less effective than neural networks.
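As a minimal sketch of the neuron, cost function, and four diagnostic outcomes described above, the following Python example scores a handful of events with a single sigmoid neuron. The feature values, weights, bias, labels, and the 0.5 alert threshold are all hypothetical values chosen for illustration; they are not taken from any real detection model.

```python
import math


def neuron(features, weights, bias):
    """Linear part (weighted sum plus bias) followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(predicted, expected):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(predicted, expected):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(expected)


def confusion_counts(predicted, expected, threshold=0.5):
    """Count true/false positives and negatives at the given alert threshold."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, y in zip(predicted, expected):
        alert = p >= threshold
        if alert and y == 1:
            counts["tp"] += 1
        elif alert and y == 0:
            counts["fp"] += 1
        elif not alert and y == 1:
            counts["fn"] += 1  # a miss: a real incident that raised no alert
        else:
            counts["tn"] += 1
    return counts


# Hypothetical two-feature events and hand-picked (untrained) parameters.
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [0.0, 0.0]]
labels = [1, 0, 0, 1]  # 1 = security incident, 0 = benign
weights, bias = [2.0, -1.0], -0.5

scores = [neuron(e, weights, bias) for e in events]
counts = confusion_counts(scores, labels)
cost = log_loss(scores, labels)
print(counts, round(cost, 3))
```

With these illustrative values each of the four outcomes occurs exactly once, which makes it easy to see how moving the threshold trades false positives against misses.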
If the assumptions that a generative algorithm makes in order to efficiently compute the mean and covariance of the data (e.g., that the features follow a Gaussian distribution) are incorrect, then discriminative algorithms such as logistic regression, which use an iterative process over big data, may be required for better performance.
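To illustrate the iterative process mentioned above, here is a minimal sketch of logistic regression trained with batch gradient descent in plain Python. The two features (imagined here as a scaled failed-login rate and a fraction of unusual ports), the tiny data set, the learning rate, and the epoch count are all assumptions made for the example; a production system would use a tested library and far more data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that event x belongs to the positive (malicious) class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=1.0, epochs=5000):
    """Fit weights and bias by repeatedly stepping against the log-loss gradient."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y  # gradient of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical labelled instances: [failed-login rate, unusual-port fraction].
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = malicious, 0 = benign

weights, bias = train_logistic_regression(rows, labels)
print(round(predict(weights, bias, [0.1, 0.1]), 3))
print(round(predict(weights, bias, [0.9, 0.9]), 3))
```

Unlike a generative model, this approach makes no assumption about how the features are distributed within each class; it only fits the decision boundary, at the cost of many passes over the training data.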
Product security by design
Categorical trust is rooted in the strength, resilience, and agility of cryptographic keys, key distribution methods, and protection techniques for operational keys in custody on both services and devices. Digital keys are required to enforce device authentication, data authentication, and data protection without service disruption in live environments. Non-repudiation of device identity is critical for device authentication in private and public hybrid OT ecosystems. Local authentication on the first mile is rooted in an authoritative identity provider and identity verifier. Remote authentication on the last mile is rooted in authentication protocols and ceremonies that require a set of public, private, or shared cryptographic artifacts (e.g., passkeys, symmetric keys, root certificates, leaf certificates) based on standards for interoperability. Product security by design requires developers to harden applications at the core with effective use of secure and insecure transport protocols, protection APIs, and cryptographic keys. There is no one-size-fits-all solution in cybersecurity; different approaches must be explored and evaluated for feasibility by weighing the pros and cons of each. The optimum solution depends on industry-specific use cases, associated cyber risks, and mitigation strategies.
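As one hedged illustration of remote device authentication with a shared symmetric key, the sketch below implements a simple HMAC-SHA256 challenge-response using only the Python standard library. The function names, the device identifier, and the message layout are hypothetical and do not follow any specific standard. Note that a shared key cannot provide non-repudiation, since either party could have produced the response; that property would require asymmetric signatures and certificates.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Verifier side: issue a fresh random nonce so old responses cannot be replayed."""
    return secrets.token_bytes(32)


def device_response(shared_key, device_id, challenge):
    """Device side: prove possession of the key by MACing its identity and the nonce."""
    # The challenge has a fixed 32-byte length, so the concatenation is unambiguous.
    message = device_id.encode("utf-8") + challenge
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify_response(shared_key, device_id, challenge, response):
    """Verifier side: recompute the MAC and compare in constant time."""
    expected = device_response(shared_key, device_id, challenge)
    return hmac.compare_digest(expected, response)


# Hypothetical provisioning: the same key is held by the service and the device.
key = secrets.token_bytes(32)
challenge = new_challenge()
response = device_response(key, "sensor-01", challenge)

print(verify_response(key, "sensor-01", challenge, response))
print(verify_response(key, "sensor-01", new_challenge(), response))
```

The first check succeeds because the response matches the issued challenge; the second fails because the response is replayed against a different challenge. How the key is provisioned and protected on the device is outside the scope of this sketch.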
Resilience by design
The complexity, limited scalability, and lack of agility of traditional security information and event management (SIEM) systems in preventing security breaches are a leading indicator of the difficult challenges that must be overcome in high-speed distributed digital networks to protect connected devices. The hypothesis of security orchestration, automation, and response (SOAR) serves as an arrow in the quiver. The maxim of chaos theory is that absolute control is an illusion and uncertainty is the only certainty, given the randomness and unpredictability in data. Winning forever wars in cyberspace will require chaos-oriented resilience engineering (CORE). The promise and hype of AI/ML-powered cybersecurity benefits must be evaluated in the context of both zero-sum gains and net-positive outcomes. The potential use (and abuse) of AI/ML in cybersecurity models requires a holistic approach that is rooted in the OT devices that must be protected in the future, when the tools and methods available to the adversaries will be far more sophisticated than they are today. Security as a posture cannot be a constant because the threats are variable. Long-lived devices in the field are vulnerable to future-day attacks and therefore must be hardened to survive the zero-day attacks of the future.