Why AI Has a Communication Problem
Language is a tool, and its purpose is remarkably simple: to transfer the thought or idea in my head into yours, as completely and accurately as possible. Like most tools, language is used by different people in different ways.
Mastery of language, however, is no guarantee of success. New technologies bring new vocabulary, and new vocabulary is used inconsistently at first. And nothing in the marketplace right now is newer or bigger than AI-related technology.
It's all about context
Language only works when both sides of a conversation agree on context and definitions. Said more colloquially, language works best when both sides are “on the same page.” In the technical world, the classic example of a miscommunication of this type is one between Engineering and Marketing. It is so common, in fact, that it is the fundamental premise of the humor in the cartoon strip Dilbert.
The problem is actually quite simple: an Engineer's goal is to communicate an idea precisely. While Marketing is also about communicating, precision is of secondary importance; the primary goal is to influence. If a less accurate word gets a better response from the audience, the less accurate word is the one that will be used. Naturally, this results in a disconnect (i.e., miscommunication) when an Engineer attempts to learn from Marketing materials.
Another common source of miscommunication is two groups having different definitions of the same word. In some cases, both are even correct, though incompatible. A perfect example of this is the word “theory.” To a scientist, engineer, or mathematician, the word “theory” has a very precise definition which is quite different from that of a non-technical person. William Briggs is a scientist with a PhD in Mathematical Statistics who offered the following insight on the subject in 2012:
“By the way, it is a pet peeve of mine to call any intellectual model of something a ‘theory.’ In science, a model is an explanatory, predictive description of some system or process. A hypothesis is a model that in principle can be falsified, that is, the evidence that would disprove the model can be unambiguously stated. A theory is a hypothesis that has, so far, survived all attempts to prove it wrong.”
The conflation of the definitions of “theory” and “hypothesis” in the minds of non-scientists makes communications between scientists and non-scientists a tricky problem to solve. In other words, it is difficult to transfer the thoughts or ideas of a scientist into the head of a non-scientist completely and accurately. In a more general sense, it is a good example of distinct groups having difficulty communicating with one another.
How do we fix this?
For a consumer of technology, “cross-silo” communication like this is an everyday challenge, whether it is between you and a vendor, or between you and other groups within your organization. As stated at the beginning, AI-related technologies are new to the marketplace and, therefore, a source of a great deal of imprecision and miscommunication.
To fix this, you first need a source of accurate, precise information. Your vendor's Sales team, Account Managers, and Sales Engineers have the job of influencing you to buy a product, and they are taught to communicate in Marketing terms. What you have going for you is that most Sales Engineers, plus a surprising number of Account Managers, came from an Engineering background. It is not hard to get them into “geek mode,” where they drop the Marketing vocabulary and switch to Engineering-speak. At that point, it is important to know the definitions of the Engineering terms they will be using.
AI has been around as a field of Computer Science since the mid-1950s. As such, the vocabulary is established in the technical world. But all of this is new to the consumer in the last few years, so the definitions of words used in consumer-facing media are a bit “fuzzy.” You have undoubtedly run across terms such as “Artificial Intelligence,” “Machine Learning,” “Large Language Models,” “GPT,” “Generative AI,” “Deep Learning,” “Neural Nets,” and “ChatGPT.” Let’s make sense of these.
Two basic categories of AI
Like the term “physics,” AI or Artificial Intelligence is not really a “thing” in and of itself. Rather, it is an umbrella under which many other fields exist. Discounting early avenues of research under the AI umbrella, there are two basic types of AI today: statistics-based AI and neural-network-based AI.
Machine Learning
Statistics-based AI is better known as ML or Machine Learning. Fundamentally, ML is about creating a model composed of one or more equations that describe a solution, then “training” that model using positive and negative reinforcement by providing it with right and wrong answers. This training is essentially a computer-assisted search for the coefficients of each variable in each equation such that, when novel values are plugged into the variables, the model produces the desired answers.
If this sounds too simple to be considered intelligence, you are not alone in that opinion. It is common for ML to be regarded as a “lesser” science under the AI umbrella. While ML's status as “intelligence” is debatable, its power as a tool is not. ML excels at many difficult tasks.
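To make “searching for coefficients” concrete, here is a minimal sketch in Python (not any particular product's approach): it fits the two coefficients of a single equation, y = w·x + b, to a handful of known answers by repeatedly nudging them in the direction that reduces the error.

```python
# A minimal sketch of "training" as a search for coefficients:
# fit y = w*x + b to a handful of known (x, y) pairs by nudging
# w and b in the direction that reduces the error.

# Known right answers: points that happen to lie near y = 3x + 1
data = [(0, 1.1), (1, 3.9), (2, 7.2), (3, 9.8), (4, 13.1)]

w, b = 0.0, 0.0          # the coefficients we are searching for
learning_rate = 0.01

for step in range(5000):
    # How wrong is the current model, and in which direction?
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / len(data)
    # Nudge the coefficients toward values that fit the data better
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned coefficients: w={w:.2f}, b={b:.2f}")        # roughly w=3, b=1
print(f"prediction for a novel value x=10: {w * 10 + b:.1f}")
```

Real ML models involve vastly more variables and equations, but the underlying search is the same.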
While ML can be used for many things, if I had to pick a single use case that defines its utility, I would choose “grouping.” ML is exceptionally good at finding things that “look like” each other. This might be finding all of the photos of your dog on your phone, or finding the faces of people in a photograph as points on which to focus the lens. Since we are talking about security, it might be used to find groups of servers in your network with similar traffic patterns, then notify you when the traffic from one of those servers suddenly becomes less like it used to be (i.e., a deviation from the baseline), potentially indicating a breach.
There are dozens of other possible uses, including finding all your NTP servers, all your Redis databases, or all the machines in your network running old, unpatched versions of Windows.
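As a toy illustration of both ideas, grouping and baseline deviation, here is a sketch using scikit-learn's KMeans. The server names, traffic features (bytes per second, unique peers), and thresholds are illustrative assumptions, not any vendor's implementation.

```python
# Toy sketch: group servers by traffic pattern, then flag one that drifts
# away from its group's baseline.
import numpy as np
from sklearn.cluster import KMeans

# One row per server: [average bytes/sec, average unique peers per hour]
servers = {
    "web-01": [52_000, 310], "web-02": [49_500, 295], "web-03": [51_200, 305],
    "db-01":  [8_100, 12],   "db-02":  [7_900, 11],
}
names = list(servers)
X = np.array([servers[n] for n in names], dtype=float)

# Group servers that "look like" each other
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for name, label in zip(names, labels):
    print(f"{name} -> group {label}")

# Baseline deviation: flag a new sample that is far from its group's center
def is_anomalous(sample, group_points, threshold=3.0):
    center = group_points.mean(axis=0)
    spread = group_points.std(axis=0) + 1e-9
    return bool(np.any(np.abs((sample - center) / spread) > threshold))

db_group = X[labels == labels[names.index("db-01")]]
# True: db-01 suddenly looks more like a web server than a database server
print(is_anomalous(np.array([95_000, 400]), db_group))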
If you read that a product uses AI, it is likely that the specific technology in use is ML. Compared with other AI technologies, ML is the most predictable, best understood, and easiest to implement. It also nicely solves a lot of problems common to the security space. It is worth noting that while training an ML model (the part the vendor does) requires extensive compute resources, using the trained model (the part you do once you have purchased the product) requires no more computing power than any other application.
Deep Learning
When the average person hears the term “AI,” solutions based on Deep Learning are probably what they have in mind. Before we define Deep Learning, however, we first need to talk about Neural Nets.
The fundamental building block of a computer is the NAND gate. In digital logic, any other type of gate, and thus any computer, can be built from NAND gates alone. In fact, the guidance computers in the Apollo spacecraft were roughly the size of a briefcase and were built almost entirely from a few thousand simple universal gates (NOR gates, a close cousin of the NAND).
NAND gates are simple critters. In the simplest form, a NAND gate has two inputs and one output. When both inputs are high (“on,” or logic 1), the output is low (“off”, or logic 0). Other combinations of inputs (low/low, low/high, or high/low) result in a high output. Simple. But from this lowly logical construct, all computers are built.
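To see that in code rather than silicon, here is a tiny sketch of the NAND gate's universality: NOT, AND, and OR built from nothing but NANDs.

```python
# A minimal sketch of NAND universality: every other basic gate from NANDs alone.
def nand(a, b):
    return 0 if (a and b) else 1   # low only when both inputs are high

def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND:", nand(a, b), "AND:", and_(a, b), "OR:", or_(a, b))
```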
The fundamental building block or “processing unit” of the brain is the neuron. Neurons are not much more complex than NAND gates. They communicate electrochemically via many inputs (often numbering in the thousands) and one output. While the logic in a neuron is more complex than that of a NAND gate (typically an analog threshold function rather than an on/off logic gate), it is easily modeled in software.
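As a sketch of how easily that is modeled: a software “neuron” is just a weighted sum of its inputs pushed through a smooth threshold function (a sigmoid here). The weights and inputs below are arbitrary, purely illustrative values.

```python
# A software model of a neuron: many weighted inputs, one output,
# an analog threshold rather than a hard on/off gate.
import math

def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))   # sigmoid: a smooth threshold between 0 and 1

print(neuron([0.9, 0.1, 0.4], [0.8, -0.5, 0.3], bias=-0.2))
```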
A group of neurons “wired” together is a Neural Net. While a small Neural Net is a fun curiosity, its true power is realized when layers of neurons are connected, each neuron feeding one or more other neurons in large numbers. This is Deep Learning, commonly defined as the use of a neural network containing more than one hidden layer.
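Wiring such neurons into layers that feed one another takes only a few more lines. The sketch below is a tiny three-layer network with random, untrained weights, purely to show the wiring; a real network would learn those weights during training.

```python
# A minimal sketch of "layers of neurons feeding other neurons."
import math, random

def layer(inputs, weight_matrix, biases):
    # Each row of weights defines one neuron that is fed by every input
    return [1 / (1 + math.exp(-(sum(x * w for x, w in zip(inputs, row)) + b)))
            for row, b in zip(weight_matrix, biases)]

random.seed(0)
def rand_layer(n_in, n_out):
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [random.uniform(-1, 1) for _ in range(n_out)])

x = [0.2, 0.7, 0.1]           # three input values
w1, b1 = rand_layer(3, 4)     # hidden layer of 4 neurons
w2, b2 = rand_layer(4, 2)     # output layer of 2 neurons
print(layer(layer(x, w1, b1), w2, b2))
```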
What is interesting is that Neural Nets are descendants of the Perceptron, whose conceptual roots go back to the McCulloch-Pitts artificial neuron of 1943 and which was first implemented in 1958. While Perceptrons had serious limitations, the basic concept was sound and evolved into the multi-layer Neural Nets of the late 1980s. In other words, we have had the basic building blocks, and understood the fundamental ideas upon which today's incredible AI technology is based, for over thirty-five years, yet AI progress was glacial until recent years.
What was lacking was compute power. The human brain has roughly 100 billion neurons, and between those neurons there are roughly 100 trillion connections. Computing power has been growing exponentially since its inception, but only with the recent advent of extremely powerful graphics co-processors, each with thousands of processor cores, has it become possible to build Neural Nets with meaningful numbers of neurons. Let us throw some numbers out to put this into perspective.
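First, the brain side of the comparison, as a bit of back-of-the-envelope arithmetic (the 4-bytes-per-connection figure is an illustrative assumption, not a claim about any particular system):

```python
# Back-of-the-envelope arithmetic on the brain numbers above (illustrative only)
neurons = 100e9          # ~100 billion neurons
connections = 100e12     # ~100 trillion connections
print(connections / neurons)            # ~1,000 connections per neuron, on average
print(connections * 4 / 1e12, "TB")     # ~400 TB just to store one 4-byte number per connection
```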
In 1986, when I first started getting serious about programming, the most powerful supercomputer in the world was the Cray X-MP/48. This machine cost about $20M USD at that time, or about $55M USD in today's money. It was about the size of a restaurant's walk-in refrigerator and used about 350 kW of electricity, about as much as a square block of houses with the AC cranked up. A Raspberry Pi Zero, when released a few years ago, cost $5 USD and had roughly the same performance as one of these systems. A single iPhone or high-end Android phone that you carry around in your pocket and toss in the trash when you break the screen is about as powerful as all the supercomputers in the world in 1986 combined. A visit to your local big box store might net you a machine equal to a few hundred iPhones.
While huge advancements have been made in the computer science side of AI, it is really the astonishing increase in computer power and ability to simulate ever-greater numbers of neurons that has led to the remarkable abilities of today’s AI solutions.
Solutions built on deep learning
Outside of ML, nearly all other current AI technology is based on Deep Learning. Generative AI is the broad classification of systems that produce the “wow” factor in AI today. Generative AI is the ability to synthesize new output, often in the style of other input data. This might be audible (voices, sounds, or music), visual (pictures, movies, drawings), or textual (words, sentences, paragraphs, poetry, or lyrics, for example). This output might be entirely original or done in the style of a specific artist (your favorite search engine should be able to turn up examples of the voice of Elvis singing Sir Mix-a-Lot's Baby Got Back, or a painting of a corgi in the style of Vermeer).
Large Language Models are Generative AI systems that specialize in human language. Unless you live under an extremely large rock, you have likely heard of ChatGPT. ChatGPT is a web interface on top of OpenAI's GPT family of models. It is a remarkable system which, based on prompts and questions from a user, produces output ranging from puzzling to astonishing. ChatGPT will happily do your child's math homework (or write their book report), write you a story, analyze a piece of software, or help you write some code in Python. The output of ChatGPT can easily be seen as intelligent (though whether this output truly represents intelligence is beyond the scope of this article). Certainly, the output is close enough to intelligence to show where the technology might go in the next five years.
Deep Learning in security
To date, there has been little integration of Neural Network-based AI solutions in security products. It is certainly not zero, but there are still a few speedbumps to be navigated before a vendor will commit to incorporating this technology.
If I may take a few liberties with the term “motivation,” the first liability of the current generation of Large Language Models is that their “motivation” is to produce output that satisfies the user. This sounds rather good, until you realize that output that satisfies the user is not necessarily correct output. An LLM is entirely happy with being wrong, so long as the user is happy. In fact, it would not even be accurate to say that being correct is a secondary consideration for an LLM. If the output of an LLM does happen to be accurate, it is more of a happy accident, and of no real concern to the LLM. While this is fine when writing LLM-assisted poetry, it might be problematic when assisting with security policy.
Second, LLMs can still “get out of hand,” so to speak. By the nature of how they are built, LLMs are trained on a far wider breadth of knowledge and data than is strictly needed for the task at hand. In fact, it is sometimes useful to think of using an LLM in the same way as hiring an employee. An employee hired to do the task you need done certainly has life experience outside of that task. Like an errant employee, current LLM implementations can be led outside of safe topics of conversation.
LLMs are extremely recent technology, and these issues are being worked on by a lot of very smart people. They will undoubtedly be solved in the next year or so. Once they are, expect a variety of new product features, including natural language interfaces, automatic prioritization of issues, cross-referencing of previously solved issues, and suggestions for issue resolution. Twelve to eighteen months from now, I would be surprised if there were not a product on the market that might send you the following email:
Dear User: Anomalous traffic with characteristics matching the newly released CVE-2024-0101 was detected from the following four machines in your Dallas datacenter starting at 04:53:07 this morning: […] All four of these machines were lacking vendor patch XXX, and two were also lacking patch YYY, both of which mitigate CVE-2024-0101. As these were redundant database servers and adequate capacity was available for fail-over, these machines were temporarily disconnected from the network. Please click >here< to automatically re-image, patch, and restore these systems, or click >here< for more information and other options.
Each piece of this already exists today, at least in the research phase. LLMs can parse the English text of CVEs (common vulnerabilities and exposures). They are capable of comparing the data in that CVE with real-world network traffic. They are capable of analyzing network volume and capacity. They are capable of analyzing a system’s installed (and missing) software and configuration. And they are capable of generating Ansible scripts to automate the rebuilding of systems and restoration of configurations and data. It is just a matter of putting the pieces together.
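To make “putting the pieces together” concrete, here is a purely hypothetical sketch of such a pipeline. Every function name below is a stub I invented to stand in for one of the capabilities listed above; none of them correspond to real product APIs, and the real LLM, traffic-analysis, and automation pieces would each be substantial systems in their own right.

```python
# Purely hypothetical sketch of the remediation pipeline described above.
# All functions are stubs; only the shape of the pipeline is the point.

def summarize_cve_with_llm(cve_text: str) -> dict:
    """Stub: ask an LLM to extract traffic signatures and mitigating patches from CVE text."""
    return {"traffic_signature": "...", "mitigating_patches": ["XXX", "YYY"]}

def find_matching_hosts(traffic_signature: str) -> list[str]:
    """Stub: compare the signature against observed network traffic."""
    return ["dal-db-01", "dal-db-02", "dal-db-03", "dal-db-04"]

def missing_patches(host: str, patches: list[str]) -> list[str]:
    """Stub: inspect the host's installed software and configuration."""
    return patches

def safe_to_isolate(hosts: list[str]) -> bool:
    """Stub: check redundancy and fail-over capacity before disconnecting anything."""
    return True

def remediate(cve_text: str) -> None:
    parsed = summarize_cve_with_llm(cve_text)
    hosts = find_matching_hosts(parsed["traffic_signature"])
    affected = [h for h in hosts if missing_patches(h, parsed["mitigating_patches"])]
    if affected and safe_to_isolate(affected):
        print(f"Isolating {affected}; offering re-image, patch, and restore options to the user")
```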
In closing
In the world of social media and news, we are watching history unfold as language (and, therefore, communications) is being made deliberately less precise. We are watching real-world implementations of the lessons of Bernays and Orwell. In the world of technology, however, we are not yet facing these challenges. We are still free to speak precisely and accurately. Having the right vocabulary is an important part of that.