Benjamin Jensen, Dan Tadross, and Matthew Strohmeyer recently wrote an article about Agentic Warfare for War on the Rocks (WoTR) that uncritically promotes a future of agentic warfare. The piece combines pie-in-the-sky claims about the abilities of LLMs with Chicken Little fears that the U.S. is falling behind, a mix seen more and more often in AI hype circles and generally geared toward selling AI products. I say this as a person who is deeply invested in evolving technology and who thinks AI, applied and appropriately integrated with human teaming, is a winning combination.
Jensen, et al. write:
The world is about to enter the era of agentic warfare. In this article, we outline a theory of agentic warfare and use it to define what the United States should do to maintain its military power in this era. Although the United States may have embraced the potential of AI in the aftermath of World War II, its future is up for grabs.
…
If the United States is the first mover, these tactical agentic insights will flow autonomously to the strategic level. Feeding a new defense agentic base that not only sees critical capability gaps before they are a risk in crisis but also iteratively considers new options for capabilities that are outside immediate human imagination. Gaps, seams, and solutions for global defense transportation networks will emerge from agents who not only deeply understand the warfighting requirement, but who also iterate with like agents steeped with live insights into every commodity, every port, every wind pattern.
First, I have no idea what “Although the United States may have embraced the potential of AI in the aftermath of World War II,” means in reference to agentic AI. In part, this is because the authors do not offer any take on U.S. investment in AI in the post-war period. The writers seem to have some tacit knowledge but do not share it with the reader. Part of the issue is obfuscation, intentional or not, between LLMs and AI systems that operate on more narrowly defined parameters. This is one of the issues with the AI discussion right now. AI means everything and nothing.
Agentic Issues
By ‘agentic,’ I assume the authors mean a system that is autonomous, interactive, goal-seeking, and able to effect change in the world with minimal human oversight. LLMs do not currently meet this standard. The main problem with anything involving LLMs, and in this case agentic models, is that they don’t actually feed you knowledge—they produce simulations of knowledge that are not tied to truth, data, or facts. They are just manipulating tokens with close to zero regard for accuracy. Even one of the most foundational ideas of the LLM, that you can have a conversation with it, is completely wrong. Each prompt and query to an LLM is a discrete set of tokens. This means each prompt triggers a separate interaction that only seems like a conversation because it transmits the entire past conversation back to a GPU. Note, I said “A GPU” and not “THE GPU.” Seemingly sequential interactions might not even occur in the same country as data centers spread globally. So it’s not a conversation, but surely they are tying a chain of logic together, right?
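To make that concrete, here is a minimal sketch of what a “conversation” with a chat model actually is. The call_llm function below is a hypothetical stand-in for whatever inference endpoint a vendor exposes, not any real API; the point is that every turn resends the entire history and nothing persists between calls.

```python
# Minimal sketch: a chat "conversation" is really a series of stateless calls.
# call_llm() is a hypothetical stand-in for a vendor inference endpoint; each
# call could be served by a different GPU in a different data center.

def call_llm(messages: list[dict]) -> str:
    # A real endpoint would run the model over the full token history here.
    return f"(model reply after reading {len(messages)} prior messages)"

history = [{"role": "system", "content": "You are a planning assistant."}]

def chat_turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # The entire history is resent on every turn; no memory survives between calls.
    reply = call_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat_turn("Which ports can support a brigade-sized deployment?"))
print(chat_turn("Which of those have rail access?"))  # still a fresh, stateless call
```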
LLMs do not have a logic chain a human would recognize. Given the largely similar training data most agentic models access (or steal, as some authors of copyrighted works claim), you would expect most models to produce similar answers to questions. However, we already know that “who” asks the question matters when dealing with an LLM. I will get different responses based not only on my prompts but on my “past interactions.” This shows human capital matters, but the issue is worse than that.
Why Hallucinations Matter
As one observer recently wrote, “Ask the same question to three LLMs and you get three plausible, polished, but different answers. Which one's right? Who knows? They all sound legit. The main reason, of course, is that the data these systems are trained on is ambiguous. It's scraped from a web filled with conflict, contradiction, and Reddit flame wars. What is even "ethical" in algorithms trained across thousands of cultures and a billion conflicting opinions.” Well, that’s depressing, but at least we are getting a handle on the hallucinations.
The WoTR article articulates a fantasy about how military agentic models will work. The authors see a promise inherent in these models to take over human roles in planning for war and crisis. For this to become reality, we will have to solve hallucinations, and it is not clear we are close to doing so.
One prominent AI researcher has written since 2001 about the issues underlying LLM hallucinations. He recently wrote:
Media cheerleaders and industry people are trying to convince you that AGI is here, or imminent, and that hallucinations have gone away or are exceedingly rare.
Bullshit. Hallucinations are still here, and aren’t going away, anytime soon. LLMs still can’t sanity check their own work, or stick purely to known sources, or notify when they have invented something bogus.
The military does not have access to pristine and unique pools of data. Even if it did, there is no reason to think a military LLM will hallucinate less than current leading-edge models. These facts mean that military agentic models will look like their civilian counterparts. That is fine, until they are tied to national defense. Then it is a weakness.
The need for data from the internet is a vulnerability in a defense system, and agentic planners need access to the internet. A great deal of intelligence is derived from open-source information. Analysts turn that information into intelligence by analyzing, scoring, and sorting it. Analysts have their own biases and limitations. They are human. But, for the most part, humans understand other humans. We are attuned to the fact that someone’s analysis could be wrong. Is that skepticism extended to an LLM? It should be. For one, we have next to no idea how an LLM comes up with its answers. We are already seeing the government struggle to identify hallucinations, like the study citations the Department of Health and Human Services made up. Identifying these hallucinations takes humans with expertise. To borrow a phrase from the WoTR authors, accepting that LLMs will perform as they envision is as big a vulnerability as fighting tanks on horses.
The risks of hallucinations are compounded when we consider militarily relevant problems like false positives in targeting, or the dangers inherent in a system that is overconfident in its automated logistics. These are internal system issues that could be controlled with robust human checks. What is unknown is the susceptibility of these systems to information warfare, and the security risks inherent in scaling them.
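To see why false positives are so corrosive, consider a quick back-of-the-envelope calculation. The numbers below are assumed purely for illustration, not drawn from any real system: when genuine targets are rare, even a classifier with seemingly strong accuracy produces flags that are mostly wrong, and every one of them demands a human with expertise to adjudicate.

```python
# Illustrative base-rate calculation with assumed, hypothetical numbers:
# why a "good" automated targeting classifier can still flood humans with
# false positives when real targets are rare.

tracked_objects = 10_000
true_targets = 100                     # rare: 1% of tracked objects
sensitivity = 0.95                     # assumed true-positive rate
false_positive_rate = 0.02             # assumed error rate on non-targets

true_positives = true_targets * sensitivity
false_positives = (tracked_objects - true_targets) * false_positive_rate
flagged = true_positives + false_positives

print(f"Flagged objects:   {flagged:.0f}")
print(f"  actual targets:  {true_positives:.0f}")
print(f"  false positives: {false_positives:.0f}")
print(f"Share of flags that are wrong: {false_positives / flagged:.0%}")
```

Under these assumptions roughly two-thirds of the flags are wrong, which is exactly the kind of load that falls back on human checkers.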
Red and Blue Teaming
As Jack Shanahan recently said, “We can and should expect our adversaries to do everything possible to corrupt AI agents and reasoning models. It's the inevitable cat-and-mouse game of warfare.” We should expect Chinese, Russian, North Korean, and Iranian systems to target U.S. systems relentlessly. The threat compounds when we talk about red and blue agents interacting. This is truly unknown territory, as we do not even have a good theory for how to address these issues.
Zico Kolter recently told Wired, “It's a field [agents interacting with other agents] that is largely unexplored, both scientifically and commercially, and it's a really valuable space. Game theory was developed in no small part due to World War II, and then during the Cold War afterwards. I’m not equating the current setting to this in any way, but I think oftentimes, when there are these massive breaks in the operations of the world, we need a new kind of theory to explain how we might operate in these settings. And I think that we need a new game theory to understand the risk associated with AI systems. Because traditional modeling just doesn't really capture the variety of possibilities here.”
Great, the future is here, and we have no idea what it is going to look like. Also, game theory, one of the theories underpinning international dynamics and based in part on rational actors, may not even apply to systems that do not, in fact, reason. Can we even be sure military leaders are asking the right questions?
Are current LLMs even agentic? What does that mean? One narrow definition of agent comes from cybernetics. Among other traits, cybernetic agents are autonomous. This would be akin to a thermostat or cell organelle that, once set, maintains stasis without outside command or direction. LLMs do not currently meet this definition. Could they? Maybe, but the hallucinations and our inability to fully understand how they arrive at answers mean that even a fully autonomous LLM agent could be dangerous in a defense system. At a minimum, they would require humans with expertise in the loop. Even if we expanded the agent definition to include traits of agency, interactivity, and embodiment, LLM “agents” would still fail to meet the definition of an agent.
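As an illustration of the cybernetic definition, here is a minimal sketch of a thermostat-style agent. The sensor and heater are simulated and the numbers are made up; the point is that, once given a set point, it senses and acts continuously with no further human prompt, which is precisely what an LLM does not do.

```python
import random
import time

# Minimal sketch of a cybernetic agent: a thermostat-style control loop that,
# once given a set point, maintains stasis with no further human direction.
# The sensor and heater below are simulated stand-ins for real hardware.

SET_POINT = 20.0   # degrees Celsius
TOLERANCE = 0.5

def read_temperature() -> float:
    return 20.0 + random.uniform(-2.0, 2.0)   # simulated room sensor

def set_heater(on: bool) -> None:
    print("heater", "on" if on else "off")    # simulated actuator

def run_thermostat(cycles: int = 5) -> None:
    for _ in range(cycles):                   # acts repeatedly, unprompted
        temp = read_temperature()             # senses the environment directly
        if temp < SET_POINT - TOLERANCE:
            set_heater(True)                  # acts on the world to restore stasis
        elif temp > SET_POINT + TOLERANCE:
            set_heater(False)
        time.sleep(1)                         # no human in the loop between cycles

run_thermostat()
```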
Exploring that failure is important to understanding the issues with current agentic models. It’s true they can act in a goal-oriented way (agency), but the goal is centered on the token, not the outcome. This matters because the divergence in focus between the user and the LLM is opaque, and it shows they are not partners in the way humans traditionally think of partnerships. LLMs can be interactive with other agents, but as Zico Kolter said, we don’t even have a good theory of how that interaction works, so does that rise to the agentic definition? Maybe? But it carries significant unknown risks.
Human Capital, Tacit Knowledge, and Knowledge Management
One of the Special Operations Forces’ Truths is “Humans are more important than hardware.” The point of this truth is that humans are worth more intrinsically than any system or weapon. It’s not a philosophical stance; it’s that hardware is only as good as its user. As already discussed, this is clearly the case with LLMs, as better users get better responses. This means that even if agentic AI overcomes all of the issues detailed above, the military needs to invest first in users to have any hope of an agentic model producing good work. Lastly, there is embodiment. Can the model interact in the real world? Well, LLMs lack sensory input and real-time decision making. Again, they are working off token exchanges and cannot process real-time sensor inputs.
So, LLMs are not really agents in the way scientists and researchers have traditionally defined them. They rely on humans in the loop to make the majority of decisions and provide the preponderance of situational awareness. At best, LLMs could enhance human understanding and intelligence if they are highly constrained. That is not nothing. The range of workflows that could be improved by a highly constrained LLM working with experts is impressive, but it is not Agentic Warfare.
One thing that keeps LLMs from producing more reliable outputs is the way they consume data. Overwhelmingly, LLMs are trained on online content. There is considerable knowledge and information available online: catalogs of books, databases of art, and access to demographics that are mind-boggling. The internet is billed as “all human knowledge,” but it almost never contains the reasoning process people used to reach conclusions. A polite interpretation of an agentic model is that the machine is reproducing a facsimile of a human thought process. In some very real ways, LLMs have almost no access to a human’s internal logic because that logic relies on tacit inputs. That tacit knowledge (the pre-judgments, biases, and values that underpin human thought) is deeply ingrained and not easily reproduced because, in large part, it is never fully expressed in published products.
What could help is a shift away from single-user LLMs toward multiuser models that allow each user to review the entire conversation and interaction that leads to an output. This is not dissimilar to how an intelligence analyst or planner in the military is grilled by commanders and staff officers on the assumptions that underpin their work. Multiuser LLMs would not crack open the black box of the model’s inner workings, but they would allow other humans to understand the human inputs.
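One way to picture this, as a hypothetical sketch rather than any existing product, is a shared, attributed interaction log: every prompt, assumption, and model output is recorded with its author, so a commander or reviewer can replay the full chain of human inputs behind an answer. The names and roles below are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of a multiuser interaction log. It does not open the
# model's black box, but it lets any reviewer see every human input (and
# every model output) that led to a final product.

@dataclass
class Turn:
    author: str        # e.g. "planner_1" or "model"
    role: str          # "prompt", "assumption", "model_output", "review_note"
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class SharedSession:
    topic: str
    turns: list[Turn] = field(default_factory=list)

    def log(self, author: str, role: str, content: str) -> None:
        self.turns.append(Turn(author, role, content))

    def replay(self) -> None:
        # A reviewer "grills" the session the way a commander grills a planner.
        for t in self.turns:
            print(f"[{t.timestamp:%Y-%m-%d %H:%M}] {t.author} ({t.role}): {t.content}")

session = SharedSession(topic="Port throughput estimate")
session.log("planner_1", "assumption", "Assumes rail lines at Port X are intact.")
session.log("planner_1", "prompt", "Estimate daily container throughput for Port X.")
session.log("model", "model_output", "Roughly 1,200 containers per day (unverified).")
session.log("analyst_2", "review_note", "Throughput figure needs a sourced check.")
session.replay()
```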
This idea draws on the wisdom of crowds in a way that would help identify issues with LLM outputs. However, here again we return to the issue of human capital. A crowd’s wisdom requires at least some level of expertise in the topic at hand. Military commanders who layer high human capital on top of LLM outputs could see improvements. The issue then is that using agentic AI would necessitate an army (pun intended) of fact-checkers with the knowledge and ability to review the output and identify hallucinations and errors.
The last issue the military faces with using any form of AI is data and knowledge management. Even companies like Microsoft and Google struggle with data management. These companies have entire data ecosystems and employ data scientists, statisticians, and other specialists to manage, sort, organize, and weight their data. They serve as a good example for the military because they are massive, global, and run constant operations. So, if military commanders want to use any form of AI to its potential, they need to invest in data management. Again, this is a point where investment in human capital is necessary to even start. Can managing the volume of data that units produce be fully outsourced to civilian companies? I would say no, because it starts with enforcement of standards and practices within the organization that an outside entity cannot impose. More revolution in military ability will come from good data management than anything a present-day LLM can offer. If, at some point, we cross the Rubicon and create truly reliable agentic models, good data management will make them even better.