Trust Algorithms? The Army Doesn’t Even Trust Its Own AI Developers

October 7, 2020
cruickshank

Last month, an artificial intelligence agent defeated human F-16 pilots in a Defense Advanced Research Projects Agency challenge, reigniting discussions about lethal AI and whether it can be trusted. Allies, non-government organizations, and even the U.S. Defense Department have weighed in on whether AI systems can be trusted. But why is the U.S. military worried about trusting algorithms when it does not even trust its AI developers?

Any organization’s adoption of AI and machine learning requires three technical tools: usable digital data that machine learning algorithms learn from, computational capabilities to power the learning process, and the development environment that engineers use to code. However, the military’s precious few uniformed data scientists, machine learning engineers, and data engineers who create AI-enabled applications are currently hamstrung by a lack of access to these tools. Simply put, uniformed personnel cannot get the data, development environments, or computing power to create AI solutions for the military. The problem is not that the systems or software are inherently unsafe, but that users cannot get approvals to access or install them.

Without data, computing power, and a development environment, AI engineers are forced to cobble together workarounds with the technical equivalent of duct tape and WD-40, or to jump through bureaucratic hoops for access to industry-standard software libraries that would take only a few seconds to download on a personal computer. Denying AI engineers these tools is the equivalent of denying an infantryman her rifle and gear (body armor, helmet, and first aid kit). If the military can trust small-unit leaders to avoid fratricide and civilian casualties while leading soldiers in a firefight, or to negotiate with tribal leaders as part of counterinsurgency operations, it can trust developers to download software libraries with hundreds of millions of registered downloads.

The Defense Department’s Joint AI Center has initiated a multi-year contract to build the Joint Common Foundation, a platform to equip uniformed AI developers with the tools needed to build machine learning solutions. However, tools alone are not enough. The Joint Common Foundation should be part of a broader shift in empowering developers with both tools and trust.

Developers Need Data

Data is the lifeblood of modern machine learning, but much of the Defense Department’s data is neither usable nor accessible, making the military data rich but information poor. The military is hardly alone in its inability to harness the potential of data. A survey by Kaggle, the world’s largest data science community, showed that “dirty data” was the biggest barrier to data science work.

A recent article about the Joint Common Foundation described the difficulty of running object detection on MQ-9 Reaper drone videos because position data was “burned in” to the images, confusing the machines. Our most trying experience with dirty data comes from the Army human resources system, which, as you might have guessed, stores soldiers’ personnel records as images or PDFs rather than in a searchable, analyzable database. Instead of using AI to address talent management, the Army is struggling just to make evaluations and records computer-readable. Once cleaned and structured, the data should also be accessible to users and their tools.
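To make that gap concrete, here is a minimal sketch (the file names and column names are hypothetical): when records exist only as scanned images, optical character recognition is the only way in, and its noisy output still needs extensive cleanup; when the same records sit in a structured table, a single query answers the question.

```python
# Hypothetical illustration of the gap between image-based and structured records.
from PIL import Image
import pytesseract
import pandas as pd

# Records stored as scanned images: OCR is the only way in, and the raw text
# still requires heavy cleanup before any analysis is possible.
raw_text = pytesseract.image_to_string(Image.open("evaluation_scan.png"))

# The same records in a structured table: one line answers a talent-management question.
records = pd.read_csv("evaluations.csv")
top_rated_recent = records[(records["rating"] == "Most Qualified") & (records["year"] >= 2018)]
```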

Military data owners frequently refuse to share their data, siloing it away from other data sources. Uniformed developers often spend hours just finding the right authority to ask for access to a dataset. When they do, overly restrictive and nonsensical data-sharing practices are common. For example, in one author’s experience, a data-owning organization shipped the author a laptop with preconfigured programs because it did not trust the AI engineer to download the data or configure their own tools. Other times, the approval process takes weeks, as legal, G-6, G-8, and Network Enterprise Technology Command entities take turns saying: “It’s not my decision,” “I don’t know,” or “This seems scary.”

While the services have information system owners at regional network enterprise centers to manage users and networks, there is no such role or process for data. The Joint Common Foundation may put some of the Defense Department’s data under one technical roof, but it doesn’t solve the problem of bureaucratic silos and gatekeepers. Without an established framework for identifying and labeling which AI engineers have “need-to-know” and a streamlined process for access requests, the data will still be effectively locked away.

…And an Advanced Development Environment

In the rare event that data is accessible, uniformed AI engineers are not allowed to install software or configure their machines. The government computers with data access may offer only data science languages like R (and, much more rarely, Python and Julia), and may prohibit or severely inhibit the installation of the software libraries that enable data exploration, visualization, and machine learning. These libraries are what make machine learning accessible to engineers who are not AI researchers, and the military has very few of the latter. Denying these tools to uniformed AI engineers forces them to reinvent the wheel, rebuilding algorithms from scratch.

In simple terms, the current options are blunt, general-purpose tools, while the work demands specialized ones. A financial analyst could do complex math by hand or with a basic calculator, but Microsoft Excel is a far more capable tool for the job. The Army’s AI engineers face the equivalent situation.

Without these tools and libraries, AI engineers are forced to recreate the research of several academics, in whatever coding language is allowed, just to do something as basic as matrix multiplication. As uniformed technologists, we build side projects on our personal computers with far more ease (and far more modern tools) than on government equipment. That disparity is not surprising, but the central issues are permission, control, and speed, not security or risk.
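As a minimal illustration of what reinventing the wheel means in practice (the snippet below is a hypothetical sketch, not any particular government workflow): without approved libraries, even matrix multiplication must be hand-rolled; with a standard, widely vetted library like NumPy, it is a single line that also runs far faster on large matrices.

```python
# Hand-rolled matrix multiplication in pure Python: correct, but slow and
# exactly the kind of wheel-reinvention that blocked libraries force.
def matmul(a, b):
    """Multiply two matrices represented as nested lists."""
    rows_a, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]

# The same operation with a standard, widely vetted library.
import numpy as np
product = np.array([[1, 2], [3, 4]]) @ np.array([[5, 6], [7, 8]])
```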

The Joint Common Foundation is expected to provide a secure software engineering environment and access to other resources, but a centralized model in which software permissions are allocated individually will never keep pace with user needs. For comparison, the Defense Information Systems Agency has spent nearly $150 million since 2018 to address the backlog of more than 700,000 personnel awaiting security clearances, with some success. Given the importance of AI in future warfare, a backlog of hundreds of AI developers waiting for the software tools to do their jobs is a critical national security risk. A long process is not necessarily a thorough one, and scalability comes from educating, trusting, and empowering many users. To actually enable the uniformed AI workforce to do its job, the department needs to extend far more trust over which tools and programs developers may install and use on their government-furnished equipment.

The common refrain is that those tools are not safe, but that blanket denial is draconian and reflects little critical thinking about risk. Fighter jets are expensive and precious, yet military pilots still fly (and occasionally crash) them. Soldiers on a combat patrol or even the rifle range are at increased risk, but they patrol and train because that is their mission. Security is a balance of risk and effectiveness, and digital network policies need to be re-evaluated with that balance in mind. It is unreasonable to treat minor version updates of TensorFlow and PyTorch (key machine learning libraries created and maintained by Google and Facebook, respectively) as sudden threats. It is also unlikely that a widely used open-source library is a threat in the first place, or that a government review would catch a threat that millions of other users had somehow missed. Moreover, government networks should be secure enough to detect and isolate malicious behavior, or at least be built on zero-trust principles that limit how long any user holds elevated privileges, so that the blast radius of a compromise stays small. The U.S. military can do better, and the Joint Common Foundation alone will not suffice.

…Plus, More Computing Power

Once an AI engineer has access to data and the software tools to build machine learning algorithms, they need computational power, or “compute,” to train models on that data. Computing power, like data, is currently siloed within a handful of data-focused organizations like the Center for Army Analysis, the G-8, and the Office of Business Transformation, and is inaccessible to AI engineers outside of them. Even if an AI developer is granted an account on these systems, the computational environments are accessible only via government laptops maintained by specific IT administrators.

This purely bureaucratic restriction means that a substantial number of the military’s AI workforce — who may be doing training with industry, getting a degree in machine learning from Carnegie Mellon, or otherwise in an environment without a computer on the .mil domain — would not be able to use their new skills on military problems.

Connectivity and access have been recurring issues at the Army’s Data Science Challenge. When participants raised the problem last year, the challenge’s sponsors made the data available to military members without access to government computers (and no data leaks resulted). This year, however, bureaucratic access controls will prevent last year’s winner, along with however many AI engineers are currently in school, training with industry, or simply unable to reach a government computer because of the novel coronavirus teleworking restrictions, from competing.

Do Both: Centralize and Delegate

Ongoing platform efforts, like the Coeus system proposed by the Army’s AI Task Force and the Joint Common Foundation being built by the Joint AI Center, are much-needed steps toward putting tools in the hands of AI developers. We strongly support them. But both may take years to reach full operational capability, and the military needs AI tools right now. The Joint Common Foundation contract has options running four years, a long time in the fast-moving field of AI. Few people in the Pentagon understand AI, and no one there knows what AI will look like in four years. Four years ago, the federal government spent half as much on AI as it does now; the Defense Department had not yet established the Joint AI Center or even the Pentagon’s first large AI effort, Project Maven; and the Pentagon had no AI strategy at all. Who can predict with confidence on such a time horizon? While fully functioning platforms are being developed, the Pentagon can take immediate steps.

The Defense Department and the services should formally track people in AI and software engineering roles, giving them skill identifiers similar to those of medical professionals, and granting these digital experts specific permissions: access to data sources, authority to use low-risk software locally (including virtual machines), and secure access to compute resources. The services already entrust IT administrators with elevated network permissions (the bar is only a CompTIA Security+ certification), and it is time to create a new user profile for developers. AI and software engineers (many of whom have degrees in computer science) need the ability to customize their own devices and to use a range of specialty tools. The process to become an authorized user should be clear and fast, with incentives for approval authorities to hit speed benchmarks.

First, the Defense Department needs to update its data-sharing policies (from 2007 and 2019). Department leadership needs to formally address permissions, approval processes, privacy, confidentiality, and sensitivity in data sharing, and to recognize AI engineers as a new user group distinct from data scientists. Moreover, access to data gets lost in bureaucracy because no executive role exists to manage it. The Defense Department should also consider creating an “information data owner” position, based on the information security owner role that controls network security, to fill that gap. Data scientists and AI experts need access to data to do their jobs. That should not mean carte blanche, but parity with contractors is a fair target.

Current policies restricting access to data for uniformed AI experts are especially frustrating when one considers that the Defense Department pays contractors like Palantir billions of dollars for aggregation and analysis of sensitive, unclassified, and classified data. Given that military leadership trusts contractors — who have little allegiance to the military beyond a contract — with wide latitude in data access, shouldn’t the military also extend at least the same trust with data to its own people?

Second, the Defense Department should set a goal of rapidly putting as many tools as possible in the hands of engineers. The Joint AI Center and the AI hubs within the services should drive the expansion of existing “virtual software stores” with well-known, vetted software libraries like pandas, scikit-learn, PyTorch, and TensorFlow, and allow AI and software engineers to freely install these packages onto government computers. The capability to manage software licenses this way already exists, but it needs a major upgrade to meet the new demands of uniformed digital technologists.
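As a minimal sketch of what those vetted libraries enable (the dataset, file name, and column names below are hypothetical), loading structured data and training a baseline model takes only a handful of lines; that is exactly the kind of routine workflow that stalls when the packages cannot be installed.

```python
# Minimal sketch of a routine machine learning workflow using widely vetted libraries.
# The file name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a cleaned, structured dataset (one row per record).
df = pd.read_csv("maintenance_records.csv")
X = df.drop(columns=["failed_within_30_days"])  # feature columns
y = df["failed_within_30_days"]                 # label column

# Hold out a test set, fit a baseline model, and report accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```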

Concurrently, the Defense Department should lower the approval authority for software installation from one-star generals to colonels (O-6) for small-scale use cases. For example, if an AI team’s commanding officer is comfortable with an open-source tool, the team should be able to use it locally or in secure testing environments, but it should not go to production until the Defense Information Systems Agency approves it. Once the agency approves the tool, it can be added to the “software store” and made available to all uniformed personnel with the “AI engineer” user role described above. The chief information officer/G-6 and the deputy secretary of defense should give the Defense Information Systems Agency incentives to accelerate its review processes. The net effect is that engineers can refine and validate prototypes while security approvals run in parallel.

In particular, designated users should be authorized to install virtualization software (like VMware or Docker) and virtual private network servers on government computers. Virtualization creates a logically isolated compartment on a client and gives developers full configuration control over the software packages and operating system inside a “virtual machine.” The virtual machine can break without affecting the government hardware it sits on, which makes local authority for software installation far less risky. VPN technology will let approved users connect to .mil systems with no government equipment beyond a common access card. These products are secure and widely recognized as solutions to enterprise security problems.
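As a minimal sketch of what that isolation looks like in practice, a containerized development environment can be defined in a few lines (the base image, package list, and versions below are illustrative, not an approved baseline); if it breaks or is compromised, it can be destroyed and rebuilt without touching the host machine.

```dockerfile
# Illustrative only: a self-contained machine learning development environment.
# The base image and packages are placeholders, not an approved configuration.
FROM python:3.8-slim

# Install widely used data science libraries inside the container,
# leaving the host operating system untouched.
RUN pip install --no-cache-dir numpy pandas scikit-learn torch

WORKDIR /workspace
CMD ["python"]
```

An engineer would build and start this with “docker build -t ml-dev .” and “docker run -it --rm ml-dev”, experiment freely inside the container, and simply discard it when finished.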

The military will also benefit from giving AI developers access to virtualization tools in another way: these developers become “beta testers,” users who encounter problems with security or AI workflows first-hand. They can identify issues and give feedback to the teams building the Joint Common Foundation and Coeus, and to the teams reviewing packages at the Defense Information Systems Agency. That would be a true win for digital modernization and part of a trust-building flywheel.

Risk Can Be Mitigated

If the military truly wants an AI-enabled force, it should give its AI developers access to the necessary tools and trust them to use those tools. Even if the military builds computational platforms like Coeus or the Joint Common Foundation, the problem of grossly insufficient computational tools will persist as long as the services do not trust their AI engineers to access or configure their own tools. We fully recognize that allowing individual AI engineers to install software, configure operating systems, and access large amounts of data poses some additional risk to the organization. On its face, in a world of cyber threats and data spillage, this is a scary thought. But the military, over hundreds of years of fighting, has recognized that risk cannot be eliminated, only mitigated. Small, decentralized units closest to a problem should be trusted with the authority to solve it.

The military trusts personnel to handle explosives, drop munitions, and maneuver in close proximity under fire. Uniformed AI engineers likewise need to be entrusted to acquire and configure their computational tools. Without that trust and the tools to perform actual AI engineering work, the military may soon find itself without its AI engineers as well.

Maj. Jim Perkins is an Army Reservist with the 75th Innovation Command. After 11 years on active duty, he now works in national security cloud computing with a focus on machine learning at the edge. From 2015–2017, he led the Defense Entrepreneurs Forum, a 501(c)(3) nonprofit organization driving innovation and reform in national security. He is a member of the Military Writers Guild and he tweets at @jim_perkins1.

The opinions expressed here are the author’s own and do not reflect official policy of the Department of Defense, Department of the Army, or other organizations.

Image: U.S. Army Cyber Command