Pentagon clears xAI’s Grok for classified use despite GSA safety warnings
A Jan. 15 GSA executive summary found Grok-4 unsafe for general federal use; the Pentagon authorized classified deployment this week, prompting watchdogs and officials to demand stricter oversight.

The Pentagon this week approved xAI’s Grok chatbot for use in classified settings, even though a Jan. 15 General Services Administration executive summary concluded Grok-4 “does not meet the safety and alignment expectations required” for general federal use. A subsequent 33-page GSA report, based on agency testing and incident review, said even limited government use would require “strict and layered safety oversight,” and warned that without such oversight, inclusion “would pose elevated and difficult-to-manage safety risk.”
GSA officials and other federal actors have raised a series of operational concerns in recent months, describing Grok as sycophantic, overly compliant, and susceptible to manipulation or corruption by faulty or biased data, flaws they said could create broader systemic risk for government IT environments. The agency’s review reportedly examined multiple public-safety incidents, prompting internal recommendations to restrict Grok’s access until the technical problems were resolved.
The decision to permit classified use has produced clear friction between procurement and safety officials on one side and defense operational leaders on the other. Ed Forst, the top official at the GSA, raised alarms with White House officials in recent months about the potential safety issues, and the matter reached White House chief of staff Susie Wiles. Wiles called a senior xAI executive, who told her xAI was working to address the safety issues that made Grok over-compliant. Josh Gruenbaum, a senior GSA acquisitions official involved in the procurement, said the government platform for Grok is separate from the public instance and defended the review process, stating, “We rigorously evaluate frontier AI models, including xAI, through a comprehensive internal review process. In this instance, we followed established procedures and maintain our determination to keep it on schedule.”
The approval comes as the selection of AI models for federal use has become increasingly politicized. Some senior U.S. officials view rival suppliers’ safety postures and donor ties as politically fraught, while others, including some defense leaders, favor Grok’s looser controls and the free-speech posture associated with its leadership. Anthropic, until now the only developer approved for classified work, was reportedly given a Pentagon deadline to accept looser rules on military use, underscoring the competitive and political dimensions of model selection.

Civil-society groups responded swiftly. Public Citizen and a coalition of more than 30 civic and public-interest organizations delivered a third letter to the White House Office of Management and Budget urging a ban on Grok’s deployment across federal systems. J.B. Branch of Public Citizen warned that political favoritism should not replace independent testing and enforceable safeguards, saying, “Multiple federal officials raised red flags about Grok’s safety, reliability, and susceptibility to manipulation and the Trump administration pushed it into classified government systems anyway. That not only goes against their own safety principles, but may even put America’s national security at risk. When national security and sensitive government operations are at stake, political favoritism for Elon Musk cannot substitute for independent testing, enforceable safeguards, and real accountability. The American public must be protected from experimental tools that are rushed into the heart of government despite clear warnings.”
The clash exposes a policy dilemma: whether operational advantages cited by defense users outweigh documented safety and alignment shortcomings identified by procurement and oversight bodies. Officials say layered technical mitigations and robust oversight could reduce risk, but the GSA’s findings make clear that doing so will require sustained investment in testing, transparency, and enforceable controls before broader deployment.