
Federal data gets a cleaner path into AI training

The House bill would have NIST set voluntary standards for formatting, labeling and checking public datasets before they feed AI systems. It also covers open government data assets and calls for better metadata and ongoing updates.

For agencies trying to use artificial intelligence, the bottleneck is often not a lack of data. It is the messiness of the data they already have. In the U.S. House, H.R. 9341 would direct the National Institute of Standards and Technology, or NIST, to write voluntary guidelines for preparing government datasets, including open government data assets, so they can be used to train AI models.

The bill comes from Representative Brian Babin and Representative Zoe Lofgren, but the real subject is the information itself. Better structure and cleaner labels can make public data easier to reuse, whether the user is a federal office building a tool or an outside developer working with government material.

The data chores behind the model

The measure would push NIST to focus on the plain, technical work that turns a dataset into something a system can read. That includes formatting and structure, so the information is organized in ways AI can interpret; labeling and annotation, so entries are clearly marked; and quality evaluation, so agencies have a way to check whether the data is sound before it is used for training.

The bill also says those labeling and annotation standards should work at scale, including through programmatic, automated and expert-guided methods. That is a clue to the bill’s practical aim. It is not trying to rewrite AI policy from scratch. It is trying to make sure the raw material going into these systems is less likely to be garbled, inconsistent or hard to reuse.

A guide, not a mandate

The guidance would be voluntary, which matters. NIST could set a common standard, but the bill would not force every agency into a new compliance regime. Even so, a shared playbook can change how federal data gets handled, especially if different offices have been using different methods to clean, label and publish the same kind of information.

NIST would also consult with the Office of Science and Technology Policy, the Energy Department, the Office of Management and Budget and other agencies as it sees fit. That cross-agency reach suggests the bill is trying to make AI-ready data a governmentwide habit, not a one-off fix. For readers, the point is simple: the quality of public data can shape how useful, accurate and reusable AI systems become.
