Toward Responsible AI in High-Stakes Domains: A Dataset for Building Static Analysis with LLMs in Structural Engineering

Abstract

Modern engineering faces an unprecedented paradox: while our systems grow increasingly complex, the tools we use to design and evaluate them must remain both reliable and transparent. Decisions in energy, infrastructure, and construction no longer occur in isolation but within socio-technical networks shaped by emerging technologies and artificial intelligence (AI). Among these advances, large language models (LLMs) such as GPT have attracted attention for their ability to synthesize solutions, interpret domain-specific queries, and generate outputs with minimal fine-tuning. Yet beneath this promise lies a critical flaw: LLMs do not compute; they predict. Their reliance on statistical associations often leads to biases, logical missteps, or hallucinated values, shortcomings that become especially problematic in structural engineering, where safety and compliance are non-negotiable. This tension sets the stage for the present work.

The dataset introduced here responds to this gap by demonstrating how generative AI can be grounded within validated computational workflows. Through the Model Context Protocol (MCP), ChatGPT was connected to numerical solvers such as OpenSees and benchmarked against ETABS, ensuring traceability, reproducibility, and compliance with seismic design standards. The dataset comprises technical prompts, GPT outputs, verified numerical analyses, and comparative error metrics for four reinforced concrete frame models designed under Ecuadorian (NEC-15) and U.S. (ASCE 7-22) standards. More than a simple record, it exemplifies a reproducible methodology for embedding LLMs within structural engineering practice.

By curating and releasing this dataset, the study pursues three goals: to strengthen reproducibility by enabling independent verification, to foster interdisciplinary collaboration across AI, civil engineering, and data science, and to establish benchmarks for context-aware AI integration in high-stakes domains.
In doing so, it not only illustrates the promise of human–AI teaming but also highlights the limitations that must be addressed if generative models are to be responsibly embedded in engineering decision-making.
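As one illustration of the comparative error metrics the abstract describes, the sketch below computes per-quantity relative errors between LLM-generated values and solver-verified results. This is a minimal sketch under assumptions: the metric (absolute relative error per response quantity) and the example quantities (base shear, fundamental period) are illustrative choices, not the paper's documented formulas or data.

```python
# Hedged sketch: one plausible way to compute comparative error metrics
# between GPT outputs and verified numerical analyses (e.g., OpenSees/ETABS).
# The metric and the example values below are assumptions for illustration.

def relative_errors(llm_values, solver_values):
    """Per-quantity relative error of LLM outputs vs. verified solver results."""
    return {
        name: abs(llm_values[name] - solver_values[name]) / abs(solver_values[name])
        for name in solver_values
    }

# Hypothetical quantities for one reinforced concrete frame model
solver = {"base_shear_kN": 1250.0, "period_T1_s": 0.82}  # verified solver results
llm = {"base_shear_kN": 1190.0, "period_T1_s": 0.88}     # GPT-generated values

for name, err in relative_errors(llm, solver).items():
    print(f"{name}: {err:.1%}")
```

In a released dataset, a table of such per-model, per-quantity errors lets independent readers verify both the LLM outputs and the benchmark comparison.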
