Comparing Linear and Nonlinear Finite Element Models of Vertebral Strength Across the Thoracolumbar Spine: A Benchmark from Density-Calibrated Computed Tomography
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Opportunistic assessment of vertebral strength from clinical computed tomography (CT) scans holds substantial promise for fracture risk stratification, yet variability in calibration methods and finite element (FE) modeling approaches has led to limited comparability across studies. In this work, we provide a publicly available benchmark dataset that supports standardized biomechanical analysis of the thoracic and lumbar spine using density-calibrated CT data. We extended the VerSe 2019 dataset to include phantomless quantitative CT calibration, automated vertebral substructure segmentation, and vertebral strength estimates derived from both linear and nonlinear FE models. The cohort comprises 141 patients scanned across five CT systems, including contrast-enhanced protocols. Phantomless calibration was performed using automatically segmented tissue references and validated against synchronous calibration phantoms in 17 scans. To evaluate model performance, we implemented a nonlinear elastoplastic FE model and compared it to two linear estimates. A displacement-calibrated linear model (0.2% axial strain) demonstrated excellent agreement with nonlinear failure loads (R = 0.96; mean difference = -0.07 kN), while a stiffness-based approach showed similarly strong correlation (R = 0.92). We evaluated vertebral strength at all thoracic and lumbar levels, enabling level-wise normalization and comparison. Strength ratios revealed consistent anatomical trends and identified T12 and T9 as reliable alternatives to L1 for opportunistic screening and model standardization. All calibrated scans, segmentations, software, and modeling outputs are publicly released, providing a benchmark resource for validation and development of FE models, radiomics tools, and other quantitative imaging applications in musculoskeletal research.
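The agreement statistics quoted in the abstract (a correlation coefficient and a mean difference between linear and nonlinear failure loads) can be illustrated with a minimal sketch. The function name, the sign convention (linear minus nonlinear), and the sample values below are assumptions for illustration only; they are not taken from the released dataset or software.

```python
import numpy as np

def agreement(linear_kn, nonlinear_kn):
    """Return Pearson r, mean difference (bias), and 95% limits of agreement.

    Differences are taken as linear minus nonlinear (an assumed convention).
    """
    linear_kn = np.asarray(linear_kn, dtype=float)
    nonlinear_kn = np.asarray(nonlinear_kn, dtype=float)
    r = np.corrcoef(linear_kn, nonlinear_kn)[0, 1]
    diff = linear_kn - nonlinear_kn
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return r, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical failure loads in kN, for illustration only
nonlinear = [2.1, 3.4, 4.0, 5.2, 6.8]
linear = [2.0, 3.5, 3.9, 5.0, 6.9]
r, bias, limits = agreement(linear, nonlinear)
print(f"r = {r:.3f}, mean difference = {bias:.2f} kN")
```

Reporting the bias alongside the correlation matters because two estimates can correlate strongly while still differing by a systematic offset.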
Article activity feed
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf094), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 2: Karan Devane
The study uses an open-source dataset collected in a population representative of those who would benefit from opportunistic screening and includes physiological variation (i.e. contrast-enhanced images and pre-existing fracture), alongside validation of density and FE assessment calibration methods. The methods are described in detail, including software versioning schemes and links to the software sources needed to replicate the methods. Additionally, the enhanced dataset is being included alongside the publication. The primary purpose of this study was to prepare and make available a public dataset for use in continued testing and development of opportunistic screening methods. The data appear to be conservatively analyzed as such, and the authors note existing limitations of the population and sample characteristics where applicable. Additionally, the phantomless calibration technique is validated within this dataset prior to use, in support of the "generalizability of the approach" (line 178), though the sample available for this validation is relatively small (n=17 with in-scan phantoms). The manuscript is well-written and easy to understand, but I have a few suggestions and comments that need to be addressed.
The data are well-controlled for the study cohort; however, as mentioned by the authors (lines 228-232), this cohort is biased towards individuals with pre-existing skeletal fragility, as indicated by the average lumbar T-score assessed by DXA falling in the osteopenic range (-1.5, Table 1). Beyond this, the authors made use of multiple validated calibration techniques to support the use of their internal calibration scheme, as well as analysis of potential confounding variables such as contrast-enhanced CT scans. The relative vertebral strength analysis (Figure 6, Table 2), however, does not appear to account for the fractures mentioned as present throughout the cohort (line 193). While differences in strength may be primarily explained by density or size, it is possible that pre-existing fracture in the thoracolumbar segment influences adaptation of the other vertebrae in the region [1][2][3], and as such an analysis that accounts for fracture status may be warranted.
The use of standardized FE modeling techniques supports the goal of reproducible assessment in clinical FE modeling. While the authors made efforts to enhance the reproducibility and generalizability of the dataset, they themselves note that the source population is not necessarily descriptive of a general population (lines 227-232). Though this population is representative of those indicated for opportunistic screening, the development of risk curves necessitates the inclusion of healthy individuals and follow-up analysis to fully flesh out the use of opportunistic FE in clinical settings. Such an analysis would require a much larger cohort and is outside the scope of the current manuscript. Further, while 'voxel-models' are typically regarded as standard, tetrahedral element models may generally provide better representation of complex biological geometries [4]. All approaches to FE have drawbacks, and tetrahedral models may be less optimal than hexahedral elements with respect to convergence and the possibility of artificial stiffening. Nevertheless, the high prevalence of osteophytes and degradation [5], particularly in older populations where screening is indicated, may warrant the use of tetrahedral elements, which capture the intricacies of vertebral geometry that impact FE-derived strength [6]. While again potentially outside the scope of this study, this might be noted as an additional formulative variable for FE approaches to estimating fracture risk.
Line 269 -> "… applications such as radiomics-driven [approach?] for opportunistic …"
As fracture prevalence is included in the dataset, it may be worthwhile to include analysis of fracture-adjacent vertebrae in the selection of surrogate vertebrae for L1 in opportunistic screening. Does pre-existing fracture influence which vertebrae are selected, and should this decision be made on a person-to-person basis, taking into consideration the particular condition of the vertebrae available in the scan?
[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC8752702/
[2] https://academic.oup.com/jbmr/article/39/12/1744/7825427
[3] https://pmc.ncbi.nlm.nih.gov/articles/PMC7697376/
[4] https://www.sciencedirect.com/science/article/pii/S0021929005003568
[5] https://link.springer.com/article/10.1007/s12565-010-0080-8
[6] https://www.sciencedirect.com/science/article/pii/S1529943018306466
Reviewer 1: Maria Prado
The study presents a novel technique that could advance vertebral strength estimations using FE analysis. The authors clearly articulate the motivation for open benchmarking, covering spinal regions (T1-L6) that are not typically included in similar studies. The description and availability of both linear and nonlinear models support the method's broad utility. I value the authors' effort to share data and open-source resources, which enhances reproducibility.
The following suggestions are intended to strengthen the manuscript and to clarify or expand some sections for future readers.
(Lines 122-132) The justification for choosing 0.2% axial strain as the calibration threshold is somewhat empirical and based on only three representative samples (low, medium, and high vBMD). Please expand on how representative these three samples are of the entire cohort and whether additional samples were tested to confirm generalizability.
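For context on the 0.2% threshold: in a linear elastic model, reaction force is proportional to applied displacement, so the force at any chosen axial strain follows from a single stiffness value. The sketch below illustrates that scaling only. The function name, units, and example numbers are hypothetical, and this is not necessarily how the authors implemented their displacement-calibrated estimate.

```python
def linear_load_at_strain_kn(stiffness_n_per_mm, height_mm, strain=0.002):
    """Reaction force (kN) of a linear model at a prescribed axial strain.

    Assumes axial stiffness in N/mm and vertebral height in mm.
    The default strain of 0.002 corresponds to 0.2% axial strain.
    """
    displacement_mm = strain * height_mm
    return stiffness_n_per_mm * displacement_mm / 1000.0

# Hypothetical vertebra: 80 kN/mm axial stiffness, 25 mm height
load_kn = linear_load_at_strain_kn(80_000.0, 25.0)
print(f"{load_kn:.2f} kN")
```

Because the estimate is directly proportional to the strain value, any shift in the calibration threshold rescales every predicted load by the same factor, which is why the choice of threshold deserves justification.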
(Line 151-152) The manuscript notes that T12 (+2.2%) and T9 (-2.1%) exhibited the smallest deviation from L1, suggesting their potential as alternative targets. In addition to calculating these deviations, was any further analysis performed to support this conclusion? Consider expanding on whether more extensive validation or simulations would be necessary to robustly support T12 and T9 as substitutes for L1.
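The kind of deviation quoted here can be illustrated with a simple per-patient ratio to L1, averaged by level. This is a minimal sketch with hypothetical names and values; it is a direct ratio calculation and not the "graph model" normalization the manuscript describes.

```python
import numpy as np

def percent_deviation_from_reference(strengths, reference="L1"):
    """Mean percent deviation of each level's strength from the reference level.

    strengths: dict mapping level name to per-patient failure loads,
    with patients in the same order for every level.
    """
    ref = np.asarray(strengths[reference], dtype=float)
    deviations = {}
    for level, values in strengths.items():
        ratio = np.asarray(values, dtype=float) / ref  # per-patient ratio
        deviations[level] = 100.0 * (ratio.mean() - 1.0)
    return deviations

# Hypothetical failure loads in kN for three patients, for illustration only
strengths = {
    "T9": [2.7, 3.6, 4.5],
    "T12": [3.3, 4.4, 5.5],
    "L1": [3.0, 4.0, 5.0],
}
deviations = percent_deviation_from_reference(strengths)
for level, dev in deviations.items():
    print(f"{level}: {dev:+.1f}%")
```

A small mean deviation alone does not show that a level is a reliable substitute; the spread of the per-patient ratios would also need to be examined.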
(Lines 198-200) The description of cortical bone modeling is vague. It is not clear whether cortical bone was modeled explicitly or only accounted for implicitly. Clarification would be appreciated. Additionally, please comment on whether the method leads to under- or overestimation of strength in areas where cortical bone is predominant. Is this a limitation that might impact model predictions?
(Line 314) Is there a specific reason why the posterior elements were included in the segmentation process? Previous studies have often omitted these structures from their models. A brief justification for their inclusion in the present work would be helpful.
(Lines 322-323) Are there any references or prior studies that support the selection of the specific reference tissues used for phantomless calibration?
(Lines 349-356) While equations for modulus and yield stress are provided, a short explanation of how these equations compare to other published models, and why they were chosen, would be helpful.
(Lines 361-373) The explanation of the simulation procedure, while valuable, does not clearly state whether it was performed solely on the L4 vertebra (described as the reference image) or applied individually to each vertebral body. Please clarify this point. Additionally, although the loading and boundary conditions are described, the manuscript lacks detail on how endplate irregularities or variations in vertebral alignment were addressed.
(Line 387) For the failure load calculation using the stiffness-based method, which specific vertebrae were used to measure height? Please clarify whether height measurements were taken from all vertebrae in the cohort, only from those included in the force analysis, or from a subset.
(Lines 397-399) The "graph model" approach for intervertebral strength normalization is not explained in detail. While it appears that this method corresponds to the analysis presented in Figure 6, this connection is not clearly stated in the text.
(Lines 122-144) In the section Linear models approximate nonlinear vertebral strength estimates, it is unclear how the nonlinear model itself was validated. The manuscript does not reference any experimental or literature-based benchmarks to support the accuracy of the nonlinear failure load predictions. Please clarify whether any validation against in vitro or in vivo vertebral failure data was performed or cited. If such validation is lacking, this should be acknowledged as a limitation and discussed in terms of its potential impact on the interpretation of the results.
Minor suggestions:
Terminology: The term "phantomless calibration" is well-used, but a brief definition upfront (in Abstract or Background) would help readers unfamiliar with the concept.
(Line 59) Does the word "transparent" refer to a clearer modeling workflow?
(Lines 87-89) Consider relocating the statement ("By providing these outputs, we offer a ready-to-use reference..."), which is confusing in its current position and interrupts the flow of the text.
FIGURES: Ensure axis labels, units, and legends in all figures (especially Fig. 4 and Fig. 6) are visible and explained.
FIGURE 3A-C: The panel subtitles could lead to misinterpretation or confusion about what is being described.