Intelligent Hydraulic Flow Unit Mapping: Leveraging Unsupervised and Supervised Learning on Large-Scale Core Data

Abstract

Accurate identification of hydraulic flow units (HFUs) is fundamental to reservoir characterization. However, conventional approaches, such as histogram analysis, log-log plots of reservoir quality index (RQI) versus normalized porosity index (ϕz), and Z-score probability tests, often suffer from subjectivity, data overlap, and limited scalability to large datasets. To address these limitations, this study introduces a hybrid machine learning workflow that integrates unsupervised clustering and supervised classification to automate the identification and prediction of HFUs. In the first phase, four unsupervised models, K-means, K-medoids, Fuzzy C-means (FCM), and Gaussian mixture models (GMM), were used to determine the optimal number of HFUs. GMM gave the best clustering performance (R² = 0.9278, RMSE = 0.3365), ahead of K-means and K-medoids, whereas FCM underperformed. In the second phase, supervised models were trained to predict HFUs from laboratory-derived core features. The models evaluated were k-nearest neighbors (KNN), random forest classifier (RFC), support vector machine (SVM), gradient boosting (GB), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and a stacking hybrid. RFC outperformed the others and generalized well (training accuracy: 0.983; testing accuracy: 0.972), while SVM showed moderate success and KNN overfit. The boosting models XGBoost and GB achieved high training accuracy but overfit, whereas AdaBoost had lower accuracy but generalized better. The stacking model, though highly accurate in training, also overfit on the test data. Computational efficiency analysis highlighted the trade-off between training time and predictive performance: KNN and SVM were the fastest but also the least reliable, while RFC provided the most balanced accuracy–time outcome. Overall, the proposed workflow establishes an effective and scalable methodology for HFU classification, offering greater consistency, objectivity, and applicability to large reservoir datasets in the Norwegian sector of the North Sea.
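
The quantities behind the RQI versus ϕz plot mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes the standard flow zone indicator (FZI) definitions, with permeability in millidarcies and porosity as a fraction; the abstract does not state which features the authors actually clustered on, so the use of log10(FZI) here is an assumption, and the sample values and function names are hypothetical.

```python
import math

def rqi(permeability_md: float, porosity: float) -> float:
    """Reservoir quality index in micrometres (k in mD, porosity as a fraction)."""
    return 0.0314 * math.sqrt(permeability_md / porosity)

def normalized_porosity(porosity: float) -> float:
    """Normalized porosity index: pore volume to grain volume ratio."""
    return porosity / (1.0 - porosity)

def fzi(permeability_md: float, porosity: float) -> float:
    """Flow zone indicator: RQI divided by the normalized porosity index."""
    return rqi(permeability_md, porosity) / normalized_porosity(porosity)

# Hypothetical core plugs as (permeability in mD, fractional porosity).
samples = [(150.0, 0.22), (12.0, 0.15), (900.0, 0.28)]

# Assumption: log10(FZI) is a common one-dimensional input for HFU clustering.
log_fzi = [math.log10(fzi(k, phi)) for k, phi in samples]
```

On a log-log plot of RQI against ϕz, samples with the same FZI fall on a straight line of unit slope, which is why clustering on log10(FZI) is a common way to separate HFUs.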
