Similarity-Informed Matrix Completion Method for Predicting Activity Coefficients
Abstract
Accurate prediction of thermodynamic properties of mixtures, such as activity coefficients, is essential for designing and optimizing chemical processes. While established physics-based methods face limitations in prediction accuracy and scope, emerging machine learning approaches, such as matrix completion methods (MCMs), offer promising alternatives. However, their performance can suffer in data-sparse regions. To address this issue, we propose a novel hybrid MCM for predicting activity coefficients at infinite dilution at 298 K that uses not only experimental training data but also synthetic training data from two sources: predictions obtained from the physics-based modified UNIFAC (Dortmund) and from a similarity-based approach developed in previous work. The hybrid method combines the broad applicability of MCMs with the precision of the similarity-based approach, yielding a more robust prediction framework that performs well even in regions with limited data. Additionally, our analysis provides insights into how different types of training data affect prediction accuracy. When experimental data are sparse, incorporating synthetic training data from modified UNIFAC (Dortmund) and the similarity-based approach significantly improves the performance of the MCMs. Conversely, even with abundant experimental data, high accuracy is only achieved if the training set includes mixtures similar to those of interest.
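The general idea of training a matrix completion model on a mix of experimental and synthetic entries can be illustrated with a minimal sketch. This is not the authors' model, whose formulation is not given in the abstract. As a stand-in it uses a weighted alternating-least-squares factorization of a solute-by-solvent matrix of logarithmic activity coefficients. The function names, the weights (1.0 for experimental, 0.3 for synthetic entries), the rank, and the regularization strength are all assumptions chosen for illustration.

```python
import numpy as np


def build_training_set(experimental, synthetic, w_exp=1.0, w_syn=0.3):
    """Merge experimental and synthetic entries (NaN marks a missing entry).

    Experimental values take precedence; synthetic values only fill gaps.
    The weights w_exp and w_syn are illustrative assumptions.
    """
    has_exp = ~np.isnan(experimental)
    has_syn = ~np.isnan(synthetic) & ~has_exp
    values = np.where(has_exp, experimental, np.where(has_syn, synthetic, 0.0))
    weights = np.where(has_exp, w_exp, np.where(has_syn, w_syn, 0.0))
    return values, weights


def complete_matrix(values, weights, rank=2, reg=0.1, n_iters=50, seed=0):
    """Weighted low-rank completion by alternating least squares.

    values  : (n_solutes, n_solvents) array, 0.0 where nothing is observed
    weights : same shape, 0 = unobserved, larger = more trusted
    Returns the dense matrix of predictions U @ V.T.
    """
    rng = np.random.default_rng(seed)
    n, m = values.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each solute factor with the solvent factors held fixed.
        for i in range(n):
            Vw = V * weights[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + ridge, Vw.T @ values[i])
        # Update each solvent factor with the solute factors held fixed.
        for j in range(m):
            Uw = U * weights[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + ridge, Uw.T @ values[:, j])
    return U @ V.T
```

In this sketch, synthetic entries would come from a physics-based model or a similarity-based estimate and are down-weighted relative to measurements, so they guide the factorization in sparse regions without overriding experimental data.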