Improving the cheminformatics-based machine learning performance for the prediction of organic solar cells’ efficiency
Abstract
The organic solar cell (OSC) experimental data is usually scarce in large regions of the donor-acceptor pair space. This is because the acceptor is usually kept fixed when examining new donors (or vice versa). For instance, the data used in this work contains only two different acceptors (PCBM and PC71BM) but hundreds of different donors. The donors, however, can be subclassified as polymers or small molecules. In this work, both the acceptor’s and the donor’s information are encoded as four extra one-hot binary features in the cheminformatics-based machine learning model. It is demonstrated that this additional information can improve the model’s prediction performance by up to 34%. In addition, the trained models can easily be used to explore brand-new, theoretically constructed OSCs with different donor-acceptor combinations. It is predicted that over 200 new OSCs, obtained by using a different acceptor for a given donor, will have a higher power conversion efficiency (PCE) than the corresponding OSCs in the original data set. To mitigate the data’s scarcity, the method can also be used to generate synthetic data points at a low computational cost. The augmented data can then be used for training additional machine learning models.
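The encoding and the acceptor swap described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the record layout, and the label `small_molecule` are hypothetical, and the cheminformatics descriptors that the four features would be appended to are omitted.

```python
# Hypothetical illustration only; names and record layout are assumptions.
ACCEPTORS = ("PCBM", "PC71BM")              # the two acceptors named in the abstract
DONOR_TYPES = ("polymer", "small_molecule")  # the two donor subclasses


def one_hot_features(acceptor, donor_type):
    """Return four binary features: two for the acceptor, two for the donor class."""
    if acceptor not in ACCEPTORS:
        raise ValueError(f"unknown acceptor: {acceptor}")
    if donor_type not in DONOR_TYPES:
        raise ValueError(f"unknown donor type: {donor_type}")
    return ([int(acceptor == a) for a in ACCEPTORS]
            + [int(donor_type == d) for d in DONOR_TYPES])


def swapped_acceptor_candidates(records):
    """Build donor-acceptor pairs absent from the data by swapping the acceptor."""
    seen = {(r["donor"], r["acceptor"]) for r in records}
    candidates = []
    for r in records:
        # With two acceptors, the "other" one is at the opposite index.
        other = ACCEPTORS[1 - ACCEPTORS.index(r["acceptor"])]
        if (r["donor"], other) not in seen:
            candidates.append({**r, "acceptor": other})
            seen.add((r["donor"], other))
    return candidates


# Toy records with made-up donor identifiers (not taken from the data set).
records = [
    {"donor": "D1", "donor_type": "polymer", "acceptor": "PCBM"},
    {"donor": "D2", "donor_type": "small_molecule", "acceptor": "PC71BM"},
    {"donor": "D2", "donor_type": "small_molecule", "acceptor": "PCBM"},
]
new_pairs = swapped_acceptor_candidates(records)
features = [one_hot_features(p["acceptor"], p["donor_type"]) for p in new_pairs]
```

In this toy example only donor `D1` lacks a measurement with the second acceptor, so a single candidate pair is produced. A trained model could then score such candidates, and the scored pairs could serve as synthetic data points.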