Abstract:
A comprehensive understanding of the composition and physicochemical properties of coal-based liquids, such as coal tar and direct coal liquefaction oil, supports the rapid development of multi-purpose, high-performance, high-value-added products and the efficient use of the oils' properties. A full understanding of the ideal components of coal-based liquid mixtures, and of their physical and chemical properties, is also key to designing liquid fuels with specific properties. The authors use the RDKit toolkit in Python, which operates on molecules written in the Simplified Molecular Input Line Entry System (SMILES) notation, to construct molecular descriptors suited to the substances in coal-based liquids. These liquids consist mainly of the elements C, H, O, N, and S and contain a large number of polycyclic aromatic compounds, so the structural fragment descriptors are defined mainly in terms of the element types and ring numbers of polycyclic aromatic compounds and extract the required structural fragments from each molecule. Atom-count and molecular-weight descriptors are added to the structural fragment descriptors, giving 115 molecular descriptors in total. Compared with traditional manual information extraction, the constructed descriptors can quickly extract the information contained in a large number of coal-based liquid molecules.
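The idea of reading structural descriptors from a SMILES string can be illustrated with a minimal sketch. It is not the authors' 115-descriptor set and, to stay dependency-free, it does not call RDKit: it is a simplified stand-in that scans the SMILES text with regular expressions and counts the explicitly written atoms per element, the aromatic atoms, and the rings (from ring-closure labels). The function name `fragment_descriptors` and the dictionary keys are hypothetical. Hydrogen counts and molecular weight are left out because they need the valence model that a cheminformatics toolkit supplies.

```python
import re
from collections import Counter

# Atoms of the SMILES "organic subset"; two-letter symbols are tried first
# so that "Cl" is not read as C followed by a stray "l".
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")
# Bracket atom such as [nH] or [NH4+]: optional isotope, then the element symbol.
_BRACKET = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]")
# Ring-closure labels: a single digit, or "%" followed by two digits.
_RING = re.compile(r"%\d{2}|\d")


def fragment_descriptors(smiles: str) -> dict:
    """Count written atoms per element, aromatic atoms, and rings in a SMILES string."""
    elements = Counter()
    aromatic = 0

    def record(symbol: str) -> None:
        nonlocal aromatic
        if symbol.islower():  # lowercase symbols denote aromatic atoms
            aromatic += 1
        elements[symbol.capitalize()] += 1

    # Bracket atoms are handled first and then removed, so that the digits
    # inside them (isotopes, H counts, charges) are not read as ring closures.
    for match in _BRACKET.finditer(smiles):
        record(match.group(1))
    bare = _BRACKET.sub("", smiles)

    for match in _ATOM.finditer(bare):
        record(match.group(0))

    # Every ring-closure bond is written with two labels, so the number of
    # rings is half the number of labels.
    rings = len(_RING.findall(bare)) // 2

    return {
        "C": elements["C"],
        "N": elements["N"],
        "O": elements["O"],
        "S": elements["S"],
        "heavy_atoms": sum(elements.values()) - elements["H"],
        "aromatic_atoms": aromatic,
        "rings": rings,
    }


# Example molecules of the kinds found in coal-based liquids.
examples = {
    "n-hexadecane": "CCCCCCCCCCCCCCCC",
    "phenol": "Oc1ccccc1",
    "quinoline": "c1ccc2ncccc2c1",
    "indole": "c1ccc2[nH]ccc2c1",
    "dibenzothiophene": "c1ccc2c(c1)sc1ccccc12",
}
for name, smi in examples.items():
    print(name, fragment_descriptors(smi))
```

A toolkit-based implementation would parse the SMILES into a molecular graph instead of scanning text, which also gives hydrogen counts, molecular weight, and substructure matching for more specific fragments.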
The structural fragments, molecular weights, and atom counts of the coal-based liquid molecules obtained with the constructed descriptors are used as input features for machine learning (ML) to establish a method for predicting the quantitative molecular structure-property relationship (ML-QSPR) of coal-based liquids. The method gives fast and accurate predictions of four properties: the lower heating value (LHV), the liquid density (ρ), the flash point (FP), and the cetane number (CN). Model validation shows that R² for LHV, ρ, and FP is 0.996, 0.988, and 0.987, respectively. The CN model is built with mixtures added to the data, and its R² is 0.959. The ML-QSPR method improves on the prediction accuracy of methods reported in the literature and obtains properties much faster than traditional experimental methods. Using the property database obtained from the ML-QSPR predictions, the evolution of the four combustion performance parameters with increasing carbon number (n) is investigated for different families of substances; all four properties are significantly affected by n. Comparing individual properties across families shows that LHV differs little between families and is determined mainly by n, whereas ρ, FP, and CN differ clearly between families. The trained model can be used to predict the properties of new molecules for new fuel design. The ML-QSPR method is expected to serve as a transfer learning model for the property analysis of different coal-based liquids in other application scenarios.
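The validation scores reported above are coefficients of determination. As a reminder of how such a score is obtained, the sketch below assumes the usual definition, R² = 1 − SS_res / SS_tot. The function name `r_squared` is hypothetical, and the numbers in the usage lines are arbitrary placeholders, not data from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(measured, predicted), 3))
```

A score of 1 means the predictions reproduce the reference values exactly, and a score of 0 means the model does no better than predicting the mean of the reference values.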