LI Wenying,WANG Xiangling,FAN Huanhuan,et al. Predicting the fuel performance of coal-based liquids using the ML-QSPR method[J]. Journal of China Coal Society,2024,49(2):1098−1110. DOI: 10.13225/j.cnki.jccs.2023.1701

Predicting the fuel performance of coal-based liquids using the ML-QSPR method

Abstract: Describing the molecular structure and predicting the properties of coal-based liquid mixtures, such as coal tar and direct coal liquefaction oil, is an important basis for developing processes and technologies that turn these liquids into multi-purpose, high-performance, high-value-added products. Understanding the composition of the ideal components in these mixtures, together with their physicochemical properties, is also key to designing liquid fuels with special properties. Coal-based liquids are complex mixtures of a very large number of compounds, composed mainly of the elements C, H, O, N and S and rich in polycyclic aromatic structures. The authors therefore use the RDKit toolkit in Python, working from Simplified Molecular Input Line Entry System (SMILES) strings, to construct molecular descriptors suited to the substances in coal-based liquids. The structural-fragment descriptors are designed mainly around the elemental composition and the number and structure of rings in polycyclic aromatic compounds. Descriptors for the number of atoms and the molecular weight are added to them, giving 115 molecular descriptors in total. Compared with traditional manual information extraction, the constructed descriptors can quickly extract the information contained in a large number of coal-based liquid molecules.
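The descriptor construction described above can be illustrated with a minimal sketch. It is not the authors' 115-descriptor set: the function name `describe`, the feature names, and the choice of which counts to compute are assumptions made for illustration, and only standard RDKit calls are used.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

# Elements that dominate coal-based liquids according to the abstract.
ELEMENTS = ("C", "H", "O", "N", "S")


def describe(smiles: str) -> dict:
    """Return a small, illustrative descriptor dictionary for one SMILES string.

    Hypothetical helper: covers element counts, atom count, ring counts and
    molecular weight only, not the full descriptor set used in the paper.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Hydrogens are implicit in SMILES; make them explicit so they are counted.
    mol_h = Chem.AddHs(mol)

    counts = {element: 0 for element in ELEMENTS}
    for atom in mol_h.GetAtoms():
        symbol = atom.GetSymbol()
        if symbol in counts:
            counts[symbol] += 1

    features = {f"n_{element}": n for element, n in counts.items()}
    features["n_atoms"] = mol_h.GetNumAtoms()
    features["n_rings"] = rdMolDescriptors.CalcNumRings(mol)
    features["n_aromatic_rings"] = rdMolDescriptors.CalcNumAromaticRings(mol)
    features["mol_wt"] = Descriptors.MolWt(mol)
    return features


# Naphthalene, a two-ring aromatic typical of coal tar fractions.
naphthalene = describe("c1ccc2ccccc2c1")
# Phenol, a simple oxygen-containing aromatic.
phenol = describe("Oc1ccccc1")
```

Each dictionary can then serve as one row of a feature table, with one column per descriptor, which is the form a machine learning regressor expects as input.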
The structural fragments, molecular weights and atom counts obtained from the constructed molecular descriptors are used as input features for machine learning (ML) to establish a machine learning-quantitative structure-property relationship (ML-QSPR) method for coal-based liquids. The method enables fast and accurate prediction of four key fuel performance parameters: the lower heating value (LHV), the liquid density (ρ), the flash point (FP) and the cetane number (CN). Model validation shows that the R² values of the LHV, ρ and FP models are 0.996, 0.988 and 0.987, respectively. For CN, mixture data are added to the prediction, giving R² = 0.959. Compared with previously reported methods for predicting LHV, ρ, FP and CN, the ML-QSPR method improves prediction accuracy, and it has a significant speed advantage over traditional experimental methods. Using the property database obtained from the ML-QSPR predictions, the authors analyze how the four fuel performance parameters evolve as the number of carbon atoms increases in substances of different compound families, and find that all four are significantly affected by the carbon number (n). Because LHV is determined mainly by n, it differs little between compound families, whereas ρ, FP and CN differ markedly between families. The trained models can also be used to predict the properties of new molecules, providing a reference for the design of new fuel molecules. As a transfer learning model, the ML-QSPR method is expected to be applicable to the analysis of other physicochemical properties of coal-based liquids in other application scenarios.
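The abstract reports model quality as R² without stating the formula. The sketch below assumes the standard coefficient of determination, R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the standard definition; the paper may compute it differently.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared spread of measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up heating values in MJ/kg, purely for illustration.
measured = [40.0, 42.0, 44.0, 46.0]
predicted = [40.5, 41.5, 44.5, 45.5]
score = r_squared(measured, predicted)
```

Here SS_res = 4 × 0.25 = 1 and SS_tot = 9 + 1 + 1 + 9 = 20, so the score is 1 − 1/20 = 0.95. A value near 1, such as the 0.996 reported for LHV, means the predictions leave almost none of the variance in the measured values unexplained.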

     
