Citation: LI Fuxing, LI Luxi. Key technologies of big data processing platform construction for coal mining[J]. Journal of China Coal Society, 2019, 44(S1): 362-369. DOI: 10.13225/j.cnki.jccs.2019.0252

Key technologies of big data processing platform construction for coal mining

Abstract: China's coal mining technology has entered the stage of mechanized, automated, and intelligent unmanned mining and of smart-mine construction. Like other industrial sectors, coal mining is gradually moving into an era of data-driven production, which creates the problem of processing massive volumes of data. To address this problem, this paper proposes building a big data processing platform for coal mining. The data generated in China's coal mining production are large in volume, diverse, highly time-sensitive, prone to distortion, subject to demanding predictive requirements, and low in value density. Based on an analysis of these characteristics, the paper proposes a platform architecture with a hardware part and a software part, grounded in big data theory and technology.

For the hardware part, the paper proposes building a server cluster on top of the servers already selected during earlier informatization projects, upgrading and reconfiguring them, and adding or adjusting capacity as operation requires. The memory of the NameNode server in the cluster is calculated from the number of files it manages, the file block size, the number of servers it manages, and the amount of data stored per server, in combination with the number of virtual cores and hyperthreads of the server CPU. For the CPU, a multi-core, multi-threaded processor is proposed for the master node server. For the cluster storage system, after comparison, the paper proposes separating the storage of server application software from the storage of mass data. Local solid-state disks hold the application software, while the mass data storage system integrates network-attached storage (NAS) with a storage area network (SAN). This arrangement provides unified, centrally managed data, easy expansion, and fault tolerance, avoids a single point of failure in the network, and improves cluster I/O speed.

For the software part, the analysis shows that the platform must support batch processing, stream computing, and transparency, as well as incremental computing, distributed in-memory parallel computing, and highly available, highly scalable in-memory computing. It must handle counting, summing, and averaging of the various data produced in coal mining, and real-time calculation of variance and standard deviation for fusion decision-making over large numbers of real-time sensor readings. It must also support multi-dimensional, long-duration, and repeated recalculation. To meet these needs, the paper proposes a distributed big data processing system built mainly on Hadoop and Storm, with CentOS as the server operating system, Flume for log message processing, and Kafka for buffering incoming data, among other key technologies. Users can choose the data visualization software according to their needs without affecting the platform's data processing.
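The real-time count, sum, mean, variance, and standard deviation named in the abstract can be maintained incrementally, one reading at a time, without storing the full stream. The abstract does not show the authors' Storm implementation, so the following is only a minimal plain-Python sketch that assumes Welford's online method as the update rule. The class name `RunningStats` and the sample readings are hypothetical and chosen purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Incrementally tracked statistics for one sensor stream (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new reading into the statistics in O(1) time and memory."""
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        # Welford's update: uses the deviation before and after the mean moves.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of all readings seen so far (0.0 if none)."""
        return self._m2 / self.count if self.count else 0.0

    @property
    def std(self) -> float:
        """Population standard deviation of all readings seen so far."""
        return math.sqrt(self.variance)


if __name__ == "__main__":
    stats = RunningStats()
    # Made-up readings standing in for a sensor stream.
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(reading)
    print(stats.count, stats.total, stats.mean, stats.variance, stats.std)
```

In a stream processor, one such object would typically be kept per sensor or per time window and updated as each message arrives, which is what makes the computation incremental rather than a batch recalculation.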

     
