Details

一种基于XML和规则库的专利数据抽取方法
A Patent Data Extraction Based on XML and Rule Base

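Document type: Journal article

Chinese title: 一种基于XML和规则库的专利数据抽取方法

English title: A Patent Data Extraction Based on XML and Rule Base

Authors: 常国锋[1]; 苗长芬[1]

Affiliation: [1] School of Computer and Information Engineering, Xinxiang University

First affiliation: School of Computer and Information Engineering, Xinxiang University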

Year: 2014

Volume: 31

Issue: 6

Pages: 30-32

Journal (Chinese name): 新乡学院学报

Journal (English name): Journal of Xinxiang University

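Funding: Science and Technology Research Program of the Henan Provincial Department of Science and Technology (122102210407); Henan Province Philosophy and Social Science Planning Project (2012CJJ014); Xinxiang University Innovation Fund Project (12SB17)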

Language: Chinese

Chinese keywords: XML; 规则库; 专利; 抽取

English keywords: XML; rule base; patent; extraction

Abstract: After analyzing existing web data extraction methods and considering the characteristics of patent web pages, the authors propose a patent data extraction method based on XML files and a rule base. Web pages are formatted with custom tags, which overcomes a limitation of earlier web harvesting approaches that segment pages and extract data only by the <table> and <div> tags, and enables effective collection of patent data. Experimental results show that the method has high accuracy and applicability.
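The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a rule base decides which values to pull from a page, the page is rewritten as XML with custom tags, and the record is then read back from that XML. The rule patterns, the tag names (`patent`, `title`, `applicant`, `number`), the function names, and the sample page are all hypothetical and are not taken from the paper.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule base: custom tag name -> regex whose first group
# captures the field value. The paper's actual rules are not known.
RULES = {
    "title": r"Title:\s*([^<]+)",
    "applicant": r"Applicant:\s*([^<]+)",
    "number": r"Application No\.:\s*([^<]+)",
}


def format_page(html: str) -> str:
    """Rewrite a page as XML whose custom tags are named after the rules."""
    root = ET.Element("patent")
    for field, pattern in RULES.items():
        match = re.search(pattern, html)
        if match:
            # ElementTree escapes special characters when serializing.
            ET.SubElement(root, field).text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")


def extract(xml_text: str) -> dict:
    """Read the formatted XML back into a flat field -> value record."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Made-up page fragment that uses neither <table> nor <div> markup.
SAMPLE = (
    "<p>Title: Widget press</p>"
    "<span>Applicant: Example Co.</span>"
    "<li>Application No.: 0000001</li>"
)

if __name__ == "__main__":
    print(extract(format_page(SAMPLE)))
```

Because the rules match on field labels instead of on particular container elements, the same sketch would apply to pages laid out with other markup. Supporting a new page layout would mean adding or changing entries in `RULES`, not the extraction code.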


Xinxiang University Institutional Repository
