[parser] Using the stanford-parser demo
Published: 2019-06-22


Test site:

First, some code. This is the demo that ships with stanford-parser:

import java.util.Collection;
import java.util.List;
import java.io.StringReader;

import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {

  /**
   * The main method demonstrates the easiest way to load a parser.
   * Simply call loadModel and specify the path, which can either be a
   * file or any resource in the classpath.  For example, this
   * demonstrates loading from the models jar file, which you need to
   * include in the classpath for ParserDemo to work.
   */
  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  /**
   * demoDP demonstrates turning a file into tokens and then parse
   * trees.  Note that the trees are printed by calling pennPrint on
   * the Tree object.  It is also possible to pass a PrintWriter to
   * pennPrint if you want to capture the output.
   */
  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading and sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor.
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();

      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
      System.out.println(tdl);
      System.out.println();
    }
  }

  /**
   * demoAPI demonstrates other ways of calling the parser with
   * already tokenized text, or in some cases, raw text that needs to
   * be tokenized as a single sentence.  Output is handled with a
   * TreePrint object.  Note that the options used when creating the
   * TreePrint can determine what results to print out.  Once again,
   * one can capture the output by passing a PrintWriter to
   * TreePrint.printTree.
   */
  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized words
    String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // This option shows loading and using an explicit tokenizer
    String sent2 = "This is another sentence.";
    TokenizerFactory<CoreLabel> tokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    List<CoreLabel> rawWords2 =
        tokenizerFactory.getTokenizer(new StringReader(sent2)).tokenize();
    parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only

}

Result:

Your query
猴子喜欢吃香蕉。

Segmentation
猴子 喜欢 吃 香蕉 。

Tagging
猴子/NR 喜欢/VV 吃/VV 香蕉/NN 。/PU

Parse
(ROOT
  (IP
    (NP (NR 猴子))
    (VP (VV 喜欢)
      (IP
        (VP (VV 吃)
          (NP (NN 香蕉)))))
    (PU 。)))

Typed dependencies
nsubj(喜欢-2, 猴子-1)
root(ROOT-0, 喜欢-2)
ccomp(喜欢-2, 吃-3)
dobj(吃-3, 香蕉-4)

Typed dependencies, collapsed
nsubj(喜欢-2, 猴子-1)
root(ROOT-0, 喜欢-2)
ccomp(喜欢-2, 吃-3)
dobj(吃-3, 香蕉-4)
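
Each typed-dependency line in the output has the shape relation(governor-index, dependent-index), where the index is the 1-based position of the word in the sentence and ROOT-0 is the artificial root. If you only have this text output and want to work with it in your own program, the following is a minimal sketch that reads such lines back into plain objects using only the Java standard library. The class names DependencyLineReader and Dep are hypothetical helpers made up for this illustration and are not part of stanford-parser. The sketch also assumes plain integer indices, so it would likely reject lines with other index forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a stanford-parser class.
class DependencyLineReader {

    /** One typed dependency read from a line such as nsubj(喜欢-2, 猴子-1). */
    static final class Dep {
        final String relation;
        final String governor;
        final int governorIndex;
        final String dependent;
        final int dependentIndex;

        Dep(String relation, String governor, int governorIndex,
            String dependent, int dependentIndex) {
            this.relation = relation;
            this.governor = governor;
            this.governorIndex = governorIndex;
            this.dependent = dependent;
            this.dependentIndex = dependentIndex;
        }
    }

    // relation(governor-index, dependent-index).
    // The greedy (.+) groups allow a word that itself contains a hyphen.
    private static final Pattern LINE =
        Pattern.compile("^([^(]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    /** Parses a single dependency line, or throws if it does not match. */
    static Dep parseLine(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a dependency line: " + line);
        }
        return new Dep(m.group(1),
                       m.group(2), Integer.parseInt(m.group(3)),
                       m.group(4), Integer.parseInt(m.group(5)));
    }

    /** Parses every non-blank line of a multi-line block of dependencies. */
    static List<Dep> parseAll(String text) {
        List<Dep> deps = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) {
                deps.add(parseLine(line));
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        String output = "nsubj(喜欢-2, 猴子-1)\n"
                      + "root(ROOT-0, 喜欢-2)\n"
                      + "ccomp(喜欢-2, 吃-3)\n"
                      + "dobj(吃-3, 香蕉-4)\n";
        for (Dep d : parseAll(output)) {
            System.out.println(d.relation + ": " + d.governor + " -> " + d.dependent);
        }
    }
}
```

When the parser is called from Java, as in the demo above, the list returned by typedDependenciesCCprocessed() already holds structured objects, so text parsing like this is only useful for output copied from the web demo or a log file.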

 

Reposted from: https://www.cnblogs.com/549294286/archive/2013/05/08/3067534.html
