
DKPro Core 1.11.0


Released by @reckart on 05 Jul, 13:27

We are pleased to announce the release of DKPro Core 1.11.0, a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.

https://dkpro.github.io/dkpro-core

This is a feature release.

Important upgrade notice

  • Changed groupIds and artifactIds. The group ID is now org.dkpro.core and the artifact IDs now follow the pattern dkpro-core-...-(asl/gpl)
  • Changed package names. All packages now start with org.dkpro.core..., except the packages of the UIMA types, which remain unchanged for data compatibility.
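To illustrate the package rename, here is a minimal Java sketch of a hypothetical helper (`ImportMigration`, not part of DKPro Core) that rewrites import statements. It assumes that the pre-1.11.0 packages start with `de.tudarmstadt.ukp.dkpro.core` and that UIMA type packages can be recognized by a `.type.` segment in their name. Neither detail is stated in these notes, so check the actual class locations before relying on it.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of DKPro Core) that rewrites DKPro Core
 * import statements to the package naming introduced in 1.11.0.
 *
 * Assumptions (not taken from the release notes):
 *  - the pre-1.11.0 package prefix is "de.tudarmstadt.ukp.dkpro.core";
 *  - UIMA type packages contain a ".type." segment and keep their old name.
 */
public final class ImportMigration {

    static final String OLD_PREFIX = "de.tudarmstadt.ukp.dkpro.core";
    static final String NEW_PREFIX = "org.dkpro.core";

    private ImportMigration() {
    }

    /** Returns the rewritten import line, or the original line if it must stay as-is. */
    static String migrate(String importLine) {
        String trimmed = importLine.trim();
        if (!trimmed.startsWith("import " + OLD_PREFIX + ".")) {
            return importLine; // not a (non-static) DKPro Core import
        }
        // Strip "import " and the trailing ";" to get the qualified name.
        String qualifiedName = trimmed.substring("import ".length(), trimmed.length() - 1);
        if (isTypePackage(qualifiedName)) {
            return importLine; // UIMA types keep their old package for data compatibility
        }
        // Swap the old prefix for the new one, keeping the rest of the name.
        return "import " + NEW_PREFIX + qualifiedName.substring(OLD_PREFIX.length()) + ";";
    }

    /** Assumed heuristic: type packages are recognized by a ".type." segment. */
    static boolean isTypePackage(String qualifiedName) {
        return qualifiedName.contains(".type.");
    }

    public static void main(String[] args) {
        // Example class names are assumptions for illustration.
        List<String> imports = List.of(
            "import de.tudarmstadt.ukp.dkpro.core.opennlp.OpenNlpPosTagger;",
            "import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token;",
            "import java.util.List;");
        for (String line : imports) {
            System.out.println(migrate(line));
        }
    }
}
```

Only the component import is rewritten here; the `Token` type import and the non-DKPro import pass through unchanged, mirroring the exception for UIMA types described above.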

Notable changes since DKPro Core 1.10.0

  • Changed parts of the brat data conversion code so that it can be used more easily outside a UIMA component
  • Changed type mapping so that out-of-tagset types map to the generic type (e.g. an unknown POS tag maps to POS, not to POS_X)
  • Changed name of NYTCollectionReader to NitfReader
  • Added types to encode XML document structure in CAS
  • Added new XmlDocumentReader/Writer components using these types
  • Added basic reader for Annotated Gigaword corpus (only reads text so far) (thanks @az79nefy)
  • Added basic support for PubAnnotation JSON format
  • Added Maui component for keyword assignment
  • Added parameter to SfstAnnotator to enable lower-case lookup of first word in a sentence (thanks @rziai)
  • Added "order" feature to Token type
  • Added support for CoNLL-U document and paragraph IDs (thanks @manuelciosici)
  • Added support for CoNLL-U sentence IDs and text
  • Added standardized parameter to disable type mapping
  • Added support for TCF orthography layer using SofaChangeAnnotations
  • Added segmenter for Chinese using jieba (thanks @Horsmann)
  • Added MyStem for Russian
  • Added links to OpenMinTeD categories in type system documentation
  • Added support for reading/writing the CoreNLP CoNLL flavor
  • Added parameter to configure the Tika buffer size (useful for large documents)
  • Updated to OpenNLP 1.9.1
  • Updated to CoreNLP 3.9.2
  • Updated to ICU4J 64.2
  • Updated to Tika 1.19.1
  • Updated to LanguageTool 4.3
  • Updated to PDFBox 2.0.12
  • Updated IllinoisNLP components
  • Updated TreeTagger models/binaries in build.xml script (thanks @tilmanbeck)
  • Updated LIF dependencies
  • Updated dataset descriptions
  • Updated various general dependencies (e.g. Apache Commons etc.)
  • Improved robustness of checksum verification for text files used in datasets (e.g. license files)
  • Improved error messages in WebAnno TSV3 module
  • Fixed crash in WebannoTsv3XWriter when annotations do not start/end at token boundaries
  • Fixed bug in WebAnno TSV3 support causing span annotations with slot features to disappear
  • Fixed trimming of whitespace in TeiReader
  • Fixed bug in NifWriter causing named entity identifier not to be written
  • Fixed crash in BratReader when reading discontinuous segments
  • Fixed problem in BratWriter when dealing with slot features
  • Fixed metadata of CoNLL2012Writer
  • Fixed potential problem of datasets being written outside their target directory
  • Dropped the GrAF I/O module since the upstream libraries are outdated and no longer maintained
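The type-mapping change listed above can be pictured with a small Java sketch. The class `PosTypeMapping` and its three-entry mapping are hypothetical stand-ins for DKPro Core's real tagset mapping files; the point is only the fallback: a tag outside the tagset now resolves to the generic `POS` type rather than `POS_X`.

```java
import java.util.Map;

/**
 * Hypothetical illustration of the 1.11.0 type-mapping behavior.
 * The mapping entries are assumed examples, not DKPro Core's actual
 * mapping files, and this class is not part of DKPro Core.
 */
public final class PosTypeMapping {

    /** Generic type used for tags that are not in the tagset. */
    static final String GENERIC = "POS";

    // Assumed excerpt of a tag -> type mapping, for illustration only.
    static final Map<String, String> MAPPING = Map.of(
        "NN", "POS_NOUN",
        "VB", "POS_VERB",
        "JJ", "POS_ADJ");

    private PosTypeMapping() {
    }

    /** Known tags map to their specific type; unknown tags fall back to POS (previously POS_X). */
    static String typeFor(String tag) {
        return MAPPING.getOrDefault(tag, GENERIC);
    }

    public static void main(String[] args) {
        System.out.println(typeFor("NN"));  // prints POS_NOUN
        System.out.println(typeFor("XYZ")); // prints POS
    }
}
```

Keeping the fallback in a single `getOrDefault` call makes the generic type the only place where out-of-tagset tags end up.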

A more detailed overview of the changes in this release can be found here.

Thanks for contributions go to: @az79nefy, @ramonziai, @manuelciosici, @Horsmann, @tilmanbeck

When upgrading, please note that you should not mix different versions of DKPro Core components in your projects; they may not be compatible with each other.