Book – http://www.manning.com/ingersoll/

LingPipe

OpenCalais

OpenNLP – http://opennlp.sourceforge.net/

Tika (a content extraction framework) + Apache Solr Content Extraction Library (Solr Cell) + various Solr packages