Tika in Action

Paperback: $32.30 BN.com price (save 28%), $44.99 list price

Overview

Summary

Tika in Action is a hands-on guide to content mining with Apache Tika. The book's many examples and case studies offer real-world experience from domains ranging from search engines to digital asset management and scientific data processing.

About the Technology

Tika is an Apache toolkit that has built into it everything you and your app need to know about file formats. Using Tika, your applications can discover and extract content from digital documents in almost any format, including exotic ones.

About this Book

Tika in Action is the ultimate guide to content mining using Apache Tika. You'll learn how to pull usable information from otherwise inaccessible sources, including internet media and file archives. This example-rich book teaches you to build and extend applications based on real-world experience with search engines, digital asset management, and scientific data processing. In addition to architectural overviews, you'll find detailed chapters on features like metadata extraction, automatic language detection, and custom parser development.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. All code from the book is also available.

What's Inside

  • Crack MS Word, PDF, HTML, and ZIP
  • Integrate with search engines, CMS, and other data sources
  • Learn through experimentation
  • Many examples

This book requires no previous knowledge of Tika or text mining techniques. It assumes a working knowledge of Java.

==========================================

Table of Contents

PART 1 GETTING STARTED
  1. The case for the digital Babel fish
  2. Getting started with Tika
  3. The information landscape

PART 2 TIKA IN DETAIL
  4. Document type detection
  5. Content extraction
  6. Understanding metadata
  7. Language detection
  8. What's in a file?

PART 3 INTEGRATION AND ADVANCED USE
  9. The big picture
  10. Tika and the Lucene search stack
  11. Extending Tika

PART 4 CASE STUDIES
  12. Powering NASA science data systems
  13. Content management with Apache Jackrabbit
  14. Curating cancer research data with Tika
  15. The classic search engine example

Product Details

  • ISBN-13: 9781935182856
  • Publisher: Manning Publications Company
  • Publication date: 11/28/2011
  • Pages: 256
  • Sales rank: 1,093,981
  • Product dimensions: 7.30 (w) x 9.20 (h) x 0.60 (d)

Meet the Author

Chris Mattmann has a wealth of experience in software design and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting graduate students at the University of Southern California (his alma mater) in the study of software architecture, all the way to helping industry and open source as a member of the Apache Software Foundation. When he's not busy being busy, he's spending time with his lovely wife and son braving the mean streets of Southern California.

Jukka Zitting is a core Tika developer with over a decade of experience in open source content management. Jukka works as a Senior Developer for the Swiss content management company Day Software, and is a member of the JCP expert group for the Content Repository for Java Technology API. He is a member of the Apache Software Foundation and the chairman of the Apache Jackrabbit project.
