Accelerating Discovery: Mining Unstructured Information for Hypothesis Generation

by Scott Spangler

Hardcover

$94.50 (list price $105.00; save 10%)

Product Details

ISBN-13: 9781482239133
Publisher: Taylor & Francis
Publication date: 10/09/2015
Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, #37
Pages: 292
Product dimensions: 6.20(w) x 9.20(h) x 0.90(d)

About the Author

Scott Spangler is a principal data scientist, distinguished engineer, and master inventor in the Watson Innovations Group at the IBM Almaden Research Center. He has been involved with knowledge base and data mining research for the past 25 years. His recent work has applied Watson technology to help accelerate cancer research. He holds 45 patents and is the author of over 30 publications. He received a BS in mathematics from MIT and an MS in computer science from the University of Texas.

Table of Contents

Introduction

Why Accelerate Discovery?
Scott Spangler and Ying Chen
THE PROBLEM OF SYNTHESIS
THE PROBLEM OF FORMULATION
WHAT WOULD DARWIN DO?
THE POTENTIAL FOR ACCELERATED DISCOVERY: USING COMPUTERS TO MAP THE KNOWLEDGE SPACE
WHY ACCELERATE DISCOVERY: THE BUSINESS PERSPECTIVE
COMPUTATIONAL TOOLS THAT ENABLE ACCELERATED DISCOVERY
ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE
ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE
ACCELERATED DISCOVERY IN THE ORGANIZATION
CHALLENGE (AND OPPORTUNITY) OF ACCELERATED DISCOVERY

Form and Function
THE PROCESS OF ACCELERATED DISCOVERY
CONCLUSION

Exploring Content to Find Entities
SEARCHING FOR RELEVANT CONTENT
HOW MUCH DATA IS ENOUGH? WHAT IS TOO MUCH?
HOW COMPUTERS READ DOCUMENTS
EXTRACTING FEATURES
FEATURE SPACES: DOCUMENTS AS VECTORS
CLUSTERING
DOMAIN CONCEPT REFINEMENT
MODELING APPROACHES
DICTIONARIES AND NORMALIZATION
COHESION AND DISTINCTNESS
SINGLE AND MULTIMEMBERSHIP TAXONOMIES
SUBCLASSING AREAS OF INTEREST
GENERATING NEW QUERIES TO FIND ADDITIONAL RELEVANT CONTENT
VALIDATION
SUMMARY

Organization
DOMAIN-SPECIFIC ONTOLOGIES AND DICTIONARIES
SIMILARITY TREES
USING SIMILARITY TREES TO INTERACT WITH DOMAIN EXPERTS
SCATTER-PLOT VISUALIZATIONS
USING SCATTER PLOTS TO FIND OVERLAPS BETWEEN NEARBY ENTITIES OF DIFFERENT TYPES
DISCOVERY THROUGH VISUALIZATION OF TYPE SPACE

Relationships
WHAT DO RELATIONSHIPS LOOK LIKE?
HOW CAN WE DETECT RELATIONSHIPS?
REGULAR EXPRESSION PATTERNS FOR EXTRACTING RELATIONSHIPS
NATURAL LANGUAGE PARSING
COMPLEX RELATIONSHIPS
EXAMPLE: P53 PHOSPHORYLATION EVENTS
PUTTING IT ALL TOGETHER
EXAMPLE: DRUG/TARGET/DISEASE RELATIONSHIP
NETWORKS
CONCLUSION

Inference
CO-OCCURRENCE TABLES
CO-OCCURRENCE NETWORKS
RELATIONSHIP SUMMARIZATION GRAPHS
HOMOGENEOUS RELATIONSHIP NETWORKS
HETEROGENEOUS RELATIONSHIP NETWORKS
NETWORK-BASED REASONING APPROACHES
GRAPH DIFFUSION
MATRIX FACTORIZATION
CONCLUSION

Taxonomies
TAXONOMY GENERATION METHODS
SNIPPETS
TEXT CLUSTERING
TIME-BASED TAXONOMIES
KEYWORD TAXONOMIES
NUMERICAL VALUE TAXONOMIES
EMPLOYING TAXONOMIES

Orthogonal Comparison
AFFINITY
COTABLE DIMENSIONS
COTABLE LAYOUT AND SORTING
FEATURE-BASED COTABLES
COTABLE APPLICATIONS
EXAMPLE: MICROBES AND THEIR PROPERTIES
ORTHOGONAL FILTERING
CONCLUSION

Visualizing the Data Plane
ENTITY SIMILARITY NETWORKS
USING COLOR TO SPOT POTENTIAL NEW HYPOTHESES
VISUALIZATION OF CENTROIDS
EXAMPLE: THREE MICROBES
CONCLUSION

Networks
PROTEIN NETWORKS
MULTIPLE SCLEROSIS AND IL7R
EXAMPLE: NEW DRUGS FOR OBESITY
CONCLUSION

Examples and Problems
PROBLEM CATALOGUE
EXAMPLE CATALOGUE

Problem: Discovery of Novel Properties of Known Entities
ANTIBIOTICS AND ANTI-INFLAMMATORIES
SOS PATHWAY FOR ESCHERICHIA COLI
CONCLUSIONS

Problem: Finding New Treatments for Orphan Diseases from Existing Drugs
IC50:IC50

Example: Target Selection Based on Protein Network Analysis
TYPE 2 DIABETES PROTEIN ANALYSIS

Example: Gene Expression Analysis for Alternative Indications
NCBI GEO DATA
CONCLUSION

Example: Side Effects

Example: Protein Viscosity Analysis Using Medline Abstracts
DISCOVERY OF ONTOLOGIES
USING ORTHOGONAL FILTERING TO DISCOVER IMPORTANT RELATIONSHIPS

Example: Finding Microbes to Clean Up Oil Spills
ENTITIES
USING COTABLES TO FIND THE RIGHT COMBINATION OF FEATURES
DISCOVERING NEW SPECIES
ORGANISM RANKING STRATEGY
CHARACTERIZING ORGANISMS
CONCLUSION

Example: Drug Repurposing
COMPOUND 1: A PDE5 INHIBITOR
PPARα/γ AGONIST

Example: Adverse Events
FENOFIBRATE
PROCESS
CONCLUSION

Example: Discovering New P53 Kinases
AN ACCELERATED DISCOVERY APPROACH BASED ON ENTITY SIMILARITY
RETROSPECTIVE STUDY
EXPERIMENTAL VALIDATION
CONCLUSION

Conclusion and Future Work
ARCHITECTURE
FUTURE WORK
ASSIGNING CONFIDENCE AND PROBABILITIES TO ENTITIES, RELATIONSHIPS, AND INFERENCES
DEALING WITH CONTRADICTORY EVIDENCE
UNDERSTANDING INTENTIONALITY
ASSIGNING VALUE TO HYPOTHESES
TOOLS AND TECHNIQUES FOR AUTOMATING THE DISCOVERY PROCESS
CROWD SOURCING DOMAIN ONTOLOGY CURATION
FINAL WORDS

References appear at the end of most chapters.
