Journal article

Information extraction and linked open data in chemistry

Abstract:
Chemists not only produce a large number of data-rich scholarly communication artifacts, but have also adopted a highly formulaic style of writing. The literature of this discipline is therefore an attractive target for automated data extraction. In previous work, we demonstrated the identification and extraction of chemical entities from scientific papers.[1][2] However, we did not address the extraction of the relationships linking the chemical entities both to each other and to the document object from which they were extracted. Using chemical synthesis procedures as an exemplar, we present a methodology for extracting both chemical entities and the relationships between them. Chemical synthesis procedures are collected by data-mining the chemical literature. Natural language processing tools and entity recognisers are then used to analyse the individual elements within these procedures and provide a grammatical structure. Relationships between the individual entities are then established, and the resulting structured information is stored in RDF[3] using domain-specific ontologies. Once the information is expressed in a semantic format, it can be searched and indexed using the RDF query language SPARQL[4] and used to generate visualisations such as visual document summaries. The ultimate goal of the work documented here is to make data contained in publications available and re-usable by the scientific community.
Publication status:
Not published
Peer review status:
Peer reviewed

Authors

Institution:
University of Cambridge
Department:
Unilever Centre for Molecular Science Informatics, Department of Chemistry
Role:
Author
Institution:
University of Cambridge
Department:
Unilever Centre for Molecular Science Informatics, Department of Chemistry
Role:
Author
Institution:
University of Cambridge
Department:
Unilever Centre for Molecular Science Informatics, Department of Chemistry
Role:
Author
Institution:
University of Cambridge
Department:
Unilever Centre for Molecular Science Informatics, Department of Chemistry
Role:
Author


Edition:
Author's Original


Language:
English
UUID:
uuid:400e170b-98d3-41ee-afcf-fff1918b0452
Local pid:
ora:3120
Deposit date:
2009-12-01