Journal article
Information extraction and linked open data in chemistry
- Abstract:
- Chemists not only produce a large volume of data-rich scholarly communication, but have also adopted a highly formulaic style of writing, which makes the literature of the discipline an attractive target for automated data extraction. In previous work, we demonstrated the identification and extraction of chemical entities from scientific papers.[1][2] However, we did not address the extraction of the relationships that link the chemical entities both to each other and to the document object from which they were extracted. Using chemical synthesis procedures as an exemplar, we present a methodology, building on these techniques, for extracting both chemical entities and the relationships between them. Chemical synthesis procedures are collected by data-mining the chemical literature. Natural language processing tools and entity recognisers are then used to analyse the individual elements within these procedures and provide a grammatical structure, after which relationships between the individual entities are established. This structured information is stored in RDF[3] using domain-specific ontologies. Once the information is expressed in a semantic format, it can be searched and indexed using the RDF query language SPARQL[4], and used to generate visualisations such as visual document summaries. The ultimate goal of the work documented here is to make data contained in publications available to, and re-usable by, the scientific community.
- Publication status:
- Not published
- Peer review status:
- Peer reviewed
Authors
- Edition:
- Author's Original
- Language:
-
English
- UUID:
-
uuid:400e170b-98d3-41ee-afcf-fff1918b0452
- Local pid:
-
ora:3120
- Deposit date:
-
2009-12-01
Terms of use
- Copyright holder:
- L. Hawizy et al.
- Copyright date:
- 2009
References
[1] S. E. Adams, J. M. Goodman, R. J. Kidd, A. D. McNaught, P. Murray-Rust, F. R. Norton, J. A. Townsend, and C. A. Waudby, “Experimental data checker: Better information for organic chemists,” Organic and Biomolecular Chemistry, vol. 2, pp. 3067–3070, 2004.
[2] P. Corbett and P. Murray-Rust, “High-throughput identification of chemistry in life science texts,” 2006, pp. 107–118. [Online]. Available: http://dx.doi.org/10.1007/11875741_11
[3] World Wide Web Consortium (W3C), “RDF Primer,” http://www.w3.org/TR/rdf-primer/, last accessed: 07/08/09.
[4] World Wide Web Consortium (W3C), “SPARQL Query Language for RDF,” http://www.w3.org/TR/rdf-sparql-query/, last accessed: 07/08/09.