The Provenance of Electronic Data. Moreau, L.; Groth, P.; Miles, S.; Vazquez, J.; Ibbotson, J.; Jiang, S.; Munroe, S.; Rana, O.; Schreiber, A.; Tan, V.; and Varga, L. 2008. Communications of the ACM, 51:52--58.
In the study of fine art, provenance refers to the documented history of some art object. Given that documented history, the object attains an authority that allows scholars to appreciate its importance with respect to other works, whereas, in the absence of such history, the object may be treated with some skepticism. Our IT landscape is evolving as illustrated by applications that are open, composed dynamically, and that discover results and services on the fly. Against this challenging background, it is crucial for users to be able to have confidence in the results produced by such applications. If the provenance of data produced by computer systems could be determined as it can for some works of art, then users, in their daily applications, would be able to interpret and judge the quality of data better. We introduce a provenance lifecycle and advocate an open approach based on two key principles to support a notion of provenance in computer systems: documentation of execution and user-tailored provenance queries.
Electronically Querying for the Provenance of Entities. Miles, S. 2006. In Provenance and Annotation of Data, International Provenance and Annotation Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers, 184--192, Springer.
liblicense provides a straightforward way for developers to build license-aware applications. It uses a pluggable module system to read and write metadata in specific file types, so support for additional content types can be added as new modules.
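To make the idea of a pluggable metadata module system concrete, here is a minimal Python sketch. It does not reproduce liblicense's actual module interface: `MetadataModule`, `ModuleRegistry`, `InMemoryModule`, and dispatching on file extension (rather than, say, MIME type) are hypothetical choices made only for illustration.

```python
import os
from typing import Optional, Protocol


class MetadataModule(Protocol):
    """What a format plugin must provide (hypothetical interface)."""
    extensions: tuple[str, ...]

    def read(self, path: str) -> Optional[str]: ...

    def write(self, path: str, license_uri: str) -> None: ...


class ModuleRegistry:
    """Dispatches license reads/writes to the plugin registered for a file extension."""

    def __init__(self) -> None:
        self._by_ext: dict[str, MetadataModule] = {}

    def register(self, module: MetadataModule) -> None:
        # A later registration for the same extension replaces the earlier one.
        for ext in module.extensions:
            self._by_ext[ext.lower()] = module

    def _module_for(self, path: str) -> MetadataModule:
        ext = os.path.splitext(path)[1].lower()
        try:
            return self._by_ext[ext]
        except KeyError:
            raise LookupError(f"no metadata module handles {ext or 'extensionless'} files") from None

    def read_license(self, path: str) -> Optional[str]:
        return self._module_for(path).read(path)

    def write_license(self, path: str, license_uri: str) -> None:
        self._module_for(path).write(path, license_uri)


class InMemoryModule:
    """Toy plugin that records licenses in a dict instead of editing real files."""

    def __init__(self, *extensions: str) -> None:
        self.extensions = extensions
        self._licenses: dict[str, str] = {}

    def read(self, path: str) -> Optional[str]:
        return self._licenses.get(path)

    def write(self, path: str, license_uri: str) -> None:
        self._licenses[path] = license_uri


registry = ModuleRegistry()
registry.register(InMemoryModule(".png", ".jpg"))
registry.write_license("photo.PNG", "http://creativecommons.org/licenses/by/3.0/")
print(registry.read_license("photo.PNG"))  # prints http://creativecommons.org/licenses/by/3.0/
```

In this design, supporting a new content type means registering one more module; the registry and the calling application stay unchanged.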
Architecture for Provenance Systems. Groth, P.; Miles, S.; Tan, V.; and Moreau, L. 2005. University of Southampton, October.
Graphs-at-a-time: query language and access methods for graph databases. He, H.; and Singh, A. K. 2008. In SIGMOD Conference, 405--418.
Workflow systems have become increasingly popular for managing experiments in which many bioinformatics tasks are chained together. Because these experiments generate large amounts of data and their results must be reproducible, provenance has become of paramount importance, and workflow systems are therefore starting to provide support for querying it. However, the amount of provenance information may be overwhelming, so abstraction mechanisms are needed to help users focus on the most relevant information. The technique we pursue is that of "user views". Since bioinformatics tasks may themselves be complex sub-workflows, a user view determines which level of sub-workflow the user can see, and thus which data and tasks are visible in provenance queries. In this paper, we formalize the notion of user views, demonstrate how they can be used in provenance queries, and give an algorithm for generating a user view based on which tasks are relevant to the user. We then describe our prototype and give performance results. Although presented in the context of scientific workflows, the technique applies to other data-oriented workflows.
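The effect of a user view on provenance can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the paper's formalization or its view-generation algorithm: a run is modeled as a list of executed atomic tasks with the data items they consumed and produced, and a user view as a hypothetical mapping from each atomic task to the (possibly composite) task the user sees. Data items that never leave the visible task that produced them are hidden. The task and data names (`align`, `filter`, `build_tree`, `prepare`, and so on) are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One executed atomic task in a workflow run (hypothetical trace format)."""
    task: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def apply_view(trace: list[Step], view: dict[str, str]) -> dict[str, tuple[list[str], list[str]]]:
    """Project a run onto a user view.

    `view` maps every atomic task to the task the user sees (itself, or an
    enclosing composite task). Returns visible task -> (inputs, outputs).
    """
    producer: dict[str, str] = {}        # data item -> visible task that produced it
    consumers: dict[str, set[str]] = {}  # data item -> visible tasks that consumed it
    for step in trace:
        seen_as = view[step.task]
        for d in step.outputs:
            producer[d] = seen_as
        for d in step.inputs:
            consumers.setdefault(d, set()).add(seen_as)

    def is_visible(d: str) -> bool:
        # Run-level inputs (no producer) and final outputs (no consumer) stay visible.
        if d not in producer or d not in consumers:
            return True
        # Hidden only if every consumer is the visible task that produced it.
        return consumers[d] != {producer[d]}

    projected: dict[str, tuple[set[str], set[str]]] = {}
    for step in trace:
        seen_as = view[step.task]
        ins, outs = projected.setdefault(seen_as, (set(), set()))
        # An item produced inside the same visible task is internal, not an input.
        ins.update(d for d in step.inputs if is_visible(d) and producer.get(d) != seen_as)
        outs.update(d for d in step.outputs if is_visible(d))
    return {t: (sorted(i), sorted(o)) for t, (i, o) in projected.items()}


trace = [
    Step("align", ("seqs",), ("alignment",)),
    Step("filter", ("alignment",), ("clean_alignment",)),
    Step("build_tree", ("clean_alignment",), ("tree",)),
]
# The user sees align and filter only as one composite task, "prepare".
view = {"align": "prepare", "filter": "prepare", "build_tree": "build_tree"}
print(apply_view(trace, view))
# prints {'prepare': (['seqs'], ['clean_alignment']), 'build_tree': (['clean_alignment'], ['tree'])}
```

Under this view the intermediate `alignment` item disappears from the provenance; with an identity view (every atomic task mapped to itself) it becomes visible again as the output of `align` and the input of `filter`.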