microarray-ontol-digest       Tuesday, July 4 2000       Volume 01 : Number 002

----------------------------------------------------------------------

Date: Mon, 19 Jun 2000 14:32:31 -0400
From: Michael Bittner
Subject: [microarray-ontol] Ontology working group notes from Heidelberg

Notes from the ontologies working group sessions at the Heidelberg MGED meeting.

The discussions on ontologies at the last MGED meeting identified a large number of types of sample characterization that were generally agreed to be essential, and a large number of perspectives on the most sensible way to view the links between these characterizations. There remains a significant difference between how people who approach managing this data from the experimentalist point of view and people who have come to the problem from a systematics point of view see these relationships. Some of the flavor of this distinction can be seen in the recent postings from Mike Eisen and Chris Stoeckert.

Another important trend in the meeting was the desire to allow for a collaborative spirit from the investigators depositing the data. In general, the ordinary, proximate data associated with an array experiment would be mandatory. The more extended descriptions of the sample, and of what was done to it, would be voluntary, relying on the desire of submitters to cooperate in depositing sufficiently informative detail. If accepted, this strategy is important for our group. It would allow flexibility in tailoring the kinds of entries more specifically to the organism from which the samples are drawn, and in allowing for data types where the submitter provides tags for both the type of data and the value of that piece of data.

Appended below as Table 1 is a list of the various classes of data that the working group suggested as necessary to support, along with examples of particular types of data that would belong in those classes.
Below it, as Table 2, is a more structured view, reworked from the concepts after the meeting by C. Stoeckert, W. Liebermeister and M. Bittner. Steffen Schulze-Kremer, who led the discussion in Heidelberg, has also started from the suggestions in Table 1 and is working on a formal ontology based on this starting point.

At this point, we need feedback from the group on whether the list is an appropriate starting place. Does it provide an upper-level description that is broad enough to encompass the kinds of data entries you would want to make in describing your array experiment? Further feedback on whether the structured outlines for provider, taxonomy and phenotype are close to a workable form would also be very helpful. If these are close, then we could begin testing how to implement them with the ontologies available for some organisms.

It is also apparent that the structure of the History/Defined Properties unit could be improved. This area can clearly run out to infinite detail. Where should we put the initial boundaries? What kinds of entries of this sort should we attempt to support?

Finally, we are at the point where we need a central source of information that is easily accessible. Chris Stoeckert has begun putting up a website by loading our goals and the preliminary drafts of the recommendations. He will be broadcasting the URL soon. In addition to hosting the evolving working documents, this site will serve as a source listing of URL pointers to ontologies that should be reviewed for utility in our application.

I hope that many of you will devote some further effort to refining these recommendations, and I thank all of you who participated in Heidelberg.
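
The split between mandatory proximate data and voluntary, submitter-tagged entries described in these notes could be sketched roughly as follows. This is only an illustration in Python: the field names `species` and `array_id`, the class `SampleDescription`, and the example values are all hypothetical, since the working group has not fixed which fields are mandatory or how entries would be stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical mandatory fields, chosen for illustration only;
# the working group has not decided on any of them.
MANDATORY_FIELDS = ("species", "array_id")


@dataclass
class SampleDescription:
    """One sample record: mandatory proximate data plus voluntary annotations."""
    mandatory: Dict[str, str]
    # Voluntary entries are (tag, value) pairs; the submitter supplies both.
    voluntary: List[Tuple[str, str]] = field(default_factory=list)

    def missing_mandatory(self) -> List[str]:
        """Return the mandatory fields that are absent or empty."""
        return [name for name in MANDATORY_FIELDS if not self.mandatory.get(name)]

    def add_annotation(self, tag: str, value: str) -> None:
        """Attach a submitter-defined tag together with its value."""
        self.voluntary.append((tag, value))


# Invented example values.
sample = SampleDescription(mandatory={"species": "Mus musculus", "array_id": "A-001"})
sample.add_annotation("passage number", "12")
sample.add_annotation("time of day at sampling", "09:30")
```

Keeping the voluntary entries as free tag/value pairs leaves room for organism-specific detail without changing the mandatory core, at the cost of tags that are not checked against any ontology.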

Best,
Mike

- ------------------------
Table 1

SAMPLE
  Species (strain/cultivar, name, identifier, label)
  Genotype (mutants, transgenics, sex, ploidy, transfected-stable, transient transfection, epigenetic, heterogeneity)
  Developmental Stage (embryological, age, morphology, synchronized for cell cycle...)
  Environment (culturing conditions, nutrients, temperature, passage number, media, density, contamination/purity, co-culture, host)
  Anatomy (tissue/organ, cellular composition-homogeneity)
  Pathology (clinical history, pathological staging)
  History (treated for mycoplasma, family history, dead or alive, time of day at sampling)
  Treatment (chemical, physical, behavioral, time after treatment)
  Behavior (growth rate, neurological)
  Phenotype
  Description (race, size...)
  Source (donor, provider, owner, where it came from: geographic location, (commercial/other) source, availability, catalog #)
  In vivo/in vitro
  Infection
  Quality (purity)
  Physico-chemical composition of the sample
  Amount (# of cells)
  Cell lines from multi-cellular organism

ALTERNATIVE TREES
  SAMPLE INTO BIOLOGICAL/NON-BIOLOGICAL
  BIOLOGICAL INTO WHOLE ORGANISM/PARTS/PRIMARY CELLS/CELL LINES

- ------------------------
Table 2

BIOLOGICAL SOURCE: SAMPLE

PROVIDER
  (administrative provider, commercial/other source, availability, catalog #)
TAXONOMY
  STRAIN/CULTIVAR (properties of a collection of individuals, group identifier)
  GENOTYPE
    mutants, transgenics, sex, ploidy, transfected-stable, transient transfection, epigenetic, heterogeneity
  INDIVIDUAL
    name, identifier/clinical tag, label
ANATOMY
  TISSUE/ORGAN (cellular composition-homogeneity)
  CELL purified from primary tissue
  CELL LINE
DEVELOPMENTAL STAGE
  embryological, age, morphology, synchronized for cell cycle...
PHENOTYPE
  BEHAVIOR
  PATHOLOGY
    clinical history
    pathological staging
HISTORY/DEFINED PROPERTIES
  culturing conditions/media, nutrients, temperature, co-culture, host, passage number, treated for mycoplasma, density, contamination/purity, infection, family history, time of day at sampling, pre or post mortem at sampling, physico-chemical composition of the sample, amount of material, # of cells
ENVIRONMENT
  where it came from (geographic location)
  in vivo
  in vitro
  in situ (xenografts)
TREATMENT
  agents (drugs, chemicals, biochemicals, physical)
  time after treatment
  behavioral stimulus

------------------------------

Date: Tue, 4 Jul 2000 15:57:24 +0200 (DFT)
From: Paolo Romano
Subject: [microarray-ontol] Re: [microarray-format] CABRI data format being considered for OMG RFP-7?

Dear Patricia and all,

sorry for the late answer; I got back only yesterday from two weeks of holidays.

> Hi Paolo!
>
> The microarray-format group seems to be doing much of the same work in
> establishing proposed data formats that CABRI has done. Are you planning to
> submit this work to the OMG for consideration, or are you working with the
> MGED-format group to have your data types considered for their synthesized
> proposal? Just curious.

Well, I'm pretty new to the OMG activity, so I don't think I will be able to submit something very soon. What is certain is that our activity will go on. We will develop new servers for distributing the catalogues of the European Biological Resources Centers (BRCs), and we will certainly design new standards for the interoperability of our systems.

My feeling is that we should support the development of new OMG standards for the definition of objects involved with BRCs (cell lines, bacterial strains, etc.). At the same time, we should define new XML standards for this information.

I think that the description of the strains' information can easily and efficiently be managed on its own, and that it can be referenced by other standards/systems. In other words, taking an animal cell line as an example, the object "cell line" should be described separately from other standards and referenced by them.
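
The separation suggested here, with the cell line described once by the collection and only referenced from the experiment, might look something like the minimal Python sketch below. Everything in it is assumed for illustration: the class names, the `CATALOGUE` dictionary standing in for a BRC catalogue, and the identifier `EXAMPLE-0001` are invented and do not correspond to any actual CABRI or OMG definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CellLineRecord:
    """Description maintained by the collection that distributes the cell line."""
    catalogue_id: str
    name: str
    species: str


@dataclass
class ExperimentSample:
    """Experiment-side record: a reference to the source plus later history."""
    source_ref: str       # catalogue identifier, not a copy of the description
    treatment: str = ""   # what was done to the sample after acquisition


# Stand-in for a BRC catalogue; identifier and values are invented.
CATALOGUE: Dict[str, CellLineRecord] = {
    "EXAMPLE-0001": CellLineRecord("EXAMPLE-0001", "example line", "Homo sapiens"),
}


def resolve_source(sample: ExperimentSample,
                   catalogue: Dict[str, CellLineRecord]) -> Optional[CellLineRecord]:
    """Look up the source description; None if the collection does not list it."""
    return catalogue.get(sample.source_ref)


sample = ExperimentSample(source_ref="EXAMPLE-0001", treatment="heat shock, 2 h")
record = resolve_source(sample, CATALOGUE)
```

In this arrangement the experiment record stays small, and any correction to the cell line description is made once, in the catalogue, rather than in every experiment that used the line.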

In fact, I wonder whether the description of a microarray experiment should include a detailed description of the biological sample at all, when the sample can be described efficiently by the collection that actually distributed it. I'm not speaking, of course, about the treatment or history of the sample after its acquisition, but about the source of the sample. BRCs are already a reference for the samples (types); they can also be a reference for the information about them. If a sample does not come from a reference BRC, it could still be described with the same information, possibly on the basis of the standards that we are going to define and implement.

Finally, I would like to add that, for biological samples used in microarray experiments, we should try to use, where possible, the same ontologies that have been adopted by BRCs (e.g., by CABRI), which have long experience in this field, instead of re-defining (or inventing) our own. This can also be a stimulus for BRCs to improve the contents of their catalogues.

Ciao. Paolo

> -----Original Message-----
> From: Paolo Romano [mailto:paolo@risc220.ist.unige.it]
> Sent: Tuesday, June 06, 2000 6:57 AM
> To: microarray-ontol@ebi.ac.uk
> Subject: [microarray-ontol] Ontologies for biological samples (Addendum)
>
> Dear microarray-ontologists,
>
> sorry, I forgot to mention, for those interested, the URL where
> the MDS can be retrieved. Here it is:
>
> http://www.cabri.org/CABRI/home/guidelines/catalogue/CPdata.html
>
> Best regards. Paolo Romano

- --
Paolo Romano (paolo@ist.unige.it)
Biotechnology Department, Natl Inst. for Cancer Research
c/o Advanced Biotechnology Centre
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288   Fax: +39-010-5737-295

------------------------------

End of microarray-ontol-digest V1 #2
************************************