microarray-ontol-digest     Monday, July 24 2000     Volume 01 : Number 003

----------------------------------------------------------------------

Date: Wed, 05 Jul 2000 13:43:43 -0400
From: Chris Stoeckert
Subject: Re: [microarray-ontol] Re: [microarray-format] CABRI data format being considered for OMG RFP-7?

Hi Paolo,

> In fact, I wonder if the description of the microarray experiments
> should also include a detailed description of the biological sample
> when this can efficiently be described by the collections that actually
> distributed it. I'm not speaking, of course, about the treatment
> or history of the sample after its acquisition, but of the source
> of the sample. BRCs are already a reference for the samples (types),
> they can also be a reference for the information about them.

According to our proposed concept structure (see
http://www.cbil.upenn.edu/Ontology/), cell line is at the bottom of our
hierarchy. I think that it would be great if we could supply a pointer
to the collections that supplied the cell line when available. In fact
we do that through Provider, and that would be the place to get cell
line details such as passage number. We still need a minimum
description of the levels above cell line to be provided. Going out and
grabbing that information, I think, should be done by the submitter, not
by the database, and it would be great if the collections made it easy
to do so by involvement such as yours.

> If a sample would not come from a reference BRC, it could as well
> be described with the same information, possibly on the basis of
> the standards that we are going to define and implement.
> Finally, I would like to add that we should try to use, when possible,
> for biological samples used in microarray experiments the same
> ontologies that have been adopted by BRCs (e.g., by CABRI), that have
> a long experience in this field, instead of re-defining (or inventing)
> our own.
> This can also be a stimulus for BRCs to improve the contents
> of their catalogues.

I went to the site you gave and followed the links for animal and cell
lines.

http://www.cabri.org/CABRI/home/guidelines/catalogue/CPdataahc.html

If this is what you are referring to, then I don't think it will work
for two reasons. One is that cell lines are at the root of the
hierarchy, which makes sense for collections but not for microarray
experiments, which use samples that are not cell lines. The other
problem is that your hierarchy is too flat. Cell line has a brief
description, tissue is a brief description, tissue is one of a list. I
want to be able to ask what genes are expressed in the brain and get
data from experiments done using tissue sections as well as cell lines
and covering all parts of the brain (whole brain, cerebrum, amygdala,
etc.).

Chris

- --
Chris Stoeckert, Ph.D.
Center for Bioinformatics, University of Pennsylvania
1316 Blockley Hall, 418 Guardian Drive
Philadelphia, PA 19104-6021
215-573-4409
215-573-3111 FAX
stoeckrt@pcbi.upenn.edu

------------------------------

Date: Thu, 6 Jul 2000 16:26:25 +0200 (DFT)
From: Paolo Romano
Subject: [microarray-ontol] CABRI data sets and lists

Dear Chris, dear all,

thank you for your very stimulating message.

> According to our proposed concept structure (see
> http://www.cbil.upenn.edu/Ontology/), cell line is at the bottom of our
> hierarchy.

Yes, I know. But in Heidelberg I suggested a different approach where a
SAMPLE can either be a BIOLOGICAL or a NON-BIOLOGICAL sample and
BIOLOGICAL samples can be WHOLE ORGANISM or PARTS or PRIMARY CELLS or
CELL LINES. This distinction would put the cell lines at the top of the
hierarchy.

This approach has been recorded as "Alternative tree".
Is it still valid?

Anyway, if we remain at the original structure,

> I think that it would be great if we could supply a pointer to the
> collections that supplied the cell line when available.
> In fact we do that through Provider, and that would be the place to
> get cell line details such as passage number.

I agree.

> We still need a minimum description of the levels above cell
> line to be provided.

Again, I agree. Although there would be some redundancy, it would be
better to have some basic attributes recorded in the microarray db as
well.

> Going out and grabbing that information, I think, should be
> done by the submitter, not by the database, and it would be great
> if the collections made it easy to do so by involvement such as yours.

Totally agree. Information on cell lines must come from collections.
We will try to do it in the near future.

> http://www.cabri.org/CABRI/home/guidelines/catalogue/CPdataahc.html
> If this is what you are referring to, then I don't think it will work
> for two reasons.
> One is that cell lines are at the root of the hierarchy, which makes
> sense for collections but not for microarray experiments, which use
> samples that are not cell lines.

See above, I'm thinking of a different approach where cell lines are at
the top. My feeling is that in this case we could be more specific and
less general/ambiguous. There are some properties/attributes which are
adequate for some kinds of samples and not for others. Why should we
first define general attributes for all samples and then say what kind
of sample we have? Being more specific could as well help us define
better reference lists. This is the CABRI approach: we have different
materials, we do not specify a unique minimum data set, but different
data sets, one for each kind of material.

> The other problem is that your hierarchy is too flat. Cell line
> has a brief description, tissue is a brief description, tissue
> is one of a list.

I did not want to point out our MDSs as good hierarchies. They are a
sort of initial compromise for allowing an integration of various
databases that have long and different histories.
We will certainly have to develop real hierarchies and modify the MDSs
accordingly. What I especially wanted to point out are the CABRI
reference lists, either in-house or external, which are adopted by the
collections and should be carefully considered as reference lists for
microarray samples.

I hope my intention is now a bit clearer. I'm looking forward to
continuing this discussion, which I believe is really interesting and
productive (at least for me).

Ciao. Paolo

- --
Paolo Romano (paolo@ist.unige.it)
Biotechnology Department, Natl Inst. for Cancer Research
c/o Advanced Biotechnology Centre
Largo Rosanna Benzi, 10, I-16132, Genova, Italy
Tel: +39-010-5737-288 Fax: +39-010-5737-295

------------------------------

Date: Thu, 6 Jul 2000 17:50:02 -0400 (EDT)
From: Chris Stoeckert
Subject: [microarray-ontol] Re: CABRI data sets and lists

Dear Paolo and other owg (ontology working group) members,

> Yes, I know. But in Heidelberg I suggested a different approach
> where a SAMPLE can either be a BIOLOGICAL or a NON-BIOLOGICAL sample
> and BIOLOGICAL samples can be WHOLE ORGANISM or PARTS or PRIMARY CELLS
> or CELL LINES. This distinction would put the cell lines at the top
> of the hierarchy.
>
> This approach has been recorded as "Alternative tree".
> Is it still valid?

Absolutely, but we have to decide which tree is our consensus
recommendation, so it would be great to hear from the rest of you.
Unless I hear otherwise, I am going to assume that other than Paolo
(and maybe Mike Eisen), everyone agrees with me that cell lines should
be on the bottom (perhaps this will generate some traffic? ;-) ).
Seriously, both trees are valid and we need to discuss which is a
better fit for microarray data. If you send me your view of how we
should structure the same concepts, I'll put it up on the Ontology home
page for comparison.

> See above, I'm thinking of a different approach where cell lines are
> at the top.
> My feeling is that in this case we could be more specific
> and less general/ambiguous. There are some properties/attributes which
> are adequate for some kinds of samples and not for others. Why should we
> first define general attributes for all samples and then say what kind
> of sample we have? Being more specific could as well help us define
> better reference lists. This is the CABRI approach: we have different
> materials, we do not specify a unique minimum data set, but different
> data sets, one for each kind of material.

A major benefit of defining general attributes first is that it allows
for more direct comparisons between experiments at their common level.
This is useful when you want to ask high-level questions (e.g., where
is this gene expressed?). This question can be answered with the
alternative tree, but you would be polling tissues, cells, and cell
lines separately and then integrating the results, rather than
surveying a single tree at the level of granularity you are interested
in.

Another major benefit of the general-to-specific tree is that once you
choose the specific case, you have automatically chosen all the general
cases above it. So if you choose a specific cell line, you have
automatically chosen the cell it came from, the tissue that cell came
from, the anatomical part the tissue is from, etc. This works if you
have the appropriate hierarchies available. If these are not available
(e.g., what individual did this anatomy come from?), then the levels are
merely placeholders and not very useful, which I think is your
argument. I would be inclined to split anatomy off from individual for
that reason.

Chris

Chris Stoeckert, Ph.D.
Center for Bioinformatics, University of Pennsylvania
1316 Blockley Hall, 418 Guardian Drive
Philadelphia, PA 19104-6021
215-573-4409
215-573-3111 FAX
stoeckrt@pcbi.upenn.edu

------------------------------

Date: Fri, 7 Jul 2000 11:12:27 -0700
From: "Collins, Patricia"
Subject: RE: [microarray-ontol] Re: CABRI data sets and lists

Sorry for the delay in responding to the recent thread of discussion.
I'm planning to spend part of this weekend looking more closely at
what's been said. If I have suggestions or questions, I'll try to post
them on Monday, July 10.

. . .
Absolutely, but we have to decide which tree is our consensus
recommendation, so it would be great to hear from the rest of you.
Unless I hear otherwise, I am going to assume that other than Paolo
(and maybe Mike Eisen), everyone agrees with me that cell lines should
be on the bottom (perhaps this will generate some traffic? ;-) ).
. . .

------------------------------

Date: Mon, 10 Jul 2000 19:06:33 -0400 (EDT)
From: Chris Stoeckert
Subject: [microarray-ontol] Re: Ontology (for a change not workshop)

Jim,

I'm posting this to the group because you make some good points as well
as point out something about inheritance that I may have confused in my
last posting. The tree that I was pushing,

TAXONOMY: STRAIN/CULTIVAR: INDIVIDUAL: ANATOMY: TISSUE/ORGAN: CELL, CELL LINE

is not a strict IS-A hierarchy and therefore (as written) does not
provide inheritance. I was trying to point out that a tree like this
can be useful if a standard anatomy hierarchy is available to choose
terms from. For the question of "what genes are expressed in tissue
X?", including substructure terms is very useful, as has been done in
GXD and as we have done in RAD (GXD and RAD use a similar anatomical
hierarchy based on the U. of Edinburgh anatomy).

As you suggest, it would help to list the types of questions people
want to ask that take advantage of standardized terms and relations
(such as the one I posed above).
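The question "what genes are expressed in tissue X?", with substructure
terms included, can be sketched roughly as follows. This is only a
minimal illustration in Python, not how GXD or RAD implement it: the
PART_OF mapping, the SAMPLES records, the gene names and both function
names are hypothetical, and the anatomy terms are simplified
placeholders rather than entries from the Edinburgh anatomy.

```python
# Hypothetical part-of links: child term -> parent term.
# Illustrative only; not taken from any real anatomy vocabulary.
PART_OF = {
    "amygdala": "cerebrum",
    "cerebrum": "brain",
    "cerebellum": "brain",
    "brain": "central nervous system",
}

# Hypothetical sample annotations:
# (sample id, kind of sample, anatomy term, genes reported as expressed)
SAMPLES = [
    ("s1", "tissue section", "amygdala", {"geneA", "geneB"}),
    ("s2", "cell line", "cerebellum", {"geneB", "geneC"}),
    ("s3", "tissue section", "liver", {"geneD"}),
    ("s4", "whole organ", "brain", {"geneE"}),
]


def ancestors(term):
    """Yield the term itself, then every more general term above it."""
    while term is not None:
        yield term
        term = PART_OF.get(term)


def genes_expressed_in(query_term):
    """Collect genes from all samples annotated at or below query_term."""
    genes = set()
    for _sample_id, _kind, term, expressed in SAMPLES:
        # A sample matches if the query term is the sample's own term
        # or one of the more general terms above it.
        if query_term in ancestors(term):
            genes |= expressed
    return genes


print(sorted(genes_expressed_in("brain")))
# prints ['geneA', 'geneB', 'geneC', 'geneE']
```

With these made-up records, a query for "brain" picks up the amygdala
tissue section, the cerebellum-derived cell line and the whole-brain
sample in a single pass, regardless of the kind of sample. A flat list
of tissue names would need each substructure to be queried separately.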
Such a list would help us decide what structure works best for our type
of data/questions.

In asking for ontologies and controlled vocabularies to post/consider,
I was thinking of situations such as the one above where a standard
hierarchy (or list) could be used to choose terms to populate our
structured concepts. Our structured concepts have to fit our needs
(types of questions posed), and I don't think UMLS will do that, but I
may not be familiar enough with UMLS. I see our efforts more as
"customizing" our own wheel for our needs rather than reinventing it.
Once we establish the basic structure of our concepts based on the
questions we want to use this for, then we can carve out pieces of
existing ontologies/terminologies/controlled vocabularies to flesh it
out.

Database integration (this is my turf, I'm a newbie to ontologies as
you can probably tell) will certainly benefit from this effort. Schemas
are ontologies (or so I'm told), and if everyone bases the relevant part
of their schema on our ontology, then conducting queries (that use our
concepts) across multiple databases will be easy. This is an important
outcome of our work. However, database integration involves issues
(architecture, data models, etc.) that we are not concerned with. So I
completely agree with viewing our efforts as coming up with a standard.

Chris

On Sun, 9 Jul 2000, james geller wrote:

> As I said, I did not keep up with the development, so I am
> not totally sure. Thus I need to answer with a generality.
> (You can post this if you feel it is of any general interest...)
>
> My education says that an ontology is a rooted connected
> DAG of concepts that are connected by IS-A links.
> Thus, one can say the sentence "X IS-A Y"
> for every pair of nodes connected by an IS-A link.
> Every concept is more general than its children, and every
> property is introduced at the highest possible point and then
> inherited.
> Furthermore, we use attributes ("datatypes") and
> relationships ("arrows") as properties. (Other people stress
> attached axiom sets. We don't.)
>
> As I and a student of mine have studied part relationships for
> about 15 years now, part relationships are added next to
> the ontology. But everything that is a part must also be
> tied into the net of IS-A links first.
>
> All other relationships besides IS-A and Part-of don't have
> any special semantics. They are just arrows. (But we are studying
> Ownership as a semantic relationship now.) What I have seen in
> Heidelberg (again, I don't know for sure what has happened since)
> was very underdeveloped in such relationships.
>
> Concerning the other question, of existing ontologies, I am
> a little confused. As was mentioned at Heidelberg, the UMLS
> contains pretty much everything that is of interest in this field.
> (40 source vocabularies, when I looked more than a year ago.
> There might be more now.) Are we going to try to carve pieces
> out of the UMLS (or one or more of its components) and add them
> to our ontology? Or are we (I DON'T say this with a negative
> connotation) going to invent our own wheel?
>
> There are two things that bother me right now. First of all, in order
> to be useful, I am sure that what we have now needs to grow by an
> order of magnitude.
>
> Secondly, Jim Cimino, who was our teacher on medical vocabularies,
> always asks the question "What are you going to use it for?" first.
> I think, in order to be successful, we need a few well-defined questions
> or problems against which we can develop the ontology. Later on we
> need to test the ontology relative to some similar problems.
>
> If our problem is "integration", there have been many publications in
> databases that do a fairly good job in addressing problems of the kind
> "There are two databases, one has prices in dollars with no sales tax,
> while the other one has prices in Euro, with sales tax.
> How do you
> integrate those two databases?" I am not totally up to the state of the
> art, but I don't think they are doing such a great job on database
> columns that have symbolic as opposed to numerical values.
>
> My point of view is that instead of hoping for integration, we should
> have an ontology that works like a standard, so that people in the
> future will already develop database schemas with the same meaning
> (by consulting the ontology before building them). This does not solve
> problems for the existing databases, but I suspect that the number of
> genomic databases will grow by an order of magnitude in the next
> 10 years. So at least those would be helped. How to go about this in
> detail is another question, and I am afraid a question that will be
> hard to resolve by a group of people that is physically distributed
> over two continents.

Chris Stoeckert, Ph.D.
Center for Bioinformatics, University of Pennsylvania
1316 Blockley Hall, 418 Guardian Drive
Philadelphia, PA 19104-6021
215-573-4409
215-573-3111 FAX
stoeckrt@pcbi.upenn.edu

------------------------------

Date: Wed, 12 Jul 2000 14:28:03 -0400
From: Chris Stoeckert
Subject: [microarray-ontol] questions for biological concepts

In attempting to move forward on structuring our (microarray-ontology)
sample concepts, it has become clear that consideration of the
questions being posed of the concepts should guide our efforts. In
other words, the more likely we are to form a query around a concept or
attribute, the more important it is for us to provide structure/rules
and controlled vocabularies for that concept/attribute. As Patricia
Collins pointed out to me, this gets us into the domain of the
microarray-users working group, and I think we should poll them to help
us in this effort. Hence their inclusion in this mail.

Is there a list of queries that you (users) can give us to consider?
Our list of concepts is located at: www.cbil.upenn.edu/Ontology

Chris

- --
Chris Stoeckert, Ph.D.
Center for Bioinformatics, University of Pennsylvania
1316 Blockley Hall, 418 Guardian Drive
Philadelphia, PA 19104-6021
215-573-4409
215-573-3111 FAX
stoeckrt@pcbi.upenn.edu

------------------------------

Date: Mon, 24 Jul 2000 18:45:47 -0400
From: Chris Stoeckert
Subject: [microarray-ontol] Ontology for microarray experiment information

Hello all,

Steffen Schulze-Kremer has put the sample concepts along with all the
other microarray experiment information into an ontology that can be
viewed with his Java Ontology Browser at
http://igd.rz-berlin.mpg.de/~www/oe/mbo.html. I have also put a link to
that site on our ontology working group home page
(http://www.cbil.upenn.edu/Ontology).

I also started a list of ontology resources with Paolo Romano's pointer
to the CABRI site. I will add others.

Chris

- --
Chris Stoeckert, Ph.D.
Center for Bioinformatics, University of Pennsylvania
1316 Blockley Hall, 418 Guardian Drive
Philadelphia, PA 19104-6021
215-573-4409
215-573-3111 FAX
stoeckrt@pcbi.upenn.edu

------------------------------

End of microarray-ontol-digest V1 #3
************************************