microarray-ontol-digest     Sunday, April 14 2002     Volume 01 : Number 022

----------------------------------------------------------------------

Date: Sat, 30 Mar 2002 16:37:42 -0500
From: Chris Stoeckert
Subject: [microarray-ontol] Re: cell ontologies

Great! Am working on a grant so please pardon the sporadic nature of my
responses.

OK back to cultured cells. In rereading your original mail I realize
that I missed your point. You were pointing out the need for both an
anatomic location and a cell lineage and I got caught up in the culture
aspect. Sorry. You are correct that both are needed and these are in the
MGED ontology as OrganismPart (The part of the organism's anatomy from
which the biomaterial was derived) and CellType (Cell type, the type of
cell used in the experiment if non mixed, if mixed the TargetedCellType
should be used, eg of instance, epithelial, glial etc.). Our CBIL
controlled vocabulary mixes the two. This is also true for the Mouse
Anatomical Dictionary but I believe that Martin Ringwald and his group
are working to separate these into orthogonal ontologies. When that is
available it would address your issues for mouse (and probably for other
mammals).

Meanwhile you can still use the CBIL controlled vocabulary for CellType.
To query, you can use the term (e.g. 'fibroblast'), which should bring
back entries with that name regardless of origin, rather than querying
by ID (e.g. "5" or "307"). Does this make sense?

Cheers,
Chris

On Wednesday, March 27, 2002, at 03:06 PM, Christopher Denny wrote:

> Hi Chris - By all means, copy my email out to whomever you think might
> be interested. I'd be happy for input from others grappling with this
> problem. In addition, if anyone would like entry into our development
> server for our microarray database for a look around, I'd be happy to
> supply account/password information.
>
> Chris
>
>> Dear Chris,
>> I think the problem you are facing is one that we and many others are
>> also trying to sort out, which is how to incorporate the MGED efforts
>> including the ontology into a working database. If you don't mind I'd
>> like to communicate over the ontology working group mail list as I
>> believe that it would be instructive to many to hear your background
>> and follow along as we try to solve this. If you'd rather not that's
>> fine as well. Just let me know and we can start discussing how to
>> represent cultured cells. Just to start off, I see two scenarios, one
>> where you get something from ATCC (or CABRI or Coriell) and so you
>> point to their identifiers and information (CellLine). The other is
>> where you have cells or tissues (CellType, OrganismPart,
>> TargetedCellType) which you describe and then describe the culture as
>> Treatment. We currently have CultureConditions as EnvironmentalHistory
>> but that will be moved to Treatment.
>>
>> Cheers,
>> Chris
>>
>> On Tuesday, March 26, 2002, at 09:16 PM, Christopher Denny wrote:
>>
>>> Dear Dr. Stoeckert,
>>> This is a quick (well, maybe not so quick) note to introduce
>>> ourselves and to describe what we've done and what we're trying to do
>>> regarding microarray data management. While we've made some useful
>>> tools, we want very much to be in compliance with the rest of the
>>> world (which is looking very much like MGED). We are also happy to
>>> contribute our tools and source code to other academic efforts. Our
>>> current development issues center on better naming of biosource
>>> material. Thus I'm afraid that I'm bothering you.
>>> About a year ago the microarray facility here at UCLA expanded.
>>> To meet this need, we created a database-backed website and tools to
>>> handle sample submissions, sample tracking and data distribution back
>>> to users. (General construct is: http <> tcl <> sql (Oracle).)
>>> Creating accurate experimental descriptions was a top priority from
>>> the start. When investigators wish to perform a microarray
>>> experiment, they sign in and are prompted to describe their
>>> experiment with a context-sensitive survey - the questions they are
>>> asked depend on answers to previous queries. In this way we are able
>>> to get a fairly good experimental description in a controlled and
>>> quick way.
>>> This database has been running for about a year. From its
>>> inception we have used concepts found in earlier MGED websites. In
>>> looking over the recent MGED ontology, we're happy that many of the
>>> concepts are still represented. We're having difficulty, however,
>>> implementing better cell/tissue descriptions. Our current thinking
>>> (admittedly mammalian biased) is that biomaterials can be considered
>>> a combination of two concepts: cell source (cell lines or primary
>>> cells) and cell passage (tissue culture or in vivo). Primary cells
>>> and cell lines that are passaged in vivo are by necessity
>>> heterogeneous cell populations and would best be described by an
>>> anatomic ontology such as the one you have developed
>>> (http://www.cbil.upenn.edu/anatomy.php3). Cells passed in tissue
>>> culture are more problematic. Some feel that they are less
>>> representative of their anatomic location of origin than their cell
>>> lineage. For example, primary fibroblasts from foreskin, bone marrow
>>> or lung behave very similarly in tissue culture. In this case a cell
>>> lineage ontology might be better.
>>> This becomes more of an issue for data retrieval than entry.
>>> Running some hypothetical instances using the CBIL anatomic ontology,
>>> I had no problem finding places for specific entries. The problem is
>>> that I could find more than one logical place where a user might log
>>> in a particular cell line. This ambiguity could make searching our
>>> database difficult.
>>> I want to be clear about my limitations.
>>> I'm coming at this
>>> from a biologist/amateur programmer background. For the majority of
>>> the day, I run a molecular biology lab. I'm pretty familiar with the
>>> nitty gritty of performing microarray experiments. I've constructed
>>> a few database apps but our microarray website was constructed by
>>> some real DBAs that manage to put up with me. Creating ontologies and
>>> knowledge based systems is out at the fringe of my comfort zone. I've
>>> been through most of the online resources at the MGED site (the
>>> Ontologies 101 tutorial was great) but I'm not even to neophyte
>>> status. The recently posted MGED Ontology
>>> (http://www.cbil.upenn.edu/Ontology/biomaterial13.html) was pretty
>>> impenetrable to me.
>>> What I'm looking for is a pragmatic solution to describe cell
>>> sources that we can incorporate into our currently running db. At
>>> this point extensibility is probably more important than
>>> completeness. Extending the hierarchy is easy. What I would like to
>>> avoid is painting myself into a corner (e.g. MGED noncompliance) and
>>> then having to rip it out and start over again in the future. Any
>>> thoughts or suggested resources you might have would be greatly
>>> appreciated.
>>>
>>> Thanks
>>>
>>> Chris Denny
>>> Professor of Pediatrics
>>> UCLA School of Medicine

------------------------------

Date: Sat, 30 Mar 2002 18:39:29 -0500
From: Chris Stoeckert
Subject: [microarray-ontol] Changes to ontology

I've posted version 1.3 of the ontology as an html file at the web site
(http://www.cbil.upenn.edu/Ontology/biomaterial13.html) and sent the
rdfs and DAML files to Jason S to update the mged.sourceforge.net site.
Version 1.3 is now a direct link from the OWG home page. I've retired
the "building an ontology" page (no links to it) - let me know if there
was something there that you miss.

One of the outcomes of the MGED 4 discussions was that to synchronize
MAGE and the ontology, biosources cannot have manipulations that use
protocols. In the current ontology you'll see that biosources have
environmental histories. There are three subclasses of
EnvironmentalHistory with Protocols: ContaminantOrganism (a subclass of
CultureCondition), Preservation, and Water. I propose that we change
their superclass to be Treatment. Note that I am proposing that we keep
CultureCondition as an EnvironmentalHistory and just move
ContaminantOrganism. Comments?

Cheers,
Chris

------------------------------

Date: Fri, 5 Apr 2002 16:28:45 -0800 (PST)
From: Cathy Ball
Subject: [microarray-ontol] RE: [Mged-mage] question on Biomaterial [MAGE-OM]

Michael,

I'm not sure I am totally on board with this (I'm cc'ing the ontology
folks because it affects how we implement ontologies to describe our
samples).

You wrote:

> > "Show me all arrays performed by John Doe using yeast." It turns
> out that the information is one remove away from the ArrayDesign
> for that Array to allow multiple species on a single array
> (DesignElementGroup has a species association to OntologyEntry)
>
> So a bit more formally could be: "Show me all arrays performed by
> John Doe where the array design for that
> array has a FeatureGroup with a species OntologyEntry of yeast."

Using this thinking, how would I say, "Show me all arrays performed by
John Doe where the array design for that array has a FeatureGroup with a
species OntologyEntry of human AND (here's where it gets tricky) the
sample used for the labelled extract was derived from chimpanzee"?

Or maybe my sample was generated from mouse cell line infected with three
viruses. There are lots of variations on this theme that we have already
had to store (inadequately, I must add, since we use exactly the method
you suggest -- simply recording the species used to make the spots on the
array).

Very often, the species used to generate the array features and the
species used to generate the labelled extract (which, biologically
speaking, is often far more relevant) are different.

Cheers,

Cathy

On Fri, 5 Apr 2002, Miller, Michael (Rosetta) wrote:

> Hi John,
>
> You raise a good question.
>
> My take on this is that the model is intended to be concerned with modeling
> the Data and not how queries will be handled. Particular implementations
> will want to either extend the model or add application logic that uses the
> model in areas that are important to a particular application and platform
> (but may not be important to other applications!).
>
> What I think you are asking is actually even a little more, really you're
> talking about a query language that's much nicer than SQL.
>
> > "Show me all arrays performed by John Doe using yeast."
> It turns out that the information is one remove away from the ArrayDesign
> for that Array to allow multiple species on a single array
> (DesignElementGroup has a species association to OntologyEntry)
>
> So a bit more formally could be:
> "Show me all arrays performed by John Doe where the array design for that
> array has a FeatureGroup with a species OntologyEntry of yeast."
>
> But even more, as organizations read in the information locally about an
> experiment, or part of an experiment, the data and annotation can be
> reorganized to suit the organization's needs. Either an application can
> take your original query and be smart about the paths it searches for the
> information, or the additional annotation information of species can be
> associated with the array when it's read in from either the
> DesignElementGroup or BioMaterial information.
>
> We deferred talking about query languages and just wanted to concentrate on
> the model, but there may be a set of common queries (your examples are good
> ones) that can be defined as a specific Gene Expression Query Language.
> This wouldn't necessarily imply changing the model! It would merely imply
> that an organization that supported the query language would recognize the
> specialized queries and return the desired result, regardless of how the
> information is stored.
>
> my 2c's
> Michael
>
> > -----Original Message-----
> > From: John Matese [mailto:jcmatese@genome.stanford.edu]
> > Sent: Friday, April 05, 2002 12:27 PM
> > To: mged-mage@lists.sourceforge.net
> > Subject: [Mged-mage] question on Biomaterial [MAGE-OM]
> >
> > Hi All,
> >
> > At SMD we've been looking over the MAGE object model and are coming
> > up with a few questions, mostly involving the model's design,
> > limitations, and implementation here.
> >
> > One question I have regards the Biomaterial package: is it true that
> > biomaterials only refer (by association) to those parent materials
> > used to produce them? If so, how does one query for bioassays using
> > a source-biomaterial's annotation?
> >
> > For example, at SMD we regularly perform queries of the type,
> > "Show me all arrays performed by John Doe using yeast."
> >
> > Another might be,
> > "Show me all arrays involving lung tissues."
> >
> > In this case "yeast" and "lung" would likely be annotations of a
> > source-biomaterial. In the case of "lung", you might also have to
> > find the more granular annotations that are descendants of "lung".
> >
> > By my interpretation of the model, these queries are only
> > facilitated if all desired search terms (related to biomaterial
> > annotations) are also recorded as experimental factors? Perhaps I am
> > not understanding the use case on page 75 of the Gene Expression RFP
> > Response?
> >
> > Does the MAGE-OM not allow a source-biomaterial to "know" all
> > its children?
> >
> > Thanks for the information,
> >
> > John Matese
> > SMD Software Developer and Curator
> > http://genome-www.stanford.edu/microarray
> > jcmatese@genome.stanford.edu
> >
> > _______________________________________________
> > Mged-mage mailing list
> > Mged-mage@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/mged-mage

------------------------------

Date: Fri, 5 Apr 2002 18:22:52 -0800 (PST)
From: Cathy Ball
Subject: RE: [microarray-ontol] RE: [Mged-mage] question on Biomaterial [MAGE-OM]

Paul,

I think the issue that we see is that a biosample only "knows" about its
parent. We can't figure out how the model allows us to query a bio_source
(i.e., chimpanzee or yeast or whatever) and find all the children
biosamples (i.e., labelled extract from sonicated chimp toes or
heat-shocked S288C).

As I read it, Michael suggests looking at the DesignElementGroup info to
infer the organism of the top-level biosample, but that might not be the
correct information - e.g. people hybridizing chimp RNA to human cDNA
microarrays, which they're doing. And that won't let you query for
mid-level biosamples, either (i.e., untreated S288C).

Cheers,

Cathy

On Fri, 5 Apr 2002, Paul Spellman wrote:

> Cathy,
>
> You are querying two separate parts of the model. There is no problem. You
> are asking for the join of array_property x = y and biosample_property m = n
> for a hybridization.
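
A minimal sketch of the join described in this exchange, assuming hypothetical in-memory records rather than any real MAGE-OM API. The class names (BioMaterial, Hybridization) and their fields are invented for illustration. Each biomaterial stores only links to its parents, as discussed in the thread, so the sample-side condition is answered by walking upward from the labelled extract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class BioMaterial:
    # Hypothetical record: a material only "knows" its parents.
    name: str
    species: Optional[str] = None  # annotation, typically set on the source
    parents: List["BioMaterial"] = field(default_factory=list)


@dataclass
class Hybridization:
    # Hypothetical record linking an array and a labelled extract.
    performer: str
    array_species: str      # species of the features spotted on the array
    extract: BioMaterial    # labelled extract applied to the array


def source_species(material: BioMaterial) -> Set[str]:
    """Collect species annotations on the material or any of its ancestors."""
    found: Set[str] = set()
    visited: Set[int] = set()
    stack = [material]
    while stack:
        current = stack.pop()
        if id(current) in visited:
            continue
        visited.add(id(current))
        if current.species is not None:
            found.add(current.species)
        stack.extend(current.parents)
    return found


def find_hybridizations(hybs, performer, array_species, sample_species):
    """Join an array-side condition with a sample-side condition."""
    return [
        h for h in hybs
        if h.performer == performer
        and h.array_species == array_species
        and sample_species in source_species(h.extract)
    ]


# Example data: chimp RNA on a human array, and yeast on a yeast array.
chimp = BioMaterial("chimpanzee", species="chimpanzee")
chimp_rna = BioMaterial("chimp toe RNA", parents=[chimp])
chimp_extract = BioMaterial("labelled chimp extract", parents=[chimp_rna])
yeast = BioMaterial("S288C", species="yeast")
yeast_extract = BioMaterial("labelled yeast extract", parents=[yeast])

hybs = [
    Hybridization("John Doe", "human", chimp_extract),
    Hybridization("John Doe", "yeast", yeast_extract),
]
matches = find_hybridizations(hybs, "John Doe", "human", "chimpanzee")
```

The point of the sketch is that the two conditions live on different objects, and the sample-side one needs a traversal from extract to source; in a relational implementation that traversal would likely be a recursive query or a precomputed lookup table.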

------------------------------

Date: Sat, 6 Apr 2002 12:44:59 -0500
From: Chris Stoeckert
Subject: [microarray-ontol] Fwd: BOUNCE microarray-ontol@alpha1.ebi.ac.uk: Non-member submission from ["Paul Spellman" ]

> From: "Paul Spellman"
> To: "Cathy Ball", "Miller, Michael (Rosetta)"
> Subject: RE: [microarray-ontol] RE: [Mged-mage] question on Biomaterial
>  [MAGE-OM]
> Date: Fri, 5 Apr 2002 16:40:09 -0800
>
> Cathy,
>
> You are querying two separate parts of the model. There is no
> problem. You are asking for the join of array_property x = y and
> biosample_property m = n for a hybridization.
>
> paul

------------------------------

Date: Sat, 6 Apr 2002 12:46:10 -0500
From: Chris Stoeckert
Subject: [microarray-ontol] Fwd: BOUNCE microarray-ontol@alpha1.ebi.ac.uk: Non-member submission from [Tina Boussard ]

> Date: Fri, 5 Apr 2002 16:40:35 -0800 (PST)
> From: Tina Boussard
> To: microarray-ontol@ebi.ac.uk
> cc: ge-curator@alberich.Stanford.EDU
> Subject: Ontology question
>
> Hi All,
>
> As many of you might have seen, we here at SMD have been looking over
> the MAGE object model and have a few questions for the group; mine in
> particular concerns the ontology part of the model:
>
> What is the difference between ExternalReference and DatabaseEntry? What
> is the rule for when you use what?
>
> Under the ontology description, we need to be able to store the
> relationship between the terms in any given ontology. Therefore we need
> ontology containers so that we can provide our users with the ability to
> browse and navigate ontologies. Is there a need for one in MAGE or is
> it sufficient to simply cite the ontology and give an ID? Indeed without
> this ability to define relationships, the utility of storing ontologies,
> and using them to annotate is significantly diminished.
>
> Ontology entries derive from Extendable. Why is it not Identifiable?
> If all ontology entries have unique identifiers (i.e.
> GO:001657, TAXID:594882, ...), I would think that they should be
> Identifiable.
>
> Cheers,
>
> Tina Boussard

------------------------------

Date: Sat, 6 Apr 2002 12:48:01 -0500
From: Chris Stoeckert
Subject: [microarray-ontol] Fwd: BOUNCE microarray-ontol@alpha1.ebi.ac.uk: Non-member submission from ["Paul Spellman" ]

> From: "Paul Spellman"
> To: "Cathy Ball"
> Cc: "Miller, Michael (Rosetta)"
> Subject: RE: [microarray-ontol] RE: [Mged-mage] question on Biomaterial
>  [MAGE-OM]
> Date: Fri, 5 Apr 2002 18:35:59 -0800
>
> Cathy et al.,
>
> The model is not the same as your database implementation. The model
> captures the necessary information to reconstruct the sample
> hierarchy. It is the same as with GO terms: process doesn't know which
> GO terms are its children, but if you start at the bottom it can be
> reconstructed to answer this question.
>
> If this is a critical ability of your database then you can certainly
> normalize the data structures so that sources know their daughters; it
> just isn't MAGE (which is expected and fine).
>
> We decided that it wasn't essential for the model to work to force this
> relationship and non-essential relationships would be too complicated.
> Imagine if you had a document that had your starting sources that you
> continually reused. Now imagine that at later times you make some
> samples. If the model required sources to know their samples, you would
> have to constantly update your source document, which wouldn't
> necessarily be useful. Further imagine two people using the same
> source: now they have to keep a master copy of that source which they
> can all update, rather than recompute relationships later based on new
> information.
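
The reconstruction argued for above can be sketched as follows. This is an illustration under stated assumptions, not part of MAGE: the function names and the plain dictionary of child-to-parent links are hypothetical, and the sample names are borrowed from the examples earlier in the thread. Parent-only links are inverted once into a lookup of children, which can then answer "find everything derived from this source" without the source record ever being updated.

```python
from collections import defaultdict


def build_child_index(parent_links):
    """Invert child -> parents links into a parent -> children lookup."""
    children = defaultdict(set)
    for child, parents in parent_links.items():
        for parent in parents:
            children[parent].add(child)
    return children


def descendants(children, source):
    """Return every material derived, directly or indirectly, from source."""
    seen = set()
    stack = [source]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical sample hierarchy; each entry lists only its parents.
parent_links = {
    "untreated S288C": ["S288C"],
    "heat-shocked S288C": ["untreated S288C"],
    "labelled extract A": ["heat-shocked S288C"],
    "labelled extract B": ["untreated S288C"],
}

index = build_child_index(parent_links)
derived_from_source = descendants(index, "S288C")
derived_mid_level = descendants(index, "heat-shocked S288C")
```

A local database could rebuild such an index whenever new samples are loaded, which keeps the exchanged documents parent-only while still supporting the source-level and mid-level queries raised earlier.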
>>>> >>>> One question I have regards the Biomaterial package; Is it true >>>> that >>>> biomaterials only refer (by association) to those parent materials >>>> used to produce them? If so, how does one query for bioassays >>>> using >>>> a source-biomaterial's annotation. >>>> >>>> For example, at SMD we regularly perform queries of the type, >>>> "Show me all arrays performed by John Doe using yeast." >>>> >>>> Another might be, >>>> "Show me all arrays involving lung tissues." >>>> >>>> In this case "yeast" and "lung" would likely be annotations of a >>>> source-biomaterial. In the case of "lung", you might also have to >>>> find the more granular annotations, that are descendants of "lung". >>>> >>>> By my interpretation of the model, these queries are only >>>> facilitated if all desired search terms (related to biomaterial >>>> annotations) are also recorded as experimental factors? Perhaps I >>>> am >>>> not understanding the use case on page 75 of the Gene Expression RFP >>>> Response? >>>> >>>> Does the MAGE-OM not allow a source-biomaterial to "know" all >>>> its children? >>>> >>>> Thanks for the information, >>>> >>>> John Matese >>>> SMD Software Developer and Curator >>>> http://genome-www.stanford.edu/microarray >>>> jcmatese@genome.stanford.edu >>>> >>>> _______________________________________________ >>>> Mged-mage mailing list >>>> Mged-mage@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/mged-mage >>>> >>> >>> >>> _______________________________________________ >>> Mged-mage mailing list >>> Mged-mage@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/mged-mage >>> >> >> > > ------------------------------ Date: Sat, 6 Apr 2002 14:36:24 -0500 From: Chris Stoeckert Subject: [microarray-ontol] Re: Ontology question Hi Tina, ExternalReference is a MAGE class not represented in the MGED ontology. 
I believe it is for pointing to objects that are part of the microarray experiment but not included in the MAGE document, such as a TIFF. DatabaseEntry is in the MGED ontology (an attribute of OntologyEntry) and is used to say where you got an ontology term (e.g. NCBI_taxon_id is an instance of DatabaseEntry).

The issue you raise about relationships between terms in any given ontology is one which caused a lot of debate. The original MAGE notion of OntologyEntry was for "neutral authoring", i.e., to simply cite the ontology and give an ID. This has changed with the appreciation that this is not sufficient, as an ontology object may have parts that need to be identified as well (the example used was Age, which has a measurement and an initial time point). So now an OntologyEntry can refer to another OntologyEntry.

As has been pointed out in MAGE discussions, MAGE is a model and how you store ontologies is an implementation issue. As I wrote to John Matese on the MAGE mail list, we have updated RAD (our relational gene expression database) to include the greater structure worked out in the MGED ontology and MAGE model. We are planning to use OntologyEntry as a table to store the terms being used and where we got them. Simple controlled vocabularies, such as the names of biosource characteristics (Age, Sex, OrganismPart, etc.), are simply stored in the OntologyEntry table. For hierarchical controlled vocabularies in RAD (e.g., Anatomy), the term used is stored and a pointer is given to the Anatomy table, which has all the relationships. For controlled vocabularies/ontologies not stored locally, pointers to external resources will be used (e.g., ChemID for a compound).

At the end of MGED4, the issue of ontology entries not being identifiable came up in a MAGE break-out session. My recollection is the following, but anyone who was there please correct me as needed. Identifiable refers to scope within a MAGE document.
You are likely to use the same ontology entry more than once in the same MAGE document but to be identifiable they would need different IDs assigned to them. For example, if you used 2 mice, each of these is identifiable but each reference to the NCBI taxon ID for mouse does not need to be. Cheers, Chris > Hi All, > > As many of you might have seen, we here at SMD have been looking over > the > MAGE object model and have a few questions for the group; mine in > particular concerns the ontology part of the model: > > What is the difference between ExternalReference and DatabaseEntry? What > is the rule for when you use what? > > Under the ontology description, we need to be able to store the > relationship between the terms in any given ontology. Therefore we need > ontology containers so that we can provide our users with the ability to > browse and navigate ontologies. Is there a need for one in MAGE or is > it > sufficient to simply cite the ontology and give an ID? Indeed without > this ability to define relationships, the utility of storing ontologies, > and using them to annotate is significantly diminished. > > Ontology entries derive from Extendable. Why is it not Identifiable? > If > all ontology entries have unique identifiers (i.e. GO:001657, > TAXID:594882, ...), I would think that they should be Identifiable. > > > Cheers, > > Tina Boussard ------------------------------ Date: Sat, 6 Apr 2002 14:17:47 -0800 From: "Miller, Michael (Rosetta)" Subject: RE: [microarray-ontol] RE: [Mged-mage] question on Biomaterial [M AGE-OM] Hi Cathy, I'm in agreement with Paul here, that you are thinking about implementation. > Using this thinking, how would I say, "Show me all arrays performed by > John Doe where the array design for that array has a FeatureGroup with a > species OntologyEntry of human AND (here's where it gets tricky) the > sample used for the labelled extract was derived from chimpanzee"? 
To me this isn't tricky: you've stated something very close to what a query could be. It's two queries, each valid by the model, that can be combined by AND or by OR, or done separately. We strove to have the right set of associations so that there was as little ambiguity as possible and the objects were related in as natural a way as they are in a lab. Your query above could match an actual hybridization, and MAGE allows it to be modeled.

When John says that a common query is:

> > > > "Show me all arrays performed by John Doe using yeast."

It is ambiguous by the model but might have a default implementation for a particular organization. For instance, it could be translated by an application to be:

"Show me all arrays performed by John Doe where the array design for that array has a FeatureGroup with a species OntologyEntry of yeast OR the sample used for the labelled extract was derived from yeast"

and, for instance, the Array table has additional associations in the implementation to quickly get the information from both the ArrayDesign and the BioMaterial directly.

Or, the organization can have an implementation that returns, "Ambiguous query."

Regards,
Michael

> -----Original Message-----
> From: Cathy Ball [mailto:ball@genome.stanford.edu]
> Sent: Friday, April 05, 2002 6:23 PM
> To: Paul Spellman
> Cc: Cathy Ball; Miller, Michael (Rosetta);
> ge-curator@genome.Stanford.EDU; microarray-ontol@ebi.ac.uk;
> mged-mage@lists.sorceforge.net
> Subject: RE: [microarray-ontol] RE: [Mged-mage] question on
> Biomaterial [MAGE-OM]
>
>
> Paul,
>
> I think the issue that we see is that a biosample only "knows" about its
> parent. We can't figure out how the model allows us to query a bio_source
> (i.e., chimpanzee or yeast or whatever) and find all the children
> biosamples (i.e., labelled extract from sonicated chimp toes or
> heat-shocked S288C).
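The two readings discussed in this thread (Cathy's AND query and Michael's OR translation of the ambiguous "using yeast" query) can be sketched as filters over hybridization records. This is a rough illustration under stated assumptions: the Hybridization class, its field names, and the sample data are hypothetical and are not part of MAGE-OM or any SMD schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybridization:
    # Hypothetical flattened view of one hybridization.
    name: str
    experimenter: str
    array_species: str   # species recorded on the array design's feature group
    sample_species: str  # species the labelled extract was derived from


HYBS = [
    Hybridization("hyb1", "John Doe", "yeast", "yeast"),
    Hybridization("hyb2", "John Doe", "human", "chimpanzee"),
    Hybridization("hyb3", "Jane Roe", "human", "human"),
]


def by_array_and_sample(hybs, who, array_species, sample_species):
    """AND reading: array species and sample species must both match."""
    return [
        h.name
        for h in hybs
        if h.experimenter == who
        and h.array_species == array_species
        and h.sample_species == sample_species
    ]


def using_species(hybs, who, species):
    """OR reading: the species matches the array or the sample."""
    return [
        h.name
        for h in hybs
        if h.experimenter == who
        and species in (h.array_species, h.sample_species)
    ]


print(by_array_and_sample(HYBS, "John Doe", "human", "chimpanzee"))
print(using_species(HYBS, "John Doe", "yeast"))
```

Keeping the two species as separate fields is what lets a chimp-on-human-array experiment be queried without guessing the sample organism from the array.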
>
> As I read it, Michael suggests looking at the DesignElementGroup info to
> infer the organism of the top-level biosample, but that might not be the
> correct information - eg people hybridizing chimp RNA to human cDNA
> microarrays, which they're doing. And that won't let you query for
> mid-level biosamples, either (i.e., untreated S288C).
>
> Cheers,
>
> Cathy
>

------------------------------

Date: Sun, 14 Apr 2002 17:39:10 -0400
From: Chris Stoeckert
Subject: [microarray-ontol] OILed 3.4

Dear Group,

A new version of OILed is now available at http://oiled.man.ac.uk/ (the link from the OWG site will send you to a redirect page - I will update this). There is only a Windows version, but since it is in Java, I was able to run it on my Mac in OSX. After registering, I downloaded the zip file (just started with the version without the Reasoner) and, after unstuffing, dragged it into my Applications directory. Since it's a Windows distribution, it has an oiled.bat file:

    @echo off
    rem Set up the classpath by adding all the jars.
    set OILEDCLASSPATH=.
    for %%i in ("lib\*.jar") do call "cpappend.bat" %%i
    java -Xmx192m -cp "%OILEDCLASSPATH%" uk.ac.man.cs.img.oil.ui.OilEd

cpappend.bat is just:

    @echo off
    rem Simple batch that appends something to the classpath
    set OILEDCLASSPATH=%OILEDCLASSPATH%;%1

I created a file "OILed" based on the same file with the Linux 2.2a version:

    #!/bin/sh
    OILEDCLASSPATH=".:lib/fact-client.jar:lib/jgl3.1.0.jar:lib/oil.jar:lib/rdf-api.jar:lib/crimson.jar:lib/xerces.jar:lib/getopt.jar:lib/jaxp.jar:lib/jena.jar:lib/jracer"
    java -Xmx192m -cp "$OILEDCLASSPATH" uk.ac.man.cs.img.oil.ui.OilEd

Note that jar files are in "libs" in 2.2 and "lib" in 3.4. Obviously, what I did should work for any Unix system.

Anyway this seems to work. I opened up an RDFS file and a DAML file made with 2.2a. The DAML file duplicated all the classes - one was simply the class name and the other was the class name preceded by a full directory path!
Only the latter had attributes (now called properties instead of slots). The RDFS file did not duplicate the classes but also did not have attributes. Will continue to play with this and let you know how it goes. Please share your thoughts and experiences if you use it. Cheers, Chris ------------------------------ End of microarray-ontol-digest V1 #22 *************************************