Environmental Health Perspectives. 2004;112(4):506-510
Abstract: On the surface, transcript profiling using microarrays seems to offer a way of looking at the global response of the cell to perturbation, with a focus on changes in gene expression. The difficulty, however, is that the response of a particular gene is actually measured on the array by an element: a short, defined nucleic acid sequence. Sequences that map back to the same genetic locus may be given different names and descriptions when they are deposited in public sequence databases, so when such sequences are used in microarray construction, elements that monitor the same genetic locus may carry different names and descriptions. The algorithm described here uses a hierarchical approach to assign a single best annotation to each element in a given microarray, so that elements from one microarray platform can be cross-indexed with those of another. The algorithm takes the nucleic acid accession number for each array element and uses it to retrieve annotation from the most recent versions of LocusLink and UniGene. Both database resources are searched, with priority given to annotation derived from the curated LocusLink database. When neither database provides annotation, the default GenBank annotation is used. As a final outcome, a cross-chip identifier is generated that may be used to cross-index array elements. The program is available as a Practical Extraction and Report Language (Perl) script that can run under any Perl interpreter.
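The hierarchical lookup and cross-indexing steps can be sketched as follows. This is a minimal Python illustration, not the authors' Perl script. It assumes in-memory dictionaries keyed by accession number stand in for LocusLink, UniGene, and GenBank. The function names (annotate, cross_index), the "LL:"/"UG:"/"GB:" cross-chip identifier format, and the toy accessions are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical lookup table: accession -> (database id, gene symbol, description).
Lookup = dict[str, tuple[str, str, str]]


@dataclass(frozen=True)
class Annotation:
    source: str       # database that supplied the annotation
    cross_id: str     # hypothetical cross-chip identifier format
    symbol: str
    description: str


def annotate(accession: str, locuslink: Lookup, unigene: Lookup,
             genbank: dict[str, str]) -> Annotation:
    """Assign one best annotation: LocusLink first, then UniGene, then GenBank."""
    if accession in locuslink:
        locus_id, symbol, desc = locuslink[accession]
        return Annotation("LocusLink", f"LL:{locus_id}", symbol, desc)
    if accession in unigene:
        cluster_id, symbol, desc = unigene[accession]
        return Annotation("UniGene", f"UG:{cluster_id}", symbol, desc)
    # Neither database annotates this accession: fall back to GenBank's default text.
    return Annotation("GenBank", f"GB:{accession}", "", genbank.get(accession, ""))


def cross_index(platforms: dict[str, dict[str, str]], locuslink: Lookup,
                unigene: Lookup, genbank: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Group (platform, element) pairs that share the same cross-chip identifier."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for platform, elements in platforms.items():
        for element_id, accession in elements.items():
            ann = annotate(accession, locuslink, unigene, genbank)
            groups[ann.cross_id].append((platform, element_id))
    return dict(groups)


if __name__ == "__main__":
    # Toy data: ACC1 and ACC2 are different accessions for the same locus.
    locuslink = {"ACC1": ("1001", "GENEA", "gene A"),
                 "ACC2": ("1001", "GENEA", "gene A")}
    unigene = {"ACC3": ("C1", "GENEB", "gene B")}
    genbank = {"ACC4": "unannotated EST"}
    platforms = {"chipX": {"x1": "ACC1", "x2": "ACC3"},
                 "chipY": {"y1": "ACC2", "y2": "ACC4"}}
    for cross_id, members in cross_index(platforms, locuslink, unigene, genbank).items():
        print(cross_id, members)
```

Run on the toy data, elements x1 and y1 land under the same identifier (LL:1001) even though their accessions differ, which is the cross-indexing the abstract describes. In this sketch, an accession found in both LocusLink and UniGene always takes the LocusLink annotation, mirroring the stated priority for the curated database.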