Part of Thermo Fisher Scientific  
 
 
 
  This page will help you understand how to use our Gene Search to better find the products related to your genes of interest.

Introduction

Finding a cDNA clone to match your gene or sequence of interest can at first seem like a complex task. It becomes quite easy once you understand that each clone may have numerous identifiers and which of them are easiest to use. You may also be starting from only a gene name or a piece of sequence. At Open Biosystems it is our goal to make this process as straightforward as possible. We provide a clone query (Gene Search) that accepts the most commonly used identifier, the GenBank Accession number, and additional clone identifiers when possible. In addition, our unique RefClone Mapping helps you rapidly identify physical cDNA clones containing your sequence of interest. Below you will find a description of the Open Biosystems Gene Search, a brief tutorial on identifier definitions, and additional hints on finding a clone.

If you have difficulty finding a clone you require, have a special request, or want to provide feedback on how we can better assist you, please contact us at info@openbiosystems.com or call our customer service at 1-888-412-2225.

How to search

Searching is easy.  You may search for up to fifty identifiers, of a common type, at a time and page through the results.  Enter your identifier(s) in the text box at the far right and click the submit button.
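If you have more than fifty identifiers, you can split them into batches before pasting them into the search box. The sketch below is only an illustration of that bookkeeping; the function name `batch_identifiers` is hypothetical and is not part of the Gene Search itself.

```python
def batch_identifiers(identifiers, batch_size=50):
    """Split identifiers into groups of at most batch_size.

    The limit of fifty comes from the text above; everything else
    here is an illustrative assumption.
    """
    # Drop blank entries and surrounding whitespace before batching.
    cleaned = [item.strip() for item in identifiers if item.strip()]
    return [cleaned[start:start + batch_size]
            for start in range(0, len(cleaned), batch_size)]


# Made-up placeholder identifiers, used only to show the batch sizes.
example_ids = [f"AA{n:06d}" for n in range(120)]
batches = batch_identifiers(example_ids)
```

Each batch should still contain identifiers of a single type, since the search accepts only one identifier type at a time.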


Searching by GenBank Accession number or Clone ID: Accepted Identifiers

The Open Biosystems clone query accepts both GenBank Accession numbers and clone identifiers that are unique to specific collections. To find a clone of interest, enter either a GenBank Accession or Clone ID into the search query box. The Open Biosystems clone query will then attempt to find an exact match to the GenBank Accession or Clone ID that you entered. However, if an exact match is not found then the clone query system will use a unique mapping mechanism, RefClone Mapping, to return a listing of homologous clones in our collection. Examples of the accepted identifiers are given below:

Identifier            | Format                                              | Example
GenBank Accession     | one or two letters followed by a series of digits   | BQ228424
IMAGE ID              | five to seven digit number                          | 6059449
University of Iowa ID | a series of letters and dashes beginning with U     | UI-R-E0-by-f-10-0-UI

A GenBank Accession number is assigned every time a sequence is deposited in the NCBI. It consists of one or two letters followed by a series of digits (e.g. BQ228424). A single cDNA clone can have several GenBank Accession numbers, because several sequences may have been deposited with the NCBI for the same cDNA clone. For example, one laboratory may have sequenced the 5' end of a clone and deposited the sequence while another lab may have sequenced the 3' end. Additionally, if a cDNA clone is completely sequenced and found to contain a full-length gene sequence, it will be given yet another GenBank Accession number.

*Note: Many GenBank Accession records of sequence data do not reference a cDNA clone directly. These may be mRNA sequences (NM_), protein sequences (NP_), genomic sequences (NC_, NT_) or non-coding transcripts (XR_). A cDNA clone or BAC clone containing the sequence of interest may be available, but identifying it will require additional searching. See 'Identifier Definitions' or contact technical service for assistance.

The IMAGE ID is a unique identifier assigned by the I.M.A.G.E. Consortium to each clone derived from that project. Unlike the GenBank Accession number, a single clone has only one IMAGE ID, and all sequences derived from the same clone reference that IMAGE ID. For example, IMAGE ID 236338 is a cDNA clone containing sequence for Tumor protein p53 (Li-Fraumeni syndrome). This cDNA clone was sequenced from the 5' end, and that sequence was deposited with NCBI and given the GenBank Accession number H61357. It was also sequenced from the 3' end and given the GenBank Accession number H62385. Both records reference IMAGE ID 236338. When entering IMAGE clone IDs into the clone query, enter the numerical portion only (i.e. remove the IMAGE: prefix).

The University of Iowa ID is a unique identifier assigned to cDNA clones derived from projects originating at that university. These include a rat cDNA clone collection and other unique human cDNA collections (e.g. the Cystic Fibrosis cDNA Collection). University of Iowa IDs include information relating to tissue source, library type, etc. The University of Iowa ID is assigned in much the same way as the IMAGE ID described above. An example of a UI ID: UI-R-FS1-cqh-p-20-0-UI
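The three identifier formats described in this section can be told apart by their shape. The following is a minimal sketch of such a check, assuming the formats exactly as stated here (one or two letters plus digits, a five to seven digit number, and a UI-...-UI string). The patterns and function names are illustrative assumptions, not the rules the clone query actually applies.

```python
import re

# Patterns derived from the format descriptions in this section (assumptions).
GENBANK_RE = re.compile(r"^[A-Z]{1,2}\d+$")       # e.g. BQ228424, H61357
IMAGE_RE = re.compile(r"^\d{5,7}$")               # e.g. 6059449
UI_RE = re.compile(r"^UI-[A-Za-z0-9-]+-UI$")      # e.g. UI-R-E0-by-f-10-0-UI


def normalize(identifier):
    """Trim whitespace and remove a leading 'IMAGE:' so only the number remains."""
    identifier = identifier.strip()
    if identifier.upper().startswith("IMAGE:"):
        identifier = identifier[len("IMAGE:"):]
    return identifier


def classify(identifier):
    """Return a label for the identifier type, or 'unknown'."""
    ident = normalize(identifier)
    if UI_RE.match(ident):
        return "University of Iowa ID"
    if IMAGE_RE.match(ident):
        return "IMAGE ID"
    if GENBANK_RE.match(ident.upper()):
        return "GenBank Accession"
    return "unknown"
```

Under these patterns a RefSeq-style accession such as NM_123456 comes back as 'unknown', which matches the note above that such records do not reference a cDNA clone directly.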

Gene Search and RefClone Mapping

To find a clone of interest, enter either a GenBank Accession or Clone ID into the search query box. The Open Biosystems' Gene Search will then look for an exact match to the GenBank Accession or Clone ID that you entered. However, if an exact match is not found then the system will use a unique mapping mechanism, RefClone Mapping, to find homologous clones in our collection. The query processes your request in a stepwise fashion:

  1. An identifier type is determined to be an accession number or clone ID.
  2. An exact match is sought in the Open Biosystems' clone database. Exact matches are returned with the ordering information and technical data.
  3. Identifiers without exact matches then undergo RefClone mapping to the UniGene database. The UniGene database clusters multiple clone sequences by MegaBLAST alignments (see UniGene Clustering Process below for details on the UniGene database). The cluster containing the requested identifier is first identified.
  4. All of the clones are then extracted from the representative UniGene cluster and compared against the Open Biosystems' clone database.
  5. Clones found to be contained in both the UniGene cluster and the Open Biosystems' clone database are then displayed under the message:
    1. 'You searched for '123456', we found # in the same cluster (Hs.###).'
    2. Clones are returned with the ordering information and technical data.
  6. The clones returned were found to be related by sequence homology using the methods described in UniGene Clustering Process (see below). These clones should be confirmed by individual alignment to the requested sequence to ensure accuracy. BLAST2 (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) at NCBI may be used to rapidly align the requested sequence and the RefClone sequence.
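The stepwise logic above (exact match first, then fall back to clones that share a UniGene cluster) can be sketched with plain in-memory data. This is not the actual Gene Search implementation: the function name, the data structures, and all identifiers and the cluster ID in the sample data are invented for illustration, and step 1 (identifier typing) is left out.

```python
def gene_search(identifier, catalog, cluster_of, cluster_members):
    """Look up an identifier, falling back to cluster-mates in stock.

    catalog         -- set of identifiers available in the clone database
    cluster_of      -- dict mapping identifier -> cluster ID
    cluster_members -- dict mapping cluster ID -> list of identifiers
    """
    # Step 2: exact match against the clone database.
    if identifier in catalog:
        return {"match": "exact", "cluster": None, "clones": [identifier]}

    # Step 3: find the cluster that contains the requested identifier.
    cluster = cluster_of.get(identifier)
    if cluster is None:
        return {"match": "none", "cluster": None, "clones": []}

    # Steps 4-5: keep only cluster members that are also in the catalog.
    in_stock = [c for c in cluster_members.get(cluster, []) if c in catalog]
    match = "refclone" if in_stock else "none"
    return {"match": match, "cluster": cluster, "clones": in_stock}


# Invented sample data: two clones in stock, one requested sequence that is not.
catalog = {"AA000001", "AA000002"}
cluster_members = {"Hs.1": ["AA000001", "AA000002", "AA000003"]}
cluster_of = {acc: "Hs.1" for acc in cluster_members["Hs.1"]}

result = gene_search("AA000003", catalog, cluster_of, cluster_members)
```

Clones returned with the "refclone" label are related only through cluster membership, so they still need the individual alignment check described in step 6.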

Open Biosystems is not responsible for clones that may not align in whole or in part, as clone sequencing, genome finishing and clustering are currently dynamic processes. This service is provided to help you rapidly identify homologous cDNA clones.

Other Common Identifiers

The GenBank accession number and Clone identifiers previously described relate directly to sequences of a cDNA clone. However, you may find reference to or be interested in identifiers beginning with the prefixes below. The prefix indicates a specific type of sequence and even though it is not a direct sequence from a cDNA clone, we may be able to locate a clone for you containing the sequence of interest. We have provided the brief definition of these identifiers and how a representative clone containing the sequence of interest may be obtained if one is available.

Accession  | Type                   | cDNA clone  | BAC clone   | Query Type
NC_        | Complete genome        | Unavailable | Unavailable | Unavailable
NG_        | Genomic sequence*      | Unavailable | Available   | info@openbiosystems.com
NM_        | mRNA sequence          | Available   | Unavailable | http://www.openbiosystems.com
NP_        | Protein sequence**     | Available   | Unavailable | info@openbiosystems.com
NT_        | Genomic contig*        | Unavailable | Unavailable | Unavailable
XM_        | mRNA sequence***       | Available   | Unavailable | info@openbiosystems.com
XP_        | Protein sequence**     | Available   | Unavailable | info@openbiosystems.com
XR_        | Non-coding transcript* | Unavailable | Available   | Unavailable
RPCI-      | Genomic sequence*      | Unavailable | Available   | info@openbiosystems.com
CTA,B,C,D- | Genomic sequence*      | Unavailable | Available   | info@openbiosystems.com

*Genes & cDNA clones contained within these regions may be identified through the human genome browsers at Ensembl or UCSC. A representative genomic clone may be available; email info@openbiosystems.com.

**The protein sequence will need to be translated back into nucleic acid sequence and the representative GenBank accession number identified through BLAST against the nr or EST databases.

***Representative clones may be available. BLAST will need to be performed against the nr or EST databases to identify candidates. The resulting accession numbers from the BLAST report can then be placed in the Open Biosystems clone query (see 'RefClone Mapping').
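The availability table above can be written as a small lookup keyed on the accession prefix. This is an illustrative sketch only: the names `PREFIX_TABLE` and `lookup_prefix` are hypothetical, and the row 'CTA,B,C,D-' is expanded into the four prefixes CTA-, CTB-, CTC- and CTD- on the assumption that this is what the shorthand means.

```python
# prefix -> (sequence type, cDNA clone available, BAC clone available),
# transcribed from the table above.
PREFIX_TABLE = {
    "NC_": ("Complete genome", False, False),
    "NG_": ("Genomic sequence", False, True),
    "NM_": ("mRNA sequence", True, False),
    "NP_": ("Protein sequence", True, False),
    "NT_": ("Genomic contig", False, False),
    "XM_": ("mRNA sequence", True, False),
    "XP_": ("Protein sequence", True, False),
    "XR_": ("Non-coding transcript", False, True),
    "RPCI-": ("Genomic sequence", False, True),
}
# Assumption: 'CTA,B,C,D-' stands for the prefixes CTA-, CTB-, CTC-, CTD-.
for letter in "ABCD":
    PREFIX_TABLE[f"CT{letter}-"] = ("Genomic sequence", False, True)


def lookup_prefix(accession):
    """Return the table row for an accession's prefix, or None if not listed."""
    upper = accession.strip().upper()
    for prefix, (seq_type, cdna, bac) in PREFIX_TABLE.items():
        if upper.startswith(prefix):
            return {"prefix": prefix, "type": seq_type,
                    "cdna_clone": cdna, "bac_clone": bac}
    return None
```

A result of None means the prefix is not in the table, as is the case for an ordinary GenBank Accession such as BQ228424, which can go straight into the clone query.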

Identifier Definitions (taken from http://www.ncbi.nih.gov/RefSeq/)

NCBI Accession numbers that begin with the prefix NG_ (genomic), NM_ (mRNA) and NP_ (protein) are generated and maintained by the Curated RefSeq project. Sequence records are reviewed and additional feature annotation may be added. In addition, in some instances the sequence has been modified relative to the original GenBank sequence from which it was derived. Representative clones are available for a majority of these accession numbers. These are readily identified by reviewing the 'mRNA' section of the UniGene entry containing the accession number. Full-length MGC/IMAGE clones that are representative of the RefSeq entry will be listed in the 'mRNA' section. For EST representatives, review the 'EST' section of the same UniGene entry.

Accession Format | Molecule Type       | Genome
NC_123456        | Complete Genome     | Archaea, Bacterial, Organelle, Virus
                 | Complete Chromosome | Eukaryote
NG_123456        | Genomic Region      | Homo sapiens
NM_123456        | mRNA                | Homo sapiens, Mus musculus, Rattus norvegicus
NP_123456        | Protein             | All of the above
NT_123456        | Genomic Contig      | Homo sapiens, Mus musculus

NCBI Accession numbers that begin with the prefix XM_ (mRNA), XR_ (non-coding transcript), and XP_ (protein) are model reference sequences produced by NCBI's Genome Annotation project (i.e. in silico predictions). These records represent the transcripts and proteins that are annotated on the NCBI Contigs, which may have been generated from incomplete data. Because the XM_, XR_, and XP_ accessions reflect the current state of NCBI's assembly of the genomic sequence, they may be different from GenBank submissions for mRNAs and/or the curated RefSeq records. These differences may reflect real sequence variation (polymorphism), errors in GenBank accessions used as sources for unreviewed (provisional) RefSeq records, or errors or gaps in the available genomic sequence. These sequences should be used with caution, after comparing them to other available sequence information (check the evidence viewer, BLink, LocusLink, or sequence neighbors).

Accession Format | Molecule Type | Genome
XM_123456        | mRNA          | Homo sapiens model mRNA provided by the Genome Annotation process; sequence corresponds to the genomic contig.
XR_123456        | RNA           | Homo sapiens model non-coding transcripts provided by the Genome Annotation process; sequence corresponds to the genomic contig.
XP_123456        | Protein       | Homo sapiens model proteins provided by the Genome Annotation process; sequence corresponds to the genomic contig.
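Because model (XM_, XR_, XP_) records call for more caution than curated (NG_, NM_, NP_) records, it can help to flag them when working through a list of accessions. The sketch below follows the split described in this section; the function name `refseq_status` and the returned labels are illustrative assumptions.

```python
# Prefix groups as described in this section.
CURATED = {"NG_": "genomic", "NM_": "mRNA", "NP_": "protein"}
MODEL = {"XM_": "mRNA", "XR_": "non-coding RNA", "XP_": "protein"}


def refseq_status(accession):
    """Return (status, molecule) where status is 'curated', 'model' or 'other'."""
    prefix = accession.strip()[:3].upper()
    if prefix in CURATED:
        return ("curated", CURATED[prefix])
    if prefix in MODEL:
        return ("model", MODEL[prefix])
    # Anything else (for example NC_, NT_ or a GenBank Accession) is not
    # covered by the curated/model split described here.
    return ("other", None)


# Collect the accessions that deserve an extra check before use.
accessions = ["NM_123456", "XM_123456", "XP_123456", "BQ228424"]
needs_review = [a for a in accessions if refseq_status(a)[0] == "model"]
```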

Finding a clone starting with a gene name

For human or mouse full-length clones the best resource is the MGC website at http://mgc.nci.nih.gov, which has a keyword or gene symbol search.

Another resource for finding a clone by gene name is the UniGene database, http://www.ncbi.nlm.nih.gov/UniGene/. UniGene groups (clusters) transcript sequences by sequence similarity, so that each cluster represents a single gene. Typing the name of your gene of interest into the UniGene search engine and choosing your desired organism from the dropdown menu will return the cluster ID and cluster information. Included in the cluster information is a list of associated EST clones.

Clones are listed by end-read sequence length, from longest to shortest, where the length is noted. You can use the Open Biosystems Clone Query to search our current stocks using either the Source ID or the GenBank accession.

Finding a clone starting with a nucleotide sequence

For human or mouse full-length clones the best resource is the MGC website at http://mgc.nci.nih.gov, which has BLAST capability.

The standard nucleotide-nucleotide BLAST engine at the NCBI website, http://www.ncbi.nlm.nih.gov/BLAST/, is a useful tool for locating homologous clones. Simply paste the sequence of interest into the "search" box, choose the database you would like to search (e.g. est_others), and click "BLAST". This is the simplest way to perform a BLAST query; the NCBI BLAST website describes many ways to refine and restrict the query.

The results returned from the BLAST query will be arranged in descending order from the most highly similar sequence alignment to the least similar. By scrolling down to the "Alignments" section or by clicking on an individual record, clone matches can be located by looking in the description of each record.
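When a BLAST report contains many hits, the clone IDs can be pulled out of the description lines with a short script. The sketch below assumes that a description carries the clone as 'IMAGE:' followed by digits. The two sample lines are made up for illustration and are not real BLAST output; they reuse the IMAGE IDs mentioned earlier on this page.

```python
import re

# Assumption: the clone appears in the description as 'IMAGE:<digits>'.
IMAGE_IN_DESCRIPTION = re.compile(r"IMAGE:(\d+)")


def image_ids(descriptions):
    """Return the IMAGE IDs found in the description lines, in order, without repeats."""
    found = []
    for line in descriptions:
        for clone_id in IMAGE_IN_DESCRIPTION.findall(line):
            if clone_id not in found:
                found.append(clone_id)
    return found


# Made-up description lines in the general style of an EST record title.
sample = [
    "example5.r1 Homo sapiens cDNA clone IMAGE:236338 5', mRNA sequence",
    "example3.s1 Homo sapiens cDNA clone IMAGE:236338 3', mRNA sequence",
    "example7.y1 Homo sapiens cDNA clone IMAGE:6059449 5', mRNA sequence",
]
ids = image_ids(sample)
```

The resulting numbers are already in the form the clone query expects, that is, without the IMAGE: prefix.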

UniGene Clustering Process (taken from 'The UniGene Build Procedure')
  1. Clustering is the process of finding subsets of sequences that belong together within a larger set. This is done by converting discrete similarity scores to boolean links between sequences. That is, two sequences are considered linked if their similarity exceeds a threshold. UniGene clustering proceeds in several stages, with each stage adding less reliable data to the results of the preceding stage. This staged clustering affords greater control than a more egalitarian treatment of all links between sequences.
  2. Screening for contaminants, repeats, and low-complexity sequence is performed. Low-complexity screening is performed using NCBI's Dust. Mitochondrial and ribosomal sequences are screened for, as are vector contaminants and repetitive elements. After screening, a sequence must contain at least 100 informative base pairs (bp) to be a candidate for entry into UniGene.
  3. Gene links are established. The set of mRNA sequences is compared with itself. Sequence pairs that are sufficiently similar are linked together to form initial clusters.
  4. Links between ESTs and mRNA are added to these clusters. The set of ESTs is compared with sequences from the set of initial clusters using megablast, and sufficiently similar sequence pairs are added to the clusters. Links that would join the initial mRNA-based clusters are discarded. EST to EST links are also generated and used to extend the initial clusters and to generate clusters composed solely of ESTs.
  5. Clone-based edges are added; these allow non-overlapping 5' and 3' ESTs to be assigned to the same cluster. Clone IDs which link at least two 5' ends from one cluster with at least two 3' ends from another cluster are found, and the two clusters are merged. Due to imperfect clone labeling, a single clone-ID based edge is insufficient to merge two clusters.
  6. Any resulting cluster that does not contain a sequence with a polyadenylation signal or tail is discarded. Clusters that meet these criteria are called anchored clusters, since their 3' end is presumed to be known.
  7. ESTs that do not belong to an anchored cluster are rechecked at a lower level of stringency than in the preceding passes. An EST which passes this less stringent test is then added to the cluster which contains the sequence which is the best match to the EST; it is a guest member.
  8. Clusters of size 1 (that is, clusters which seem to identify infrequently expressed genes) are compared against the rest of the sequences in UniGene at a lower level of stringency, and merged with the cluster containing the most similar sequence.
  9. The resulting clusters are compared with the preceding week's build and renumbered in an attempt to maintain continuity. Since the sequences that make up a cluster may change from week to week, and since the cluster identifier may disappear (typically when two clusters merge), using the cluster identifier as a reference is ill-advised. Using the GenBank accession numbers of the sequences that comprise the cluster is a safe alternative.
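
The core idea in step 1, turning similarity scores into boolean links and grouping linked sequences, can be illustrated with a small union-find routine. This is a simplified sketch of that single idea only: it does not reproduce the staged UniGene build, megablast, or the clone-based merging rules, and the sequence names, scores and threshold are invented.

```python
def cluster_by_links(sequences, scores, threshold):
    """Group sequences whose pairwise similarity exceeds the threshold.

    scores maps (seq_a, seq_b) pairs to a similarity score. Two sequences
    end up in the same cluster if a chain of links connects them.
    """
    parent = {seq: seq for seq in sequences}

    def find(item):
        # Follow parent pointers to the cluster root, shortening the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (seq_a, seq_b), score in scores.items():
        if score > threshold:              # the boolean link
            root_a, root_b = find(seq_a), find(seq_b)
            if root_a != root_b:
                parent[root_b] = root_a    # merge the two clusters

    groups = {}
    for seq in sequences:
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in groups.values())


# Invented example: a-b and b-c are linked, d stays on its own.
names = ["a", "b", "c", "d"]
pair_scores = {("a", "b"): 0.95, ("b", "c"): 0.92, ("c", "d"): 0.40}
clusters = cluster_by_links(names, pair_scores, threshold=0.90)
```

Note that "a" and "c" share a cluster without being compared directly, which is one reason individual alignment of a returned clone against the requested sequence is still recommended.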

Call For Assistance

If you have any questions that are not covered here, please call us at 888-412-2225 or email us at info@openbiosystems.com and our customer service representatives will be happy to help you.