CardioGenomics

DNA Clone Production Core


Overview

The primary focus of the DNA Clone Production Core (DCPC) within the CardioGenomics PGA is to generate a collection of full-length, sequence-verified cDNA clones specific to heart development and diseases of the heart (the Human Heart Gene Repository). The clones will be captured into a recombinational cloning vector (Gateway™) so that the genes can be transferred into virtually any protein expression vector in a simple overnight step. The clones will be freely distributed to members of the CardioGenomics PGA and to other PGAs for use in experiments that study the roles of these proteins in heart development and pathophysiology. The overarching goal is to enable researchers to perform high-throughput functional proteomics experiments on heart-related proteins.

It was anticipated that the data from the other components of the CardioGenomics Center would identify relevant genes for the DCPC to capture, and that these data would first become available sometime in the second year of operation. For this reason, the DCPC of the CardioGenomics PGA did not officially start operation or receive funding until year 2 of the grant (August 2001). Beginning in year 2, the goal has been to produce a set of 80-100 fully sequenced cDNA clones per year. (The average production and sequencing cost per clone is approximately $1000.)

Generating the Human Heart Gene Repository (HHGR) requires several key development stages: (1) selecting target genes, (2) creating a computerized tracking system for clone construction, (3) establishing high-throughput methods for gene capture, and (4) developing methods for sequence-verifying the clones.

1. Target Gene Selection. Genes are added to the cloning queue from three sources. First, a key input to target gene selection will be the data produced by the other components in our center. We therefore anticipate that much of our clone production will occur in the later years of the grant, as these data become available. Second, a link on our website enables anyone to suggest a gene for the queue. Third, we have developed a software tool, the MedGene database, that helps identify genes related to heart disease (or other diseases). MedGene cross-correlates human diseases with all named human genes in the medical literature (including all synonymous names of the genes), using a text-mining tool developed specifically for this purpose. Using this tool we have identified over 200 genes related to cardiomyopathy. Each gene carries a statistic that ranks it by how frequently it appears in the literature associated with various types of cardiomyopathy, and each gene entry lists the types of cardiomyopathy that have been associated with it in the medical literature. Other members of our PGA will review this list to prioritize the genes on the cloning queue. The database is freely available at http://hipseq.med.harvard.edu/MEDGENE/.
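The ranking idea described above can be illustrated with a small sketch. This is not the MedGene implementation, whose statistic is not specified here. It assumes the simplest possible score: the number of abstracts in which any synonym of a gene appears together with a disease term. The function name `rank_genes` and the input structures are hypothetical.

```python
import re
from collections import Counter


def _mentions(text, terms):
    """True if any term occurs in text as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def rank_genes(abstracts, gene_synonyms, disease_terms):
    """Rank genes by how many abstracts mention both the gene and the disease.

    abstracts      -- list of abstract strings
    gene_synonyms  -- dict mapping a gene symbol to all of its synonymous names
    disease_terms  -- list of names for the disease of interest
    Returns a list of (gene, count) pairs, highest count first.
    """
    counts = Counter()
    for text in abstracts:
        if not _mentions(text, disease_terms):
            continue  # abstract is not about the disease
        for gene, names in gene_synonyms.items():
            if _mentions(text, names):
                counts[gene] += 1  # count each abstract once per gene
    return counts.most_common()
```

Matching on every synonym matters because the same gene is often cited under several names; a count based on the official symbol alone would under-rank such genes.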


2. Computerized clone tracking. A database has been assembled that collects all of the cDNA sequences that are entered on the request queue. The database automatically checks all sequences to ensure that they are not partial or incorrect sequences and, upon user request, places the sequences in the cloning queue. From this point forward, each attempt at a clone is assigned a unique ID number, which is tracked throughout the cloning process. Every well and plate visited by the clone is recorded, as are all relevant results from each process (PCR success, colony count, etc.). To query this information, users can type in the clone ID number or simply scan the bar-code label on any plate that contains the clone, which automatically brings up the entire history of that clone, along with links to results data.

To ensure that all process steps are properly executed, the database monitors and manages process flow. Before executing any biochemical step, robots automatically read the bar-code label of every source plate and destination plate on the deck. The database is then queried to ensure that the next step is valid and appropriate. Any attempt to perform an incorrect process is not permitted. Date, time, Protocol ID and User ID are also recorded automatically for each process. Whenever a plate is read into the database, the database automatically recognizes which project the plate belongs to, the current process and the next process, and generates the appropriate plate information for the next process.

The database is fully functional and is currently in version 3.3. It has a three-tier architecture: the backend is Oracle 8i, the business logic layer is primarily Java and JSP, and the presentation layer is HTML. The database core is housed in a Sun E450 enterprise-class server and the web server is a Dell PowerEdge dual-processor server. In addition to the standard Oracle rollback capability, the database is backed up nightly to both hard disk and tape. Because the presentation layer is HTML, users can operate the database from anywhere with an internet connection.
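The process-flow check can be sketched as a small state machine keyed by plate bar code. The sketch below is in Python for brevity and is not the production system, which is described as Java and JSP over Oracle. The step names in `WORKFLOW`, the `Tracker` class and its methods are all hypothetical; the text does not list the actual protocols.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical step order, for illustration only.
WORKFLOW = ["PCR", "GEL_PURIFICATION", "RECOMBINATION",
            "TRANSFORMATION", "COLONY_PICKING"]


class InvalidStepError(Exception):
    """Raised when a requested step is not the plate's next valid step."""


@dataclass
class Plate:
    barcode: str
    project: str
    completed: list = field(default_factory=list)  # audit trail of finished steps

    def next_step(self):
        """Return the next expected step, or None when the workflow is finished."""
        done = len(self.completed)
        return WORKFLOW[done] if done < len(WORKFLOW) else None


class Tracker:
    def __init__(self):
        self.plates = {}

    def register(self, barcode, project):
        self.plates[barcode] = Plate(barcode, project)

    def run_step(self, barcode, step, user_id, protocol_id):
        """Validate the step against the plate's state, then record it."""
        plate = self.plates[barcode]  # unknown bar codes raise KeyError
        expected = plate.next_step()
        if step != expected:
            raise InvalidStepError(
                f"{barcode}: expected {expected}, refused {step}")
        plate.completed.append({
            "step": step,
            "user": user_id,
            "protocol": protocol_id,
            "time": datetime.now(),  # date and time recorded automatically
        })
        return plate.next_step()

    def history(self, barcode):
        """Return the ordered list of steps already performed on a plate."""
        return [entry["step"] for entry in self.plates[barcode].completed]
```

Refusing the step before anything is recorded keeps the audit trail consistent: a plate's history only ever contains steps that were valid at the time they ran.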

3. High-throughput methods for gene capture. The assembly of the clones begins with the selection of genes for the queue and is completed with the culture and DNA preparation of 8 candidate isolates for each gene (4 closed and 4 fusion). Once genes are selected for the cloning queue, the database predicts the PCR primers and arranges the primers in an automation-friendly order. There are three sets of primers: one 5' set and two 3' sets (with and without a stop codon, respectively). These three sets of primers are used in two PCR reactions, one each to create the closed and fusion clone sets. To avoid cloning aberrant PCR products, reactions are run on agarose gels and the appropriate bands are gel purified. Because the genes are pre-arranged in a size-based saw-tooth pattern, it is relatively easy to detect the correct band and to avoid contamination from neighboring bands. The purified PCR products are then captured into the vector using a variation of the Gateway recombination reaction, transformed into bacteria and plated. Four colonies are selected for each clone and cultured.

All of the steps in this process are now performed with workstation automation. Reaction setups are performed by liquid-handling robots, agarose gels are loaded by robot, and bacterial transformations are plated by robot onto custom-designed bioassay dishes with 48 compartments and picked automatically by a colony-picking robot. This automation is important for increasing accuracy. The error rate for the robots during repetitive manipulations is exceedingly low, and all operations can be verified by examining the log files for the robots. The benefit is especially evident during gel loading and colony picking, which are error-prone tasks when executed manually. At present, the process is very effective for cDNAs in the size range up to 2.5 kb and moderately effective up to 4 kb.
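The three primer sets can be illustrated with a minimal sketch. It assumes the input is a complete coding sequence ending in a stop codon and uses a fixed primer length; the database's actual primer prediction (melting temperature, the recombination-site tails needed for Gateway cloning, and so on) is not described in the text and is not modeled here. The function names and the default length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cds, length=20):
    """Return one 5' primer and two 3' primers for a coding sequence.

    forward        -- first `length` bases of the CDS (shared by both reactions)
    reverse_closed -- anneals to the 3' end including the stop codon
    reverse_fusion -- anneals just upstream of the stop codon, so the
                      product can be fused to a carboxy-terminal tag
    """
    cds = cds.upper()
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("expected a complete CDS ending in a stop codon")
    if len(cds) < length + 3:
        raise ValueError("CDS is too short for the requested primer length")
    return {
        "forward": cds[:length],
        "reverse_closed": reverse_complement(cds[-length:]),
        "reverse_fusion": reverse_complement(cds[-(length + 3):-3]),
    }
```

Pairing the forward primer with each reverse primer gives the two PCR reactions per gene: one product that retains the stop codon and one that omits it.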

4. Sequence verification. Preliminary bioinformatics tools have been assembled to sequence-validate the clones as they are completed. This step is still slow and cumbersome, and at present is done entirely by hand. In addition, it is relatively expensive, costing approximately $150-200 per sequenced plasmid. On average, about four plasmids must be sequenced per attempt to find a clone that does not have any PCR-introduced errors. For each gene that is captured, two different versions are produced, one with (Closed) and one without (Open) a stop codon. This allows users either to make native protein or to add a carboxy-terminal peptide tag, respectively. A focus of the coming year will be to develop better in-house informatics to expedite the process of clone sequencing.
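The core comparison behind sequence verification can be sketched as follows. This is a simplification, not the Core's procedure: it assumes each isolate's sequence has already been assembled and is compared base by base against the reference, whereas real verification would first align raw sequencing reads. The function names are hypothetical.

```python
def find_mismatches(reference, candidate):
    """Return (position, reference_base, candidate_base) for every difference.

    Positions are 0-based. A length difference is reported as one entry
    at the position where the shorter sequence ends.
    """
    reference, candidate = reference.upper(), candidate.upper()
    mismatches = [
        (i, r, c)
        for i, (r, c) in enumerate(zip(reference, candidate))
        if r != c
    ]
    if len(reference) != len(candidate):
        shorter = min(len(reference), len(candidate))
        mismatches.append((shorter, "length", "length"))
    return mismatches


def first_verified(reference, isolates):
    """Check isolates in order and stop at the first error-free one.

    isolates -- dict mapping isolate name to its assembled sequence
    Returns (name, number_sequenced), or (None, number_sequenced) if
    every isolate carries at least one error.
    """
    sequenced = 0
    for name, seq in isolates.items():
        sequenced += 1
        if not find_mismatches(reference, seq):
            return name, sequenced
    return None, sequenced
```

Stopping at the first clean isolate reflects the cost pressure described above: at approximately $150-200 per plasmid and about four plasmids per attempt, sequencing alone comes to roughly $600-800 per attempt.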

The DCPC is ahead of schedule and has produced its first set of clones, which have been sequence verified and are listed on the website. In some cases only an "open" version or a "closed" version of the clone was obtained. These are now being converted to the alternate version in order to make a complete set of both versions.

Our immediate goals now are:

  1. Complete the conversion of the first 100 genes (open <-> closed) so that a fully sequenced set of genes in both forms can be distributed to the PGA.
  2. Review the cardiomyopathy list with the members of the PGA to manually adjust the priority of the genes, and then add the genes to the next set of clones to be prepared.
  3. Complete the assembly and sequencing of the next 100 HHGR clones.


Participants

Researcher           Role
Joshua LaBaer        Principal Investigator
Leonardo Brizuela    Co-Investigator


Sponsored by the National Heart, Lung, and Blood Institute
Copyright 2001-2003, CardioGenomics   | Designed by Digizyme