DNA Clone Production Core
Overview
The primary focus for the DNA Clone Production Core (DCPC) within the Cardiogenomics
PGA is to generate a collection of full-length sequence-verified cDNA clones
specific to heart development and diseases of the heart (the Human Heart Gene
Repository). This collection of clones will be captured into a recombinational
cloning vector (Gateway™) in order to enable the transfer of these genes into
virtually any protein expression vector in a simple overnight step. These clones
will be freely distributed among the various members of the Cardiogenomics PGA
as well as other PGAs for use in experiments to study the roles of these proteins
in heart development and pathophysiology. The overarching goal is to enable
researchers to perform high-throughput functional proteomics experiments on
heart-related proteins.
It was anticipated that the data from the other components of the CardioGenomics
Center would identify relevant genes for the DCPC to capture, and that these
data would first become available sometime in the second year of operation.
For this reason, the DCPC of the CardioGenomics PGA did not officially start
operation or receive funding until year 2 of the grant (August 2001). Beginning
in year 2, the goal has been to produce a set of 80-100 fully sequenced cDNA
clones per year. (The average production and sequencing cost per clone is ~$1000.)
Generating the Human Heart Gene Repository (HHGR) requires four key development
stages: (1) selecting target genes, (2) creating a computerized tracking system
for clone construction, (3) establishing high-throughput methods for gene
capture, and (4) developing methods for sequence-verifying the clones.
1. Target Gene Selection. Genes are added to the cloning queue from three
sources. First, a key aspect of target gene selection will
be the data produced by the other components in our center. Thus we anticipate
that much of our clone production will occur in the later years of the grant
as these data become available. Second, there is a link on our website enabling
anyone to suggest a gene for the queue. Third, we have developed a software
tool that will help identify genes related to heart disease (or other diseases).
This database, termed the MedGene database, has cross-correlated human diseases
with all named human genes in the medical literature (including all synonymous
names of the genes). This was accomplished with a text-mining tool specifically
developed for this purpose. Using this tool we have identified over 200 genes
related to cardiomyopathy. Each gene carries a statistic that ranks it by how
frequently it appears in the literature in association with various types of
cardiomyopathy. Each gene entry also lists the types of cardiomyopathy that have
been associated with it in the medical literature. This list will be reviewed by
other members of our PGA to prioritize the genes on the cloning queue.
The database is freely available at http://hipseq.med.harvard.edu/MEDGENE/.
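As a rough illustration of the ranking idea, the Python sketch below counts how many literature records mention both a gene (under any of its synonyms) and a disease term, then sorts the genes by that count. The record format, the small synonym table, the function name `rank_genes`, and the use of a raw co-occurrence count are all assumptions made for illustration; the text does not specify the statistic or the text-mining method that MedGene actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table: every alias maps to one canonical gene symbol.
SYNONYMS = {
    "MYH7": "MYH7",
    "beta-myosin heavy chain": "MYH7",
    "TNNT2": "TNNT2",
    "cardiac troponin T": "TNNT2",
    "LMNA": "LMNA",
}


def rank_genes(records, disease_terms):
    """Rank genes by the number of records mentioning the gene and a disease term.

    records: iterable of (record_id, text) pairs, toy stand-ins for abstracts.
    disease_terms: disease names to search for, e.g. types of cardiomyopathy.
    Returns a list of (gene, count, sorted associated disease terms).
    """
    counts = Counter()
    diseases_by_gene = defaultdict(set)
    for _record_id, text in records:
        lowered = text.lower()
        hit_diseases = {d for d in disease_terms if d.lower() in lowered}
        if not hit_diseases:
            continue
        # Collapse synonyms so each gene is counted at most once per record.
        hit_genes = {
            gene for alias, gene in SYNONYMS.items() if alias.lower() in lowered
        }
        for gene in hit_genes:
            counts[gene] += 1
            diseases_by_gene[gene] |= hit_diseases
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n, sorted(diseases_by_gene[gene])) for gene, n in ranked]


if __name__ == "__main__":
    # Toy records invented for this example, not real literature data.
    toy_records = [
        (1, "MYH7 mutations in hypertrophic cardiomyopathy"),
        (2, "Beta-myosin heavy chain and dilated cardiomyopathy"),
        (3, "Cardiac troponin T in hypertrophic cardiomyopathy"),
    ]
    terms = ["hypertrophic cardiomyopathy", "dilated cardiomyopathy"]
    for row in rank_genes(toy_records, terms):
        print(row)
```

Plain substring matching is used here only to keep the sketch short; a real text-mining tool would need to handle ambiguous gene names and word boundaries.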
2. Computerized clone tracking. A database has been assembled that collects
all of the cDNA sequences that are entered on the request queue. The database
automatically checks all sequences to ensure that they are not partial or incorrect
sequences and upon user request places the sequences in the cloning queue. From
this point forward, each attempt at a clone is assigned a unique ID number,
which is tracked throughout the cloning process. Every well and plate visited
by the clone is recorded, as are all relevant results from each process (PCR
success, colony count, etc.). To query this information, users can type in the
clone ID number or simply scan the bar-code label on any plate that contains
the clone, which will automatically bring up the entire history of that clone,
along with links to results data.
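The tracking behavior described here can be sketched in a few lines of Python. This is a minimal in-memory model, assuming hypothetical class and field names (`CloneTracker`, `Visit`, `plate_barcode`, and so on); the production system is an Oracle database, and its actual schema is not described in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count


@dataclass
class Visit:
    """One well/plate visited by a clone, with the result of that process."""
    plate_barcode: str
    well: str
    process: str
    result: str
    timestamp: datetime


@dataclass
class CloneAttempt:
    clone_id: int
    gene: str
    history: list = field(default_factory=list)


class CloneTracker:
    """Hypothetical in-memory stand-in for the clone-tracking database."""

    def __init__(self):
        self._next_id = count(1)
        self._attempts = {}   # clone ID -> CloneAttempt
        self._by_plate = {}   # plate barcode -> set of clone IDs

    def new_attempt(self, gene):
        """Assign a unique ID to a new cloning attempt for a gene."""
        attempt = CloneAttempt(next(self._next_id), gene)
        self._attempts[attempt.clone_id] = attempt
        return attempt.clone_id

    def record(self, clone_id, plate_barcode, well, process, result):
        """Record a well/plate visit and the result of the process run there."""
        visit = Visit(plate_barcode, well, process, result,
                      datetime.now(timezone.utc))
        self._attempts[clone_id].history.append(visit)
        self._by_plate.setdefault(plate_barcode, set()).add(clone_id)

    def history_by_id(self, clone_id):
        """Full history for a clone, looked up by its ID number."""
        return list(self._attempts[clone_id].history)

    def histories_by_barcode(self, plate_barcode):
        """Full histories of every clone that has visited the scanned plate."""
        clone_ids = sorted(self._by_plate.get(plate_barcode, ()))
        return {cid: self.history_by_id(cid) for cid in clone_ids}
```

Keeping a second index from plate barcode to clone IDs is what lets a bar-code scan retrieve a clone's entire history without knowing its ID in advance.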
In order to ensure that all process steps are properly executed, the database
monitors and manages process flow. Before executing any biochemical step, robots
automatically read the bar-code label of every source plate and destination
plate on the deck. The database is then queried to ensure that the next step is
valid and appropriate. Any attempt to perform an incorrect process is blocked.
Date, time, Protocol ID and User ID are also recorded automatically
for each process. Whenever a plate is read in, the database automatically
recognizes which project the plate belongs to, the current process, and the next
process, and generates the appropriate plate information for the next process.
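One way to picture this process-flow check is as a small state machine keyed on plate bar codes. The sketch below is an assumption-laden simplification: the step names in `WORKFLOW`, the strictly linear ordering, and the `ProcessMonitor` class are hypothetical, and source and destination plates are treated alike.

```python
from datetime import datetime, timezone

# Hypothetical linear workflow; the real step names and ordering are not
# given in the text.
WORKFLOW = ["PCR", "gel_purification", "recombination",
            "transformation", "colony_picking"]


class ProcessError(Exception):
    """Raised when a requested step is not the valid next step for a plate."""


class ProcessMonitor:
    def __init__(self):
        self._last_step = {}   # plate barcode -> last completed step
        self.audit_log = []    # (timestamp, barcode, step, protocol, user)

    def next_step(self, barcode):
        """Return the next valid step for a plate, or None if it is finished."""
        last = self._last_step.get(barcode)
        if last is None:
            return WORKFLOW[0]
        idx = WORKFLOW.index(last) + 1
        return WORKFLOW[idx] if idx < len(WORKFLOW) else None

    def run(self, barcodes, step, protocol_id, user_id):
        """Validate every scanned plate, then record the step for all of them."""
        # Check all plates first so one invalid plate blocks the whole step.
        for barcode in barcodes:
            expected = self.next_step(barcode)
            if step != expected:
                raise ProcessError(
                    f"plate {barcode}: expected {expected!r}, got {step!r}")
        now = datetime.now(timezone.utc)
        for barcode in barcodes:
            self._last_step[barcode] = step
            self.audit_log.append((now, barcode, step, protocol_id, user_id))
```

Validating every plate before recording anything means a rejected step leaves no partial entries in the audit log.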
The database is fully functional and is currently in version 3.3. It has a three-tier
architecture. The backend of the database is Oracle 8i, the business logic layer
is primarily Java and JSP, and the presentation layer is HTML. The database
core is housed on a Sun E450 enterprise-class server and the web server is a
Dell PowerEdge dual-processor server. In addition to the standard Oracle rollback
capability, the database is backed up nightly to both hard disk and tape. Because
the presentation layer is HTML, users can operate the database from anywhere
with an internet connection.
3. High-throughput methods for gene capture. The assembly of the clones
begins with the selection of genes for the queue and is completed with the culture
and DNA preparation of 8 candidate isolates for each gene (4 closed and 4 fusion).
Once genes are selected for the cloning queue, the database predicts the PCR
primers and arranges the primers in an automation-friendly order. There are
three sets of primers: one 5' set and two 3' sets (with and without a stop
codon, respectively). These three sets of primers are used in two PCR reactions,
one each to create the closed and fusion clone sets. In order to avoid cloning
aberrant PCR products, reactions are run on agarose gels and the appropriate
bands are gel purified. Because the genes are pre-arranged in a size-based
sawtooth pattern, it is relatively easy to detect the correct band
and to avoid contamination from neighboring bands. The purified PCR products
are then captured into the vector using a variation of the Gateway recombination
reaction, transformed into bacteria and plated. Four colonies are selected for
each clone and cultured.
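To make the three primer sets concrete, the sketch below derives a 5' primer and two 3' primers (with and without the stop codon) from a coding sequence. It is a simplified illustration: the fixed primer length, the function name `design_primers`, and the empty default tails are assumptions. The recombination-site tails used in the actual vector system are not given in the text, so they are left as parameters, and a real design would also balance melting temperatures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cds, length=20, fwd_tail="", rev_tail=""):
    """Return the gene-specific primers for the closed and fusion clone sets.

    cds: complete coding sequence, from the ATG start through the stop codon.
    length: gene-specific primer length (a fixed value is an assumption).
    fwd_tail / rev_tail: placeholders for vector-specific 5' tails.
    """
    cds = cds.upper()
    complete = (len(cds) % 3 == 0 and cds.startswith("ATG")
                and cds[-3:] in STOP_CODONS)
    if not complete:
        raise ValueError("expected a complete CDS: ATG start, in-frame stop codon")
    without_stop = cds[:-3]
    return {
        # Shared 5' primer: anneals at the start codon.
        "forward": fwd_tail + cds[:length],
        # 3' primer that keeps the stop codon (closed clones, native protein).
        "reverse_closed": rev_tail + reverse_complement(cds[-length:]),
        # 3' primer that omits the stop codon (fusion clones, C-terminal tag).
        "reverse_fusion": rev_tail + reverse_complement(without_stop[-length:]),
    }
```

The completeness check at the top mirrors, in a very reduced form, the idea of rejecting partial sequences before they enter the cloning queue.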
All of the steps in this process are now performed with workstation automation.
Reaction setups are performed by liquid handling robots, agarose gels are loaded
by robot, bacterial transformations are plated by robot onto custom-designed
bioassay dishes with 48 compartments, and colonies are picked automatically
by a colony-picking robot.
This automation is important for increasing accuracy. The error rate for the
robots during repetitive manipulations is exceedingly low and all operations
can be verified by examining the log files for the robots. This is especially
evident during gel loading and colony picking, which are error-prone tasks when
executed manually. At present, the process is very effective for cDNAs in the
size range up to 2.5 kb and moderately effective up to 4 kb.
4. Sequence verification. Preliminary bioinformatics tools have been assembled
to sequence-validate the clones as they are completed. This latter step is still
slow and cumbersome, and at present is done entirely by hand. In addition, it
is relatively expensive, costing approximately $150-200 per sequenced plasmid.
On average, about four plasmids must be sequenced per attempt to find a clone
that does not have any PCR-introduced errors. For each gene that is captured,
two different versions are produced, one with (Closed) and one without (Open)
a stop codon. This allows users to either make native protein or to add a carboxy-terminal
peptide tag, respectively. A focus of the coming year will be to develop better
in-house informatics that expedites the process of clone sequencing.
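The sequencing figures quoted in this section imply a simple cost estimate, worked through in the short Python sketch below. It assumes one successful attempt per clone and uses the 80-100 clones per year target stated earlier; the function names are hypothetical, and the result is back-of-the-envelope arithmetic, not a reported budget.

```python
def sequencing_cost_range(plasmids_per_attempt=4, cost_low=150, cost_high=200):
    """Return the (low, high) sequencing cost in dollars for one attempt."""
    return plasmids_per_attempt * cost_low, plasmids_per_attempt * cost_high


def annual_range(clones_low=80, clones_high=100, **kwargs):
    """Return the (low, high) yearly sequencing cost, one attempt per clone."""
    low, high = sequencing_cost_range(**kwargs)
    return clones_low * low, clones_high * high


if __name__ == "__main__":
    print(sequencing_cost_range())  # (600, 800)
    print(annual_range())           # (48000, 80000)
```

Under these assumptions, sequencing about four plasmids at $150-200 each comes to roughly $600-800 per verified clone, which is why reducing the number of plasmids sequenced per attempt is an obvious target for improvement.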
The DCPC is ahead of schedule and has produced its first set of clones, which
have been sequence verified and are listed on the website. In some cases only
an "open" version or a "closed" version of a clone was
obtained. These are now in the process of being converted to the alternate version
in order to make a complete set of both versions.
Our immediate goals now are:
- Complete the conversion of the first 100 genes (open <-> closed) so
that there is a fully sequenced set of genes in both forms to distribute to the
PGA.
- Review the cardiomyopathy list with the members of the PGA to manually
adjust the priority of the genes and then add the genes to the next set of clones
to be prepared.
- Complete the assembly and sequencing of the next 100 HHGR clones.
Participants