Biopolymers Facility at Harvard Medical School
We now offer ultra high throughput DNA sequencing services on the
Illumina Genome Analyzer (formerly known as the Solexa 1G) platform.
Funding for the instrument was generously provided by Harvard Medical School
institutional support through the Taplin Funds for Discovery Program as well
as funds from a five-department consortium.
This new service is available to all our users (both internal and external),
but preference in queuing will be given to the laboratories that
participated in the Taplin proposal.
If you have questions regarding the service please contact the Biopolymers
Facility at nextgen@genome.med.harvard.edu.
Using a massively parallel sequencing approach, first
developed by Solexa, the Illumina Genome Analyzer (GA) can generate more
than one billion bases of data in a single run. The sequencing platform
utilizes novel reversible terminator chemistry optimized to achieve high
levels of cost effectiveness and throughput. This document
describes the method in more detail. Additional information describing
this instrument and possible applications is available at the
Illumina web site.
The following technologies have been developed for this platform:
- Chromatin Immunoprecipitation applications (ChIP-Seq).
- Gene Expression (transcriptional profiling).
- Resequencing.
- Small RNA (microRNA) Discovery.
- SNP Discovery.
- Paired End Module Protocols (Not yet available but coming soon).
For publications describing these applications please go to the
Illumina web site.
There are five basic steps in conducting ultra high throughput
sequencing experiments that apply to the above applications:
- DNA / RNA Isolation and Purification.
- Sample Prep (also referred to as Library Prep).
- Cluster Generation.
- Sequencing by Synthesis.
- Bioinformatics (includes base calling, assembly and data analysis).
DNA / RNA isolation and purification is to be done by the lab submitting
the sample. If you
need assistance, please contact us at
nextgen@genome.med.harvard.edu
regarding recommendations for isolation and purification procedures.
Our facility offers library preparation services starting from
purified and properly prepared DNA or RNA. Additional procedures,
such as target enrichment strategies or other "up front" library preparation
steps, are the responsibility of your laboratory. Assistance with some of
these procedures is available for an additional fee, and we can provide
reference information and advice on these strategies. If you would like
to prepare your own libraries, you may do so, but you will be
responsible for ordering your own sample prep kits. You
may download and read the current protocol for the appropriate kit
offered by Illumina (see below).
Note: Library preparation involves
sophisticated molecular biological techniques and specialized equipment,
and users should be confident in their skills before creating and
delivering libraries to our facility for sequencing. Full fees
for the cluster generation and sequencing processes will be
assessed on a failed run if the problem is determined to be
library specific (control libraries will routinely be run in
parallel on each flow cell). Several methods to validate the
library prior to sequencing are suggested by Illumina,
as outlined in the manuals.
A high quality sample library is critical in order to obtain good
data. The preparation method varies by application and is specific
to the type of experiment you are performing. The variations are
summarized below:
ChIP-Seq, Resequencing, and SNP Discovery Applications
All of these applications utilize the Genomic DNA library preparation kit.
Gene Expression (transcriptional profiling)
This application uses the RNA library preparation kit(s). You will achieve ~95%
transcript coverage with each kit (varies by organism) or ~99% transcript
coverage with both kits (doubles the cost per sample / number of runs).
MicroRNA and low molecular weight RNA studies
These applications utilize the small RNA preparation kit.
Prior to preparing any sample, we highly recommend you contact our facility to
discuss your project. For many projects, pre-analysis of your sequence(s) may
be necessary to determine the optimal approach or if it is even possible to
perform your project at all on the Illumina GA platform. In particular,
resequencing projects will require an existing reference sequence to allow
for assembly and analysis of your sequence data. Other issues, such as
depth of coverage, must be determined in advance, and information such as
whether a DNA source is clonal, is a mixed population, or may contain
contaminating DNA from a co-cultured organism will be very important
in determining how much sequence to generate.
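As a rough illustration of how depth of coverage relates to the amount of sequence required, the sketch below estimates average coverage and the number of flow cell channels needed. It assumes about 3 million reads per channel and 36-base reads (figures quoted elsewhere on this page), and it assumes that every read maps uniquely and uniformly to the reference, which real data will not do. The function names and the example genome size are hypothetical.

```python
import math

# Assumptions (not guarantees): approximate per-channel yield and read length.
READS_PER_CHANNEL = 3_000_000
READ_LENGTH_BP = 36


def expected_coverage(genome_size_bp, reads, read_length_bp=READ_LENGTH_BP):
    """Average depth of coverage if all reads map uniformly to the genome."""
    return reads * read_length_bp / genome_size_bp


def channels_needed(genome_size_bp, target_depth,
                    reads_per_channel=READS_PER_CHANNEL,
                    read_length_bp=READ_LENGTH_BP):
    """Smallest whole number of channels whose combined yield reaches target_depth."""
    bases_per_channel = reads_per_channel * read_length_bp
    return math.ceil(genome_size_bp * target_depth / bases_per_channel)


# Hypothetical example: a 4.6 Mb genome sequenced in one channel.
genome = 4_600_000
print(round(expected_coverage(genome, READS_PER_CHANNEL), 1))  # 23.5
print(channels_needed(genome, target_depth=20))                # 1
```

Mixed populations or contaminating DNA reduce the fraction of useful reads, so an estimate like this should be treated as a lower bound on the sequence required.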
Chromosomal, BAC, or related genomic libraries will require 5 ug of high
quality DNA at a concentration > 100 ng/ul with an OD 260/280 close to 1.8.
ChIP libraries will require 100-200 ng of PicoGreen quantified material.
It is necessary to assay the ChIP material by PCR to determine if the material
is suitable for cluster generation and sequencing.
Expression profiling RNA library preparation will require 1 to 2 ug of Total
RNA (concentration > 20 ng/ul, OD 260/280 close to 2.0), validated for
quality on an Agilent Bioanalyzer or related instrument (we have a Nanodrop
and Bioanalyzer in our facility to perform this analysis for you).
Small RNA library preparation requires 10 ug Total RNA and should be
validated for quality as noted above.
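The input requirements above can be written down as a small lookup table for checking a sample before submission. This is only a sketch: the names are hypothetical, the tolerance used for "close to" an OD 260/280 target is an assumption, and the PCR and Bioanalyzer quality checks mentioned above are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    min_mass_ng: float                   # minimum total material
    min_conc_ng_per_ul: Optional[float]  # concentration must exceed this, if stated
    target_od_ratio: Optional[float]     # OD 260/280 target, if stated


# Values transcribed from the requirements listed above.
REQUIREMENTS = {
    "genomic": Requirement(5000, 100, 1.8),    # 5 ug, > 100 ng/ul, OD ~1.8
    "chip": Requirement(100, None, None),      # 100-200 ng, PicoGreen quantified
    "expression": Requirement(1000, 20, 2.0),  # 1-2 ug total RNA, > 20 ng/ul, OD ~2.0
    "small_rna": Requirement(10000, None, None),  # 10 ug total RNA
}

OD_TOLERANCE = 0.1  # assumption: how far from the target still counts as "close to"


def check_sample(kind, mass_ng, conc_ng_per_ul=None, od_ratio=None):
    """Return a list of problems; an empty list means the stated minimums are met."""
    req = REQUIREMENTS[kind]
    problems = []
    if mass_ng < req.min_mass_ng:
        problems.append(f"need at least {req.min_mass_ng} ng, got {mass_ng}")
    if req.min_conc_ng_per_ul is not None:
        if conc_ng_per_ul is None or conc_ng_per_ul <= req.min_conc_ng_per_ul:
            problems.append(f"concentration must exceed {req.min_conc_ng_per_ul} ng/ul")
    if req.target_od_ratio is not None:
        if od_ratio is None or abs(od_ratio - req.target_od_ratio) > OD_TOLERANCE:
            problems.append(f"OD 260/280 should be close to {req.target_od_ratio}")
    return problems


print(check_sample("genomic", mass_ng=5000, conc_ng_per_ul=150, od_ratio=1.82))  # []
```

A non-empty result is a prompt to contact the facility before preparing the sample, not a definitive rejection.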
Please fill out the Inquiry Form and email it to nextgen@genome.med.harvard.edu.
Preparation of the flow cell (cluster generation) using the cluster
station will be performed only by our facility personnel. Determining
the amount of library DNA to be used for this step is critical for
generating the maximum amount of DNA sequence. Too few clusters will
result in less sequence, while too many clusters will cause signal
interference during imaging and excessive data loss. Achieving the
optimal cluster density is difficult but critical to obtaining the
greatest amount of high quality sequence data.
Utilizing our experience, our staff will try to determine the optimal
concentration for your particular library to achieve the greatest
amount of sequence. We will do our best to generate as much quality
sequence as we can from your library but cannot guarantee a specific
sequence yield, only a range. You may decide to run a single channel,
evaluate the results and then run additional channels at a different
concentration should you require more sequence (this approach will
result in lower costs but may involve a very significant time lag
depending upon instrument availability). Alternatively, you may wish
to run multiple channels at various concentrations all at once (this
approach will result in higher costs but no loss in time will occur
in obtaining all the data you require). The trade-off between cost and
time is a decision that each individual lab must make. Once a
library is created, it provides enough material for many flow cell
channels and it is stable for a long period of time. The sample
may be run repeatedly through the cluster generation and
sequencing steps until enough data is produced. Normal cluster and
sequencing fees will apply.
Sequencing on the Illumina GA instrument will be carried out only by
facility personnel. At this time there is little that can be done
to alter run parameters to improve data quality or quantity, though
Illumina continues to improve chemistries, optics, etc. in an
effort to achieve this. Our facility is in constant contact with
Illumina to be sure that we are utilizing the latest protocols
and reagents on the GA platform as well as upgrading instrument
parts as necessary. Quality metrics generated during the run
provide an indication of the sequence quality and the run can be
stopped after the first base cycle to save reagent costs should
the flow cell contents be determined to be of poor quality. Run
times vary based on the number of cycles collected but range from
one to seven days (for example, an eight-channel, 36-cycle
sequencing run currently takes 84 hours, or 3.5 days).
The bioinformatics procedures related to sequencing on the
Illumina GA platform are split into three basic categories:
- Base Calling.
- Sequence Assembly.
- Data Analysis.
Each flow cell has eight channels and the Genome Analyzer runs one
flow cell at a time. Each run generates approximately 0.7 TB
of raw image data (about 90 GB per channel). After processing, 0.5 TB of
FASTA sequence files and quality and intensity metadata
are generated.
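The per-run figures follow from simple arithmetic, which can be sketched as below for storage planning. The function name is hypothetical, and decimal units (1 TB = 1000 GB) are an assumption, since the page does not say which convention the quoted sizes use.

```python
CHANNELS_PER_FLOW_CELL = 8
RAW_GB_PER_CHANNEL = 90      # raw image data per channel, as quoted above
PROCESSED_TB_PER_RUN = 0.5   # FASTA, quality and intensity files per run


def storage_tb(n_runs, channels=CHANNELS_PER_FLOW_CELL,
               raw_gb_per_channel=RAW_GB_PER_CHANNEL):
    """Return (raw_tb, processed_tb) for n_runs full flow cells.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    raw_tb = n_runs * channels * raw_gb_per_channel / 1000
    processed_tb = n_runs * PROCESSED_TB_PER_RUN
    return raw_tb, processed_tb


print(storage_tb(1))   # (0.72, 0.5)
print(storage_tb(10))  # (7.2, 5.0)
```

Eight channels at 90 GB each give 0.72 TB, which matches the quoted figure of approximately 0.7 TB per run.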
We process the data on the Harvard Medical School Orchestra cluster
and use Harvard Medical School's mass disk storage array as a data
repository.
The sequence files generated will contain approximately 3
million reads per channel. Generally, these files will be available
within two to three days following the completion of the sequence
run. Options for retrieving this data follow:
Harvard Users:
- HMS Storage - We encourage you to create an account on the mass storage
array and directly access your data there.
- FTP - FASTA sequence files only.
- External Hard Drive - If you are interested in retrieving your raw
image data files and the associated metadata, you will need to make
arrangements for file transfer to an external drive with our facility
computer staff. There are additional fees associated with this.
External Users:
- FTP - FASTA sequence files only.
- External Hard Drive - If you are interested in retrieving your
raw image data files and the associated metadata, you will need to
make arrangements for file transfer to an external drive with our
facility computer staff. There are additional fees associated with
this.
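Once the FASTA sequence files have been retrieved, a quick count of reads and bases is a reasonable first sanity check against the expected yield per channel. The sketch below assumes a standard FASTA layout (header lines starting with ">" followed by one or more sequence lines). The exact layout and naming of the delivered files are not described here, so the function name and any path passed to it are hypothetical.

```python
def fasta_stats(path):
    """Count reads (header lines) and total bases in a FASTA file."""
    reads = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                reads += 1  # each header line starts a new read
            else:
                bases += len(line)  # sequence may span several lines
    return reads, bases


# Usage with a hypothetical file name:
# reads, bases = fasta_stats("channel_1.fasta")
# print(reads, bases)
```

The file is read line by line, so memory use stays small even for files containing millions of reads.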
For projects that involve resequencing an organism with a known
reference sequence, assembly of the short reads is necessary. We
are investigating whether we can provide this as part of our
overall service or whether additional fees will be involved. Please
contact us for more details.
Further manipulations of your data will be necessary, such as
alignments, mutation detection, tag counting, etc. Depending on
the nature of the project, the subsequent bioinformatics analyses
may be straightforward or quite complicated. We will endeavor to
assist you with these analyses where possible but additional fees
may be involved. Contact us at
nextgen@genome.med.harvard.edu
for more details.
The volume of data that our facility will generate on an annual
basis is too great for us to maintain long-term storage of all
the raw image files. Therefore long-term storage and backup
costs will be quoted for individual projects. Contact us at
nextgen@genome.med.harvard.edu
for more information.
Please contact us at
nextgen@genome.med.harvard.edu
for pricing information for your specific project.
No FAQs yet.
Please watch this site for continued updates!