Biopolymers Facility at Harvard Medical School

Next Generation Sequencing Services

We now offer ultra high throughput DNA sequencing services on the Illumina Genome Analyzer (formerly known as the Solexa 1G) platform.

Funding for the instrument was generously provided by Harvard Medical School institutional support through the Taplin Funds for Discovery Program as well as funds from a five-department consortium.

This new service is available to all our users (both internal and external), but preference in queuing will be given to the laboratories that participated in the Taplin proposal.

If you have questions regarding the service please contact the Biopolymers Facility at nextgen@genome.med.harvard.edu.

Overview

Using a massively parallel sequencing approach first developed by Solexa, the Illumina Genome Analyzer (GA) can generate more than one billion bases of data in a single run. The sequencing platform uses novel reversible terminator chemistry optimized for cost effectiveness and throughput. This document describes the method in more detail. Additional information describing this instrument and possible applications is available at the Illumina web site.

Applications

The following technologies have been developed for this platform:

  • Chromatin Immunoprecipitation applications (ChIP-Seq).
  • Gene Expression (transcriptional profiling).
  • Resequencing.
  • Small RNA (microRNA) Discovery.
  • SNP Discovery.
  • Paired End Module Protocols (Not yet available but coming soon).

For publications describing these applications please go to the Illumina web site.

There are five basic steps in conducting ultra high throughput sequencing experiments that apply to the above applications:

  • DNA / RNA Isolation and Purification.
  • Sample Prep (also referred to as Library Prep).
  • Cluster Generation.
  • Sequencing by Synthesis.
  • Bioinformatics (includes base calling, assembly and data analysis).

DNA / RNA Isolation and Purification

This procedure is to be done by the lab submitting the sample. If you need assistance, please contact us at nextgen@genome.med.harvard.edu regarding recommendations for isolation and purification procedures.

Sample Prep / Library Prep

Our facility offers library preparation services starting from purified and properly prepared DNA or RNA. Additional procedures, such as target enrichment strategies or other "up front" library preparation steps, are the responsibility of your laboratory. Assistance with some of these procedures is available for an additional fee, and we can provide reference information and advice on these strategies. You may also prepare your own libraries, in which case you are responsible for ordering your own sample prep kits. You may download and read the current protocol for the appropriate kit offered by Illumina (see below).

Note: Library preparation involves sophisticated molecular biological techniques and specialized equipment, and users should be confident in their skills before creating and delivering libraries to our facility for sequencing. Full fees for the cluster generation and sequencing processes will be assessed on a failed run if the problem is determined to be library specific (control libraries will routinely be run in parallel on each flow cell). Several methods to validate the library prior to sequencing are suggested by Illumina, as outlined in the manuals.

A high quality sample library is critical for obtaining good data. The preparation method varies by application and depends on the type of experiment you are performing. The methods are summarized below:

ChIP-Seq, Resequencing, and SNP Discovery Applications
All of these applications utilize the Genomic DNA library preparation kit.

Gene Expression (transcriptional profiling)
This application uses the RNA library preparation kit(s). You will achieve ~95% transcript coverage with each kit (varies by organism) or ~99% transcript coverage with both kits (doubles the cost per sample / number of runs).

MicroRNA and low molecular weight RNA studies
These applications utilize the small RNA preparation kit.

Sample Submission Details

Before preparing any sample, we highly recommend that you contact our facility to discuss your project. For many projects, pre-analysis of your sequence(s) may be necessary to determine the optimal approach, or whether the project can be performed on the Illumina GA platform at all. In particular, resequencing projects require an existing reference sequence to allow for assembly and analysis of your sequence data. Other parameters, such as depth of coverage, must be determined in advance. Information such as whether a DNA source is clonal or a mixed population, or whether it may contain contaminating DNA from a co-cultured organism, is very important in determining how much sequence to generate.
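As a rough illustration of how depth of coverage relates to the amount of sequence generated, the sketch below divides total sequenced bases by genome size. It is a back-of-the-envelope estimate, not a facility planning tool. The reads-per-channel figure is the approximate value quoted in the Base Calling section of this page; the 36 base read length assumes one base per cycle in a 36 cycle run; the 5 Mb genome size and the 50x target are hypothetical. The estimate ignores reads that fail to align and uneven coverage across the genome, so real requirements will be higher.

```python
# Rough depth-of-coverage estimate. All inputs are illustrative assumptions.

def expected_coverage(reads_per_channel, read_length, channels, genome_size):
    """Average fold coverage = total sequenced bases / genome size."""
    total_bases = reads_per_channel * read_length * channels
    return total_bases / genome_size


def channels_needed(target_coverage, reads_per_channel, read_length, genome_size):
    """Smallest whole number of channels that reaches the target coverage."""
    bases_per_channel = reads_per_channel * read_length
    needed_bases = target_coverage * genome_size
    # Ceiling division on integers, avoiding floating point rounding.
    return -(-needed_bases // bases_per_channel)


READS_PER_CHANNEL = 3_000_000  # approximate figure quoted on this page
READ_LENGTH = 36               # assumption: one base per cycle, 36 cycle run
GENOME_SIZE = 5_000_000        # hypothetical 5 Mb genome
TARGET_COVERAGE = 50           # hypothetical target depth

one_channel = expected_coverage(READS_PER_CHANNEL, READ_LENGTH, 1, GENOME_SIZE)
print(f"One channel: about {one_channel:.1f}x average coverage")
print("Channels needed for target:",
      channels_needed(TARGET_COVERAGE, READS_PER_CHANNEL, READ_LENGTH, GENOME_SIZE))
```

Mixed populations or contaminating DNA reduce the fraction of reads that count toward the genome of interest, so the target depth should be raised accordingly.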

Chromosomal, BAC, or related genomic libraries will require 5 ug of high quality DNA at a concentration > 100 ng/ul with an OD 260/280 close to 1.8.

ChIP libraries will require 100-200 ng of PicoGreen quantified material. The ChIP material must be assayed by PCR to determine whether it is suitable for cluster generation and sequencing.

Expression profiling RNA library preparation will require 1 to 2 ug of Total RNA (concentration > 20 ng/ul, OD 260/280 close to 2.0), validated for quality on an Agilent Bioanalyzer or related instrument (we have a Nanodrop and a Bioanalyzer in our facility to perform this analysis for you).

Small RNA library preparation requires 10 ug Total RNA and should be validated for quality as noted above.
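The minimum amounts listed above can be collected into a simple pre-submission check, sketched below. This is only an illustration and does not replace the PCR or Bioanalyzer validation described above. The names `REQUIREMENTS`, `OD_TOLERANCE`, and `check_sample` are hypothetical, and the tolerance of 0.1 around the target OD 260/280 is an assumption, since this page only says "close to".

```python
# Pre-submission check against the minimums listed on this page.
# The OD tolerance is an assumption: the page only says "close to".

REQUIREMENTS = {
    # library type: (min mass in ug, min concentration in ng/ul, target OD 260/280)
    "genomic": (5.0, 100.0, 1.8),
    "chip": (0.1, None, None),          # 100 ng, PicoGreen quantified
    "expression_rna": (1.0, 20.0, 2.0),
    "small_rna": (10.0, None, None),
}
OD_TOLERANCE = 0.1  # hypothetical threshold for "close to"


def check_sample(kind, mass_ug, conc_ng_ul=None, od_260_280=None):
    """Return a list of problems; an empty list means none were found."""
    min_mass, min_conc, target_od = REQUIREMENTS[kind]
    problems = []
    if mass_ug < min_mass:
        problems.append(f"mass {mass_ug} ug is below the {min_mass} ug minimum")
    if min_conc is not None and (conc_ng_ul is None or conc_ng_ul <= min_conc):
        problems.append(f"concentration must exceed {min_conc} ng/ul")
    if target_od is not None and (
        od_260_280 is None or abs(od_260_280 - target_od) > OD_TOLERANCE
    ):
        problems.append(f"OD 260/280 should be close to {target_od}")
    return problems


# A genomic DNA sample that meets the listed minimums.
print(check_sample("genomic", mass_ug=5.0, conc_ng_ul=120.0, od_260_280=1.82))
# A Total RNA sample with too little material at too low a concentration.
print(check_sample("expression_rna", mass_ug=0.5, conc_ng_ul=15.0, od_260_280=2.0))
```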

Inquiry Form

Please fill out the Inquiry Form and email it to nextgen@genome.med.harvard.edu.

Cluster Generation

Preparation of the flow cell (cluster generation) using the cluster station will be performed only by our facility personnel. Determining the amount of library DNA to use for this step is critical for generating the maximum amount of DNA sequence. Too few clusters will yield less sequence, while too many clusters will cause signal interference during imaging and excessive data loss. Achieving the optimal cluster density is difficult but essential for obtaining the greatest amount of high quality sequence data. Drawing on our experience, our staff will try to determine the optimal concentration for your particular library. We will do our best to generate as much quality sequence as we can from your library but cannot guarantee a specific sequence yield, only a range.

You may decide to run a single channel, evaluate the results, and then run additional channels at a different concentration should you require more sequence (this approach costs less but may involve a very significant time lag, depending on instrument availability). Alternatively, you may wish to run multiple channels at various concentrations all at once (this approach costs more but avoids any delay in obtaining all the data you require). Each lab must decide for itself how to balance cost against time.

Once a library is created, it provides enough material for many flow cell channels and is stable for a long period of time. The sample may be run repeatedly through the cluster generation and sequencing steps until enough data is produced. Normal cluster generation and sequencing fees will apply.

Sequencing by Synthesis

Sequencing on the Illumina GA instrument will be carried out only by facility personnel. At this time there is little that can be done to alter run parameters to improve data quality or quantity, though Illumina continues to improve chemistries, optics, etc. in an effort to achieve this. Our facility is in constant contact with Illumina to ensure that we are using the latest protocols and reagents on the GA platform and upgrading instrument parts as necessary. Quality metrics generated during the run indicate the sequence quality, and the run can be stopped after the first base cycle to save reagent costs if the flow cell contents are determined to be of poor quality. Run times vary with the number of cycles collected and range from one to seven days (for example, an eight channel, 36 cycle sequencing run currently takes 84 hours, or 3.5 days).

Bioinformatics

The bioinformatics procedures related to sequencing on the Illumina GA platform are split into three basic categories:

  • Base Calling.
  • Sequence Assembly.
  • Data Analysis.

Base Calling

Each flow cell has eight channels, and the Genome Analyzer runs one flow cell at a time. Each run generates approximately 0.7 TB of raw image data (about 90 GB per channel). After processing, approximately 0.5 TB of FASTA sequence files and quality and intensity meta-data are generated.
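The arithmetic behind these figures can be sketched as follows, which may help when planning storage for a multi-run project. The per-channel and per-run sizes are the approximate values stated above; the use of decimal units (1 TB = 1000 GB) and the example of ten runs are assumptions made only for illustration.

```python
# Approximate data volume per run, using the figures quoted above.

CHANNELS_PER_FLOW_CELL = 8   # from the text above
RAW_GB_PER_CHANNEL = 90      # approximate, from the text above
PROCESSED_TB_PER_RUN = 0.5   # approximate, from the text above


def raw_image_tb(runs, channels=CHANNELS_PER_FLOW_CELL):
    """Approximate raw image volume in TB (assumes 1 TB = 1000 GB)."""
    return runs * channels * RAW_GB_PER_CHANNEL / 1000


def processed_tb(runs):
    """Approximate volume of processed sequence files and meta-data in TB."""
    return runs * PROCESSED_TB_PER_RUN


print(f"One run: {raw_image_tb(1):.2f} TB raw, {processed_tb(1):.1f} TB processed")

RUNS = 10  # hypothetical number of runs, for illustration only
total = raw_image_tb(RUNS) + processed_tb(RUNS)
print(f"{RUNS} runs: about {total:.1f} TB in total")
```

Eight channels at about 90 GB each gives 0.72 TB, which matches the rounded 0.7 TB figure for a full flow cell.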

We process the data on the Harvard Medical School Orchestra cluster and we utilize Harvard Medical School's mass disk storage array as a data repository.

The sequence files generated will contain approximately 3 million reads per channel. Generally, these files will be available within two to three days following the completion of the sequence run. Options for retrieving this data follow:

Harvard Users:
  • HMS Storage - We encourage you to create an account on the mass storage array and directly access your data there.
  • FTP - FASTA sequence files only.
  • External Hard Drive - If you are interested in retrieving your raw image files and the associated meta-data, you will need to arrange a file transfer to an external drive with our facility computer staff. Additional fees apply.
External Users:
  • FTP - FASTA sequence files only.
  • External Hard Drive - If you are interested in retrieving your raw image files and the associated meta-data, you will need to arrange a file transfer to an external drive with our facility computer staff. Additional fees apply.

Sequence Assembly

For projects that involve resequencing an organism with a known reference sequence, assembly of the short reads is necessary. We are investigating whether we can provide this as part of our overall service or whether additional fees will be involved. Please contact us for more details.

Data Analysis

Further manipulation of your data, such as alignments, mutation detection, and tag counting, will be necessary. Depending on the nature of the project, the subsequent bioinformatics analyses may be straightforward or quite complicated. We will endeavor to assist you with these analyses where possible, but additional fees may be involved. Contact us at nextgen@genome.med.harvard.edu for more details.

Other Data Issues

The volume of data that our facility will generate on an annual basis is too great for us to maintain long-term storage of all the raw image files. Therefore long-term storage and backup costs will be quoted for individual projects. Contact us at nextgen@genome.med.harvard.edu for more information.

Pricing

Please contact us at nextgen@genome.med.harvard.edu for pricing information for your specific project.

Frequently Asked Questions

No FAQs yet.

Please watch this site for continued updates!

©2004 Harvard Medical School