Project Organization and Management for Genomics

Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but also makes it easier to come back to the project later and to share it with collaborators, including your most important collaborator: future you.

Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information in a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and a sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson we’ll be using data from a study of experimental evolution using E. coli. The study includes several types of files.

Throughout the analysis we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn:

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.


This lesson requires a spreadsheet program, such as Excel or OpenOffice, and a web browser.
To most effectively use these materials, please make sure to install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.


Time   Episode                                   Topic
       Setup                                     Download files required for the lesson
00:00  1. Data Tidiness                          How to collect and structure the data about your sequencing data
00:30  2. Planning for NGS Projects              How to plan and organize your data for a genome sequencing project
01:00  3. Examining Data on the NCBI SRA Database  How to work with public data in the NCBI SRA
01:30  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.