CMMG Biocomputing Facility
Online Help Guides

GCG PileUp and List Files



> QUESTION ABOUT PILEUP:
> I'm trying to use "pileup" to create a multiple sequence alignment, but
> I think that it requires a list file for the input.  I don't understand
> how to create a list file.  I have the appropriate sequences in my home
> directory on the Genetics system, and I have read the GCG manual, but
> there seems to be something missing in their explanation.  Thanks for
> your help. 

OK, here is how it's done.  When pileup asks what sequence(s) you want to
input, you can use either a list file (a file containing a list of
sequence names) or a group of individual sequences that can be specified
with a wild-card (e.g. *.seq).  If using your own sequence files, they
should be located in your current working directory, or you can specify a
full path (such as /home/mydir/*.seq).  The sequence files must be in GCG
format.  If they are not, you can use reformat to convert them. 

To use a list file, create the file with a text editor (such as pico on
Genetics).  First, type pico to start the editor.  Second, type 2 dots on 
the first line and press return, then type the name of one sequence on each
subsequent line, like this: 

..
one.seq
two.seq
another.one
and_so_on.two
the_last.seq

Then exit with ^X (Ctrl-X), saving the file under a name such as
sequence.list.
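
If you would rather not open an editor at all, the same file can be
written straight from the UNIX prompt.  This is only a sketch: it uses a
standard shell here-document, and the sequence names are the placeholder
names from the example above, so substitute your own.

```shell
# Write a basic list file: the ".." divider first, then one sequence
# name per line.  The names below are placeholders from the example.
cat > sequence.list <<'EOF'
..
one.seq
two.seq
another.one
and_so_on.two
the_last.seq
EOF

# Display the file to confirm that ".." is on the first line.
cat sequence.list
```

Everything between the two EOF markers is copied into sequence.list
exactly as typed.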

The order of the sequences in the list file is not important.  The
sequence names can be the names of sequence files you have in your current
working directory, or you can specify a full path to each sequence file in
other directories or in the databases. 

Having the 2 dots before the sequences is VERY important.  You can get
fancier by including more info in the file if you want (explained on pages
2-17 to 2-23 of the GCG User's Guide), but a basic list file has just the
elements described above.

To use a list file as input for pileup, put an @ in front of the name of
the list file when pileup asks you for the sequence(s) (for example,
@sequence.list).  Again, the list file should be located in your current
working directory, as should the individual sequences included in the list
file (unless you specified a full path for each sequence in the list). 

That's all you need to create and use a list file.  If you really want to 
know, the 2 dots (..) are used by GCG to divide the optional description 
from the actual list of sequences, so you can put anything you want above 
that dividing line.  However, the dividing line has to be there even if 
there is no description above it, since GCG does not start reading data 
from a file until it comes to the 2 dots.
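
Before running pileup, it can be worth checking that the list file has
its dividing line and that every sequence file named in it really exists.
The shell function below is a rough sketch, not part of GCG: the name
check_list is made up for this example, it assumes the divider is a line
containing only the 2 dots (as in the basic example above), and it assumes
every entry is a plain file name or path.  Entries that refer to the
databases would be reported as missing.

```shell
# check_list is a hypothetical helper, not a GCG command.
# Usage: check_list LIST_FILE
check_list() {
  list=$1
  seen_divider=0
  problems=0
  while IFS= read -r line; do
    if [ "$seen_divider" -eq 0 ]; then
      # Lines before the ".." divider are description; skip them.
      if [ "$line" = ".." ]; then
        seen_divider=1
      fi
      continue
    fi
    # Ignore blank lines after the divider.
    if [ -z "$line" ]; then
      continue
    fi
    # Report any entry that is not an existing file.
    if [ ! -f "$line" ]; then
      echo "missing: $line"
      problems=1
    fi
  done < "$list"
  if [ "$seen_divider" -eq 0 ]; then
    echo "no .. divider line found in $list"
    problems=1
  fi
  # Exit status is 0 when no problems were found, 1 otherwise.
  return "$problems"
}
```

After defining the function, type check_list sequence.list in the
directory that holds the list file.  If nothing is printed, the divider
was found and every listed file is present.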

If you are uncomfortable using the pico text editor on Genetics (even though
it's quite simple to use), use a text editor on your PC or Mac (don't use
a word processor, unless you can save the file as ordinary text) to
create the list file and then transfer the file to Genetics with ftp.

It is also easy to create a list file using UNIX commands and output
redirection.  Put all your sequence files that you wish to include in the
list in a single subdirectory, cd into that directory, then type:

ls -1 > sequence.list

NB: the character after the - is the numeral one (1), not the letter l.
That command lists all the files in your directory, 1 file per line, and
places the results into a new file named sequence.list.  Use pico to edit
the list file and put in the 2 dots at the top as described above, then
check that all of the sequences you want, and only those, are included
(you may need to delete the name sequence.list from the list itself,
since the new file sits in the same directory).
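
The listing and the 2 dots can also be written in a single step, which
avoids the follow-up edit in pico.  The function below is a sketch under
some assumptions: make_list is a made-up name (it is not a GCG command),
and the example call assumes your sequence files all end in .seq.

```shell
# make_list is a hypothetical helper, not a GCG command.
# Usage: make_list OUTPUT_FILE SEQUENCE_FILE...
make_list() {
  out=$1
  shift
  {
    # The divider line must come before the sequence names.
    echo ".."
    for f in "$@"; do
      # Keep only existing files, and never list the output file itself.
      if [ -f "$f" ] && [ "$f" != "$out" ]; then
        echo "$f"
      fi
    done
  } > "$out"
}

# Example: list every .seq file in the current directory.
make_list sequence.list *.seq
```

Because the pattern *.seq never matches sequence.list, the list file does
not end up naming itself.  It is still worth looking the file over before
giving it to pileup.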

It is also easy to create list files from within SeqLab, the X Windows
interface to GCG.

Run pileup on your list file like this: pileup @sequence.list
Pileup jobs are automatically submitted to the batch queue.  When pileup is
finished, it will write 2 files with your results.  The first,
sequence.msf, contains the multiple sequence alignment (the "sequence"
part of the name will be replaced by the name of your list file, or
whatever you specified when you ran pileup).  The second, pileup.figure,
can be used with the GCG figure program to create a graphical
representation of your alignment (please read the GCG Graphics/Printing
help file for more information about GCG graphics).  The *.msf file can be
edited manually to improve the alignment if needed (please read the
multiple sequence alignment section of the GCG manual for further details).




Send comments to: dwomble@genetics.wayne.edu

Copyright © 2001, David D. Womble.