> QUESTION ABOUT PILEUP:
> I'm trying to use "pileup" to create a multiple sequence alignment, but
> I think that it requires a list file for the input. I don't understand
> how to create a list file. I have the appropriate sequences in my home
> directory on the Genetics system, and I have read the GCG manual, but
> there seems to be something missing in their explanation. Thanks for
> your help.

OK, here is how it's done.

When pileup asks which sequence(s) you want to input, you can give either a list file (a file containing a list of sequence names) or a group of individual sequences specified with a wildcard (e.g. *.seq). If you are using your own sequence files, they should be in your current working directory, or you can specify a full path, such as /home/mydir/*.seq. The sequence files must be in GCG format; if they are not, you can use reformat to convert them.

To use a list file, create it with a text editor (such as pico on Genetics). First, type pico to start the editor. Second, type 2 dots on the first line and press Return, then type the name of one sequence on each subsequent line, like this:

    ..
    one.seq
    two.seq
    another.one
    and_so_on.two
    the_last.seq

Then exit (^X) and save the file under a name such as sequence.list. The order of the sequences in the list file is not important. The sequence names can be the names of sequence files in your current working directory, or you can specify a full path to each sequence file in other directories or in the databases.

Having the 2 dots before the sequence names is VERY important. You can get fancier by including more information in the file if you want (explained on pages 2-17 to 2-23 of the GCG User's Guide), but a basic list file needs only the elements described above.
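If you would rather not use an interactive editor at all, the same basic list file can be written directly from the UNIX shell with a here-document. This is a minimal sketch assuming a POSIX-compatible shell such as sh or bash; the sequence names are only the placeholder names from the example list above, so substitute your own files or full paths.

```shell
#!/bin/sh
# Write a basic GCG list file without opening an editor.
# The names below are the placeholders from the example above;
# replace them with your own sequence files (or full paths).
cat > sequence.list <<'EOF'
..
one.seq
two.seq
another.one
and_so_on.two
the_last.seq
EOF

# Display the file: ".." on the first line, one sequence name per line.
cat sequence.list
```

The quoted 'EOF' delimiter stops the shell from expanding anything inside the here-document, so the names are written exactly as typed.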
To use a list file as input for pileup, put an @ in front of the list file's name when pileup asks you for the sequence(s), for example @sequence.list. Again, the list file should be in your current working directory, as should the individual sequences named in it (unless you gave a full path for each sequence in the list). That's all you need to create and use a list file.

If you really want to know, GCG uses the 2 dots (..) to separate an optional description from the actual list of sequences, so you can put anything you want above that dividing line. However, the dividing line must be there even if there is no description above it, because GCG does not start reading data from the file until it reaches the 2 dots.

If you are uncomfortable using the pico text editor on Genetics (even though it's quite simple to use), use a text editor on your PC or Mac to create the list file (don't use a word processor unless you can save the file as plain text), and then transfer the file to Genetics with ftp.

It is also easy to create a list file using UNIX commands and output redirection. Put all the sequence files you want in the list into a single subdirectory, cd into that directory, and type:

    ls -1 > sequence.list

NB: the character after the - is the numeral one (1), not the letter l. This lists all the files in the directory, one file per line, and writes the result to a new file named sequence.list. Then use pico to edit the list file: add the 2 dots at the top as described above, and check that all (and only) the sequences you want are included. Because the shell creates sequence.list before ls runs, the list will normally include its own name, so delete that line.

It is also easy to create list files from within SeqLab, the X Windows interface to GCG.

Run pileup on your list file like this:

    pileup @sequence.list

Pileup jobs are automatically submitted to the batch queue.
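The ls -1 method above can be reduced to one step that also writes the 2 dots for you. The sketch below assumes a POSIX shell and, in the usage example, that your sequence files end in .seq; make_gcg_list and list_entries are hypothetical helper names used for illustration, not GCG programs. The second helper mimics, in simplified form, the rule described above that everything up to the 2 dots is ignored.

```shell
#!/bin/sh
# make_gcg_list LISTFILE FILE...   (hypothetical helper, not part of GCG)
# Writes ".." on the first line, then each FILE on its own line.
make_gcg_list() {
  out=$1
  shift
  {
    echo ".."
    printf '%s\n' "$@"
  } > "$out"
}

# list_entries LISTFILE   (hypothetical helper, not part of GCG)
# Prints the first word of each non-blank line after the ".." divider,
# i.e. the sequence names, skipping any description above the divider
# and any extra information after a name. Assumes ".." is on a line by
# itself, as in the examples in this guide.
list_entries() {
  awk 'found && NF { print $1 }
       NF == 1 && $1 == ".." { found = 1 }' "$1"
}
```

For example, to build the list from every .seq file in the current directory and then report any name that is not a readable local file:

    make_gcg_list sequence.list *.seq
    list_entries sequence.list | while read -r f; do [ -r "$f" ] || echo "not found: $f"; done

Because sequence.list does not match *.seq, the list never includes its own name. If no file matches, the shell passes the literal text *.seq through, so look at the result before running pileup. Database entries in a list will also be reported as "not found", since they are not local files.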
When pileup is finished, it writes 2 files with your results. The first, sequence.msf, contains the multiple sequence alignment (the "sequence" part of the name is replaced by the name of your list file, or by whatever you specified when you ran pileup). The second, pileup.figure, can be used with the GCG figure program to create a graphical representation of your alignment (please read the GCG Graphics/Printing help file for more information about GCG graphics). You can edit the *.msf file manually to improve the alignment if needed (please read the multiple sequence alignment section of the GCG manual for further details).
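Before editing an alignment, it can help to confirm which sequences pileup actually included. The sketch below assumes the usual MSF layout, in which each sequence has a "Name:" line in the header and the header ends at a line beginning with //; msf_names is a hypothetical helper, not a GCG program.

```shell
#!/bin/sh
# msf_names MSFFILE   (hypothetical helper, not part of GCG)
# Prints the sequence names from the "Name:" lines of an MSF header,
# stopping at the "//" line that separates the header from the alignment.
msf_names() {
  awk '/^[ \t]*\/\// { exit }
       $1 == "Name:" { print $2 }' "$1"
}
```

For example, msf_names sequence.msf prints one name per line, which you can compare by eye with the names in your list file.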
Send comments to:
dwomble@genetics.wayne.edu
Copyright © 2001, David D. Womble.