A Perl program by Alex Williams.
University of Tennessee, Memphis
Documentation last updated August, 2002.
Check http://www.nervenet.org/genome_mixer/ for updated information.
Genome Mixer is a Perl program designed for simulating mouse breeding experiments. You can, however, use it with just about any animal if you change the number and length of the chromosomes.
The files that make up the entire Genome Mixer package:
Perl source files (*.pm files):
- GenomeMixer.pm - main program. You run Genome Mixer by running this Perl file; all of the other files are referred to from this file. You will NOT need to edit this file; all the user-defined configuration options are in the "UserDefined.pm" file.
- Chromosome.pm - specs for the chromosome objects
- GeneRange.pm - specs for "gene ranges," which are used to represent "runs" of genetic data. (instead of storing a thousand "A"s from position 0 to 1000, we store the "generange" "<A, 0, 1000>")
- MarkerCollection.pm - deals with the marker data after it is read in from the user-specified file
- Mouse.pm - stores the data for each mouse
- MouseIO.pm - handles input/output
- MouseCell.pm - aggregates mice together, for litters
- Selection.pm - selection by genotype (favorable, lethal alleles, etc.). This is NOT yet completely incorporated into Genome Mixer.
- Glob.pm - global variables.
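To make the "gene range" idea from the GeneRange.pm entry above more concrete, here is a minimal sketch of how a long string of allele letters can be collapsed into runs. It is not taken from GeneRange.pm: the function name compress_runs is hypothetical, and it assumes that the end position of a range is exclusive, so that a thousand "A"s starting at position 0 become "<A, 0, 1000>".

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of GeneRange.pm: collapse a string of
# per-base allele letters into [allele, start, end] runs.
# ASSUMPTION: "end" is exclusive, so 1000 "A"s from position 0 give <A, 0, 1000>.
sub compress_runs {
    my ($alleles) = @_;
    my @runs;
    # Each match grabs one maximal run of a single repeated character.
    while ($alleles =~ /\G((.)\2*)/g) {
        my $end   = pos($alleles);      # position just past the run
        my $start = $end - length($1);  # position of the run's first base
        push @runs, [$2, $start, $end];
    }
    return @runs;
}

my @runs = compress_runs(('A' x 1000) . ('B' x 500));
print join(' ', map { "<$_->[0], $_->[1], $_->[2]>" } @runs), "\n";
# prints: <A, 0, 1000> <B, 1000, 1500>
```

Storing runs this way means the memory used depends on the number of breakpoints on a chromosome, not on its length in bases.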
Perl source file to be edited by the end user (you):
- UserDefined.pm - user-defined variables (names of input/output files, etc). Under normal circumstances, this is the only Perl file that you will want to edit.
Spreadsheet-format input files:
Note: The following two spreadsheet files should be saved as tab-delimited spreadsheet files. If you use Excel or some other spreadsheet program to edit these files, you should be sure to export them as tab-delimited files. If they are not saved as tab-delimited files, Genome Mixer will be unable to read them, and may give you cryptic errors.
- Spreadsheet configuration file (tab-delimited spreadsheet). This file contains the information about how many chromosomes animals of this species have, how long each chromosome is, and what breeding scheme is to be executed. The name of this file can be anything, but you will have to specify it by editing the UserDefined.pm file (see "Choosing the input files" below).
- Marker description file (tab-delimited spreadsheet). This file contains the names and locations of markers. The name of this file can be anything, but you will have to specify it by editing the UserDefined.pm file (see "Choosing the input files" below).
Information about the Marker Description File:
Setting your species (number of chromosomes, chromosome length)
The default chromosome configuration is for mice. If you want to simulate an animal with more or fewer chromosomes, you can edit the configuration file spreadsheet. The line that begins with "<NON-SEX CHR NAMES>" contains the chromosome information for the autosomes, and the "<FEMALE SEX CHR>" and "<MALE SEX CHR>" lines below handle the sex chromosomes, which are set to "X" and "Y" by default, but can be renamed if desired.
Here is the format of the data for each chromosome:
My_chromosome : 197 MB, 107 cM
This creates a new chromosome with the name "My_chromosome" and a length of 197 megabases (197,000,000 bases) and 107 centimorgans. You can add as many chromosomes as you like, or just have one. If you are using Excel to edit the configuration file, each set of chromosome information should be in its own cell (this means that when you export as tab-delimited, each set of chromosome data will be separated by a tab).
The sex chromosomes are edited in the same way, except that there may only be one male and one female sex chromosome. Additional sex chromosomes will be disregarded.
Note that all animals will have the same number of chromosomes (except for the sex chromosomes), males are assumed to be heterogametic (XY, different sex chromosomes), and females are assumed to be homogametic (XX, same sex chromosomes). If you are simulating an animal in which XY is female and XX is male (or some other configuration) then you may have to disregard the sex chromosomes or adjust the results accordingly.
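As an illustration of the chromosome cell format described above (name, length in megabases, length in centimorgans), the following sketch parses tab-separated cells of that form. It is not Genome Mixer's own parser: the function name parse_chromosome_cell and the second chromosome ("Another_chromosome") are invented for this example, and the exact layout of the rest of the configuration line is not assumed.

```perl
use strict;
use warnings;

# Hypothetical parser, NOT taken from the Genome Mixer sources.
# Turns one cell such as "My_chromosome : 197 MB, 107 cM" into a hash
# reference, or returns undef if the cell does not match that format.
sub parse_chromosome_cell {
    my ($cell) = @_;
    if ($cell =~ /^\s*(\S.*?)\s*:\s*([\d.]+)\s*MB\s*,\s*([\d.]+)\s*cM\s*$/i) {
        return {
            name  => $1,
            bases => $2 * 1_000_000,   # megabases -> bases
            cM    => $3 + 0,
        };
    }
    return;
}

# Two cells as they would look after a tab-delimited export.
# "Another_chromosome" and its lengths are made-up example values.
my $line = join "\t",
    'My_chromosome : 197 MB, 107 cM',
    'Another_chromosome : 50 MB, 40 cM';

for my $cell (split /\t/, $line) {
    my $chr = parse_chromosome_cell($cell) or next;
    printf "%s: %d bases, %s cM\n", $chr->{name}, $chr->{bases}, $chr->{cM};
}
```

A cell that has been damaged by a non-tab-delimited export will simply fail to match, which is one way such a file can lead to the cryptic errors mentioned earlier.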
Setting up the desired breeding scheme
The breeding scheme is also set in the configuration file spreadsheet (the name of this file may vary; it can be set by the user in the "UserDefined.pm" file).
Adding markers to the chromosomes
Without markers, you will have no data. To add markers to the chromosomes you specified in the configuration file spreadsheet, you will need to edit the marker description file.
Choosing the input files (editing UserDefined.pm)
You need to tell Genome Mixer the names of your configuration file spreadsheet and marker description file. Additionally, if you are not using Mac OS 9, you will need to specify the current path to your Genome Mixer folder. To give Genome Mixer this information, you will need to open the "UserDefined.pm" file and edit the three variables below.
Note: If you have a plain-text text editor (BBEdit, SimpleText, Emacs, Notepad, etc), use it to edit "UserDefined.pm". A fancy word processing program like Microsoft Word will want to save in its own proprietary format with additional data (font type, margins, etc), and the file will no longer run as a Perl script. If you really want to use Word, make sure to save UserDefined.pm as "text only" when you are done editing it.
my($fullPathStart) = "some_path/folder/";
This is the path to your Genome Mixer folder. On Mac OS 9, set it to an empty string:
$fullPathStart = ''; #<- don't forget the semicolon
In other words, there is no need for the $fullPathStart in Mac OS 9; just set it to equal "" or '' (double quotes and single quotes both work). On Mac OS X, a path on another drive takes the form /Volumes/Other_Hard_Drive_Name/More_path/. If the input files are on your external drive "Pescado," inside the "Extras" folder, then the path will be
$fullPathStart = "/Volumes/Pescado/Extras/";
my($configFilename) = "config_simple.txt";
This is the name of your configuration file spreadsheet.
my($markerLocationsFilename) = "marker_locations.tab";
This is the name of your marker description file. See the $configFilename description above for more specifics.
After editing these three variables, you should save and close "UserDefined.pm." Now you should be able to run the Genome Mixer Perl script (instructions below).
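If Genome Mixer reports that it cannot find an input file, it can help to check the three values on their own. The sketch below is not part of Genome Mixer. It assumes that the path start and the file name are simply joined end to end (which is why the example paths end in a slash), and the helper name full_input_path is hypothetical.

```perl
use strict;
use warnings;

# Example values taken from the descriptions above.
my ($fullPathStart)           = "/Volumes/Pescado/Extras/";
my ($configFilename)          = "config_simple.txt";
my ($markerLocationsFilename) = "marker_locations.tab";

# ASSUMPTION: the full path is the path start followed directly by the
# file name. This helper is hypothetical, not a Genome Mixer function.
sub full_input_path {
    my ($pathStart, $filename) = @_;
    return $pathStart . $filename;
}

# Report whether each input file exists at the resulting location.
for my $filename ($configFilename, $markerLocationsFilename) {
    my $path   = full_input_path($fullPathStart, $filename);
    my $status = (-e $path) ? "found" : "MISSING";
    print "$status: $path\n";
}
```

With an empty $fullPathStart (the Mac OS 9 setting), the same check looks for the files relative to the current folder.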
Running the Perl script
By now, hopefully you have set up all the input files the way you want them (or you are using our default files, because you just want to see what Genome Mixer does). Unfortunately, you can't just double-click on the Perl script to run it, so here are platform-specific instructions for making it work.
Instructions below cover Mac OS 9 and Mac OS X. The Linux / Unix and Windows instructions are not yet available.
If you are using Mac OS 9, then running this script with a large (> 200 individuals) breeding scheme may tie up your computer for minutes or hours. It is also advisable to quit all other applications before running this script. Make sure to save all your other documents; MacPerl sometimes crashes if it hasn't been assigned enough memory to handle the specified number of animals. You can normally abort a calculation by pressing Command-period, but this will cause all calculations to be lost, and there is no way to resume a calculation; it will have to be redone from the beginning.
Using BBEdit for OS 9:
1. The first thing you should do is download a copy of MacPerl. This is a free program for Mac OS 9. Once you have downloaded, decompressed, and installed it, you can use BBEdit with MacPerl.
2. You should assign MacPerl quite a bit of memory (RAM), or you will get "Out of Memory" errors when you run long or complex breeding schemes. You can assign memory to MacPerl by finding the "MacPerl" application, getting info on it (click on it, then select "Get Info..." from the File menu), and selecting the "memory" menu option in the get info box. Give MacPerl as much memory as you can spare; a couple hundred megabytes is not a bad idea (it will probably run with much less, but if it runs out of memory, all the calculations it had done up to that point will be lost).
3. If you have a copy of BBEdit, this is the easiest way to run Genome Mixer. Simply double-click the "GenomeMixer.pm" file, and it should open in BBEdit. If it doesn't open in BBEdit, you should launch your copy of BBEdit, select "open" from the "file" menu in BBEdit, and then select "GenomeMixer.pm" from the "open file" dialog box.
4. Now that GenomeMixer.pm is open and is the frontmost (active) document, look in the menu bar; toward the right of the BBEdit menus, you should see a menu labeled with a brown camel-shaped icon instead of text. This is the Perl menu; from it, select "Run in MacPerl" and the Perl script should run. If it doesn't, see the "Troubleshooting" section of this document.
Using MacPerl:
If you don't have BBEdit, you can run GenomeMixer.pm directly from MacPerl.
1. First, you will have to download MacPerl, decompress it, and install it. MacPerl is a free download for Mac OS 9.
2. You should assign MacPerl quite a bit of memory (RAM), or you will get "Out of Memory" errors when you run long or complex breeding schemes. You can assign memory to MacPerl by finding the "MacPerl" application, getting info on it (click on it, then select "Get Info..." from the File menu), and selecting the "memory" menu option in the get info box. Give MacPerl as much memory as you can spare; a couple hundred megabytes is not a bad idea (it will probably run with much less, but if it runs out of memory, all the calculations it had done up to that point will be lost).
3. Now, double-click on the "MacPerl" application to launch it. From the "Script" menu in MacPerl, select "Run Script..." and choose GenomeMixer.pm. This should run the script (a window will appear saying "Running Genome Mixer" or something to that effect). You can abort the program (all calculations will be lost, and nothing will be output, however) by pressing Command-period.
Using BBEdit for OS X:
1. If you have a copy of BBEdit for OS X, this is the easiest way to run Genome Mixer. Simply double-click the "GenomeMixer.pm" file, and it should open in BBEdit. If it doesn't open in BBEdit, you should launch your copy of BBEdit, select "open" from the "file" menu in BBEdit, and then select "GenomeMixer.pm" from the "open file" dialog box.
2. Now that GenomeMixer.pm is open and is the frontmost (active) document, look in the menu bar; toward the right of the BBEdit menus, you should see a menu labeled with a brown camel-shaped icon instead of text. This is the Perl menu; from it, select "Run," and the Perl script should run. If not, see the "Troubleshooting" section, below.
Additional info: Because you are using Mac OS X, you don't have to deal with assigning memory to MacPerl or BBEdit. Also, the script is much less likely to crash; running under OS X is preferred to running in OS 9. Remember that the linebreaks are different in OS X and OS 9, so the Perl script for one version won't run in the other unless you change the linebreak format (OS X uses Unix rather than Macintosh linebreaks for the Perl scripts and input files).
Using the Mac OS X Terminal:
(see the Linux / Unix instructions below)
Unix / Linux / Mac OS X Terminal Instructions:
No instructions yet.
Windows Instructions:
No instructions yet.
Importing data into QGene (Mac OS 9):
You should be able to just select "Open" from the File menu, select the population file that Genome Mixer output (it should be the ".pop" file), then select the map file (with the suffix ".map").
Importing data into Map Manager QTX (Mac OS 9):
You will need to open Map Manager QTX, and select the "Text..." option from the Import submenu of the File menu (File menu -> Import submenu -> Text... option).
Now a new dialog box will appear. You must set the "# Progeny" option to the correct number of progeny in the output file (given as part of the filename: n = 100, n = 150, etc: n = "number of animals"). Then you must click the "Apply" button that is in the top-right side of the dialog box.
The default output for Genome Mixer is for Mat, Pat, and Het to be "A" "B" and "H," respectively. Those values should work with the default files, but "A" and "B" will change if you name your parental strains something else. If you had parental strains "BB" and "DD" (indicating inbred "B" and "D" lines), you would set Mat to "B" and Pat to "D," or vice versa.
Now say "OK," and select the Map Manager QTX output file that Genome Mixer produced. You will then be given another dialog box, listing the options for reading the file. You should check the following options on the left: "Dataset," "Progeny," "Chr," "Name" (for Chr), "Name" (for Locus), "Alias," and "Geno." All of the other checkboxes should be unchecked, leaving you with a total of 7 checked boxes. "Between Items" should be set to "Tab (\t)," "Between Records" should be "Carriage Return (\r)" (that's the Mac OS 9 linebreak format), and "Between Genotypes" should be set to "None."
Including information for making changes to the Genome Mixer script.
Where is recombination handled, so I can change it?
The "pickRecombLocs(chromosome_number, marker_collection_pointer)" function picks locations on a chromosome for recombination (designating the locations in bases, not cM), puts them in an array, and returns that array. You want to change that function and the sub-function that checks for interference ("checkProposedPoint," which checks a proposed breakpoint to see if interference prevents a recombination from occurring there).
Just to give you an idea of how it works, find the line "return @recombLocs;" at the end of the "pickRecombLocs" function. If you were to change that line to "return 105;", then on every chromosome, there would be a breakpoint (and a recombination) at the 105th base pair for every mouse.
There is currently no support for different rates of recombination on different chromosomes or between males and females (except for the sex chromosomes: X recombines only in females, and Y never recombines).
How do I change the minimum distance required between recombinations?
First, note that the distance is given in bases, not cM or megabases. A reasonable interference distance for a mouse might be 25000000 bases (25,000,000, or 25 megabases). Perl doesn't accept commas inside a number, so you'll have to enter 25 million as "25000000," not "25,000,000." Count the number of zeros carefully: it's very easy to leave out a zero or put in an extra one.
Also, this minimum distance is an absolute threshold. If Genome Mixer has picked a recombination at, say, 50,000,000 bases, then (if the minimum distance between recombinations is 25 million) there is a 0% chance of another recombination occurring between 25,000,000 and 75,000,000 bases (50 million ± 25 million), but a recombination at 24,999,999 or 75,000,001 is perfectly OK.
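The absolute-threshold rule described above can be sketched as follows. This is a hypothetical stand-in written for this document, not the actual "checkProposedPoint" code: the function name point_is_allowed and its argument order are invented, and the boundaries are treated as inclusive because the example above says that 24,999,999 and 75,000,001 are allowed while everything between 25,000,000 and 75,000,000 is not.

```perl
use strict;
use warnings;

# Default value quoted in this document for UserDefined.pm.
my $kNO_RECOMB_DIST_BASES = 25000000;

# Hypothetical stand-in for the interference check, NOT the real
# checkProposedPoint. Returns 1 if the proposed breakpoint is far enough
# from every existing breakpoint, 0 otherwise.
# ASSUMPTION: a point exactly $min_dist away is still blocked.
sub point_is_allowed {
    my ($proposed, $min_dist, @existing) = @_;
    for my $loc (@existing) {
        return 0 if abs($proposed - $loc) <= $min_dist;
    }
    return 1;
}

# One recombination already picked at 50,000,000 bases.
my @existing = (50_000_000);
for my $proposed (24_999_999, 25_000_000, 60_000_000, 75_000_001) {
    my $verdict = point_is_allowed($proposed, $kNO_RECOMB_DIST_BASES, @existing)
        ? "allowed" : "blocked";
    printf "%9d: %s\n", $proposed, $verdict;
}
```

Perl does accept underscores inside numeric literals (50_000_000), which makes it easier to count zeros in your own test scripts; whether you use them in "UserDefined.pm" is up to you.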
To actually make the change:
- Ideally, you should use BBEdit (Mac), or emacs (Mac OS X, Linux/Unix), or TextPad, or some program that is capable of handling non-formatted text. You can use Microsoft Word, but you'll have to make sure to save the modified file in text-only format. If you let Word save it in regular Word format, you won't be able to run the script anymore.
- Find the line in "UserDefined.pm" that says "my($kNO_RECOMB_DIST_BASES) = (some number);" (My default value was "my($kNO_RECOMB_DIST_BASES) = 25000000;")
- Now you can change the "25000000" to whatever number you want (well, no negatives, and no fractions). Again, double-check the number of zeros.
Input file problems (usually linebreak issues):
Problem: Why is the program finding the input files (there aren't any "can't find file" errors) but then not reading the data properly? Or: Why can't I run this Perl script on Mac OS 9 or Mac OS X? I get errors like "Can't locate Glob.pm in @INC." or "BEGIN failed--compilation aborted."
Solution: This is probably a problem with the type of linebreaks used in the input files (first problem) or the Perl source files (second problem). Mac OS 9 expects line breaks in "Macintosh" format (the return "\r" character). Mac OS X (and Unix/Linux systems) expect the linebreaks to be in "Unix" format (the newline "\n" character). Windows uses both "\r" and "\n" in sequence to represent the end of a line.
To fix this problem (on the Macintosh), you need to use BBEdit or an equivalent linebreak-changing program to change the linebreaks. If you are using Mac OS 9 or earlier, you will want to open each Perl or input file and select "Macintosh" from the linebreak menu (the little document-shaped icon on the button bar of each file in BBEdit). If you are using Mac OS X, you will want to select "UNIX" from that menu. Now, save and close each file that you changed.
Note that if you try to use an input file that has the wrong type of linebreaks, the Perl script won't know how to read it. If you try to run a Perl script that has incorrect linebreaks, the Perl interpreter will give you cryptic errors and refuse to run. Be aware that the incorrect-linebreak files will look fine in most text editors.
Quick summary: Make sure the Perl (*.pm) files and the configuration spreadsheet ($configFilename) and marker description file ($markerLocationsFilename) all have the proper linebreak types. If even one file has the wrong linebreak type, then you'll get all sorts of bizarre errors. This will probably only arise as an issue if you were using Mac OS 9 and start up in X, or vice-versa, or you downloaded a version of Genome Mixer that doesn't match the platform you are running it on. It is also possible that someone uploaded an incorrect-linebreak copy to our server, so if you find that one of the files has different linebreaks than it says it should, please let us know.
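Because files with the wrong linebreaks look fine in most editors, a small script can be a quicker way to check them. The sketch below is not part of Genome Mixer, and all three function names are hypothetical. It uses the numeric escapes "\015" (return) and "\012" (newline) rather than "\r" and "\n", because MacPerl on Mac OS 9 gives "\n" a different meaning than Perl on Unix does.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of Genome Mixer.
# \015 is the return character, \012 is the newline character.

# Name the linebreak style found in a chunk of text.
sub linebreak_type {
    my ($text) = @_;
    return 'Windows'   if $text =~ /\015\012/;
    return 'Macintosh' if $text =~ /\015/;
    return 'Unix'      if $text =~ /\012/;
    return 'none';
}

# Convert any linebreak style to Unix (for Mac OS X, Linux / Unix).
sub to_unix_linebreaks {
    my ($text) = @_;
    $text =~ s/\015\012?/\012/g;
    return $text;
}

# Convert any linebreak style to Macintosh (for Mac OS 9).
sub to_mac_linebreaks {
    my ($text) = @_;
    $text =~ s/\015\012|\012/\015/g;
    return $text;
}

# Report the linebreak style of every file named on the command line.
for my $path (@ARGV) {
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);                       # read the bytes unchanged
    my $text = do { local $/; <$fh> };  # slurp the whole file
    close($fh);
    $text = '' unless defined $text;    # an empty file slurps as undef
    print "$path: ", linebreak_type($text), "\n";
}
```

For example, saving this as a script and running it with the *.pm files, the configuration spreadsheet, and the marker description file as arguments should print the same linebreak type for every file. A file that contains more than one style is reported by the first rule that matches, so treat a "Windows" result on a Mac as a sign that the file needs converting.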
0.52 (August 2002)
- Minor bug fixes, more information is now output to the user.
0.51 (July 2002)
- Fixed cM output in QGene, added Map Manager QTX output.
0.50 (July 2002)
- First released version.
Alex Williams (programmer): "alexgw", at (@) "uclink.berkeley.edu" . You can email me with bug reports, feature requests, etc.
Dr. Robert Williams: "rwilliam", at (@) "nb.utmem.edu". University of Tennessee, Memphis: (901) 448-7018.
Genome Mixer Documentation by Alex Williams, 2002