JoinMap ® 5
Software for the calculation of genetic linkage maps in experimental populations of diploid species
JoinMap is an MS-Windows ® program for the calculation of genetic linkage maps in experimental populations of diploid species. The software can deal with the common types of experimental population, including the full-sib family (F1) of a cross between two individuals of an outbreeding species. It provides high quality tools that allow detailed study of the experimental data. The intuitive user interface invites closer exploration of the data. It is easy to perform various diagnostic tests, both before and after the actual map calculation. Subsequently, any locus or individual with possibly erroneous observations can be excluded from the calculations by a simple mouse-click, after which a new and improved map may be calculated.
Version 5 builds on its predecessor, v4.1. While preserving the user interface and the overall workflow of the program, v5 has many enhancements over v4.1, several of which are quite significant. Most are intended as improvements for working with very large sets of genetic markers. In brief, the major technical improvements are: (a) the change towards 64-bit software, (b) the use of an embedded database system for storing all data, and (c) the parallelization of some calculations. The change to 64-bit allows access to more system memory, which is useful for very large datasets. Using an embedded database system greatly improves the responsiveness of the user interface with large datasets; because the database system is embedded, it requires no separate installation or usage instructions: everything is taken care of by the program. Finally, parallelization increases the computational speed on systems with a multi-core processor, as all available cores can be utilized. In addition to these technical improvements, several aspects of the various algorithms and the user interface are enhanced. The important advances are described in the next section.
Enhancements introduced with v5
The v5 executable program is a 64-bit MS Windows application. This means that it can access more than the 32-bit limit of 4 GB of the computer's memory (i.e. RAM). Naturally, the program must run under a 64-bit version of the MS Windows operating system, and the computer must have more than 4 GB of memory. An amount of 16 GB of memory is recommended for common analyses. In practice, access to more memory means that computations can take place in memory without having to store intermediate results on the hard drive, which results in higher speeds.
- Database driven
When displaying data in tables, previous versions of JoinMap always retrieved the entire data from file. This could make the program very slow for large file sizes, for instance when dealing with pairwise data. The various data within JoinMap v5 projects are now stored in databases. Displaying these data in tables or as plain text from within the program is done in such a way that only the part currently visible on screen is retrieved from the database file. This approach is called database driven. Using a database system this way greatly improves the responsiveness of the user interface with large datasets. With small datasets the responsiveness can be slightly slower than with previous JoinMap versions due to the database system overhead. The database system used is the embedded database engine called SQLite. It does not require any database server installation or maintenance. As a final remark, it is good to realize that the speed of the hard drive can become a limiting factor in dealing with large datasets: the enormous amount of results of computations, e.g. on all pairs of markers, must be stored in huge database files.
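The idea of retrieving only the visible part of a table can be illustrated with a minimal sketch using Python's standard `sqlite3` module. This is not JoinMap's actual code or schema: the table name `pairwise`, its columns and the function `fetch_visible_rows` are assumptions made for illustration only.

```python
import sqlite3

def fetch_visible_rows(conn, first_row, row_count):
    """Return only the rows that a table view would currently show."""
    cur = conn.execute(
        "SELECT locus1, locus2, rec_freq FROM pairwise "
        "ORDER BY id LIMIT ? OFFSET ?",
        (row_count, first_row),
    )
    return cur.fetchall()

# Hypothetical pairwise data for 50 loci (1225 pairs), stored in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairwise ("
    "id INTEGER PRIMARY KEY, locus1 TEXT, locus2 TEXT, rec_freq REAL)"
)
rows = [
    (f"M{i}", f"M{j}", 0.01 * (j - i))
    for i in range(50)
    for j in range(i + 1, 50)
]
conn.executemany(
    "INSERT INTO pairwise (locus1, locus2, rec_freq) VALUES (?, ?, ?)", rows
)

# Only 20 rows are read, regardless of how large the table is.
page = fetch_visible_rows(conn, first_row=100, row_count=20)
```

For very large tables, a range query on the integer key (for example `WHERE id BETWEEN ? AND ?`) would typically be preferable to `OFFSET`, because SQLite has to step over all skipped rows when an offset is used.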
Some calculations are enhanced to be able to run in parallel. Modern processors often have multiple computation cores that can execute separate calculations simultaneously. JoinMap v5 can make use of all available cores of the processor, thus the more cores the processor has, the faster the computations: ideally, the speed scales linearly with the number of cores, except for a small amount of overhead. However, it can be difficult or sometimes even impossible to change algorithms towards a parallel approach. In JoinMap v5 the following four algorithms are made to run in parallel: (1) the calculation of the locus similarities and (2) of the individual similarities in population nodes, (3) the determination of the groupings in population nodes, and (4) the computation of the recombination frequencies in group nodes.
A parallel algorithm is only really useful if all required data are accessible in memory (i.e. RAM) instead of from file, as files cannot be accessed in parallel. In practice, this can be problematic for the determination of the groupings of very large datasets, as the amount of pairwise data scales quadratically with the number of markers. Therefore, these computations are made in such a way that they dynamically switch to regular serial (vs parallel) computations using temporary files on the hard drive if it turns out that insufficient memory is available.
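As a rough illustration of how pairwise work on in-memory data can be distributed over a pool of workers, consider the following sketch. It is not JoinMap's implementation: the similarity measure (fraction of individuals with identical genotype codes) and all function names are assumptions, and the fallback to serial computation with temporary files is not modelled. A thread pool is used to keep the example short; in Python, CPU-bound work would normally need a process pool to actually run on several cores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import os

def similarity(a, b):
    """Fraction of individuals with identical genotype codes at two loci (assumed measure)."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)

def pairwise_similarities(genotypes, workers=None):
    """Compute the similarity of every pair of loci using a worker pool."""
    names = sorted(genotypes)
    pairs = list(combinations(names, 2))  # n * (n - 1) / 2 pairs
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each pair is independent of the others, so the pairs can be mapped freely.
        values = pool.map(
            lambda p: similarity(genotypes[p[0]], genotypes[p[1]]), pairs
        )
        return dict(zip(pairs, values))

# Hypothetical genotype strings: one character per individual.
genotypes = {"M1": "aabhb", "M2": "aabhb", "M3": "abbha"}
result = pairwise_similarities(genotypes)
```

Because every pair is computed independently, the work can be split without any coordination between workers, which is why all genotype data need to be reachable in memory by each worker.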
- Identical markers
The production of a reliable high resolution linkage map not only requires many markers, it also requires a population of sufficient size containing the necessary segregation information. Unfortunately, this latter requirement is not always met. In such instances, there will be a large amount of redundancy in the marker data, which will lead to many markers having identical observations over all individuals. In most computations JoinMap v5 determines which markers (loci) have an identical segregation pattern and will perform the requested computation only for a single representative marker of each set of identical markers. Subsequently, it will present the same results for the identicals next to their representative. Detecting and removing identical markers in the population nodes, as advised in previous versions of JoinMap, is therefore not necessary in v5.
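The grouping of identical markers under a single representative can be sketched as follows. This is only an illustration under simple assumptions: markers are treated as identical when their genotype strings match exactly, and the function name `group_identical` is hypothetical. How JoinMap itself treats missing observations when comparing markers is not described here.

```python
from collections import defaultdict

def group_identical(genotypes):
    """Map each representative marker to the list of markers identical to it."""
    groups = defaultdict(list)
    for name, scores in genotypes.items():
        # Markers with exactly the same observations end up under the same key.
        groups[tuple(scores)].append(name)
    # The first marker encountered in each set acts as the representative.
    return {members[0]: members[1:] for members in groups.values()}

# Hypothetical genotype strings: one character per individual.
genotypes = {
    "M1": "abhab",
    "M2": "abhab",
    "M3": "bbhaa",
    "M4": "abhab",
}
representatives = group_identical(genotypes)
```

In this example a computation would only need to be run for `M1` and `M3`; the results for `M1` could then be reported for `M2` and `M4` as well.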
- Multipoint recombination frequency estimation
In the previous versions, the multipoint recombination frequencies in the maximum likelihood mapping procedure were estimated using the Gibbs sampler, a Markov chain Monte Carlo approach used within the Expectation-Maximization (EM) algorithm, which is very time consuming. In v5 the Gibbs sampler is replaced by a much faster deterministic EM algorithm (implemented using a so-called forward-backward algorithm). Besides being obtained faster, the estimates are also more accurate.
- Batch computations
Often the same computations are to be executed on multiple nodes in the navigation tree, for instance computing a map for all group nodes of a population. To make this easier to do, it is now possible in v5 to have the program execute identical computations in batches. Nodes of the same type in the navigation panel can be marked to be part of the batch (by right-clicking). When subsequently the calculations for a certain selected tabsheet of a node in the batch are requested, the same calculations will be done for each node in the batch.
Any JoinMap project may grow to the point where the navigation tree contains a great many nodes. At some point, certain nodes may be regarded as redundant, at least for the time being. JoinMap v5 offers the possibility to move entire tree branches to be stored under an archive node. There, the data remain available for viewing and even for computations, while at the same time the more essential part of the navigation tree remains more clearly arranged. Archived branches can be returned to the regular project tree if needed.
- Dataset node
The dataset node functionality is renewed to be able to accommodate thousands of markers. For instance, a set of data of 50,000 markers for 100 individuals copied from an MS-Excel spreadsheet can be pasted into a JoinMap Dataset tabsheet, which takes less than half a minute on a regular PC.
In addition, the Dataset menu is extended with the function Flip Marked Genotypes. Executing this function will recode marked genotypes in the opposite way, as if the parents were exchanged, e.g. in an F2 exchanging A's with B's and vice versa. The recoding depends on the population type; if applicable, the segregation type, the phase type and/or the classification type are recoded correspondingly. This function is useful in case the coding had been done inversely for some of the loci, leading to recombination frequency estimates larger than 0.5, which may show up on the Suspect Linkages tabsheet.
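For the F2 example mentioned above, the recoding amounts to a simple character exchange. The sketch below only covers that case and is not JoinMap's own routine: the function name is hypothetical, and it assumes genotype strings in which `a` and `b` denote the two homozygotes while `h` (heterozygote) and `-` (missing) are left unchanged. Any other codes, and the recoding of segregation, phase or classification types for other population types, are not handled.

```python
# Translation table that exchanges a with b (and A with B); all other codes are kept.
FLIP_F2 = str.maketrans("abAB", "baBA")

def flip_genotypes(scores):
    """Recode one locus of an F2 as if the two parents had been exchanged."""
    return scores.translate(FLIP_F2)

flipped = flip_genotypes("aabh-b")
```

Applying the function twice returns the original coding, which makes it safe to undo an accidental flip.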
Another useful function was added to the Dataset menu: Create Dataset from Data Tabsheet. This function creates a new dataset node and fills it with the genotype data of a marked population, group or map node that has a Data tabsheet with genotype data. The data of loci and individuals that are checked as Excluded on the Loci and Individuals tabsheets will not be transferred to the new dataset node, leading to a 'cleaned up' dataset.
- Project reconstruction
A major effort was made to make JoinMap fault tolerant with project files. On the hopefully very rare occasion that a project cannot be opened properly or appears corrupted, you may have JoinMap attempt to reconstruct the project database as well as possible.
Besides the above described enhancements, there are many smaller but quite useful improvements:
- In v5 each node has its own Notes tabsheet, useful for your administration.
- The program maintains a history of the last 50 messages that were shown on the status bar (as long as the program is active); this message history can be accessed by right-clicking on the status bar.
- The progress bar is improved to give a better representation of the progress of the executing procedure. Some database actions cannot be predicted for their duration, so that the standard progress bar growing to 100% cannot be used. To give feedback that the program really is busy in such cases, the progress bar area will show sequences of '>' symbols.
- Data tabsheets from multiple marked nodes (population, group or map nodes) can be exported as one joint loc-file, e.g. for use with MapQTL. Similarly, Map tabsheets from multiple marked map nodes can be exported as one joint map-file.
- The map building approach of the ML mapping algorithm was adjusted in order to produce more reliable results for high density maps, while at the same time speeding up the first levels of the map building.
- Map nodes resulting directly from mapping contain various tabsheets with diagnostics about the loci, such as the NN Fit and Stress. In these tabsheets, rows (i.e. loci) can be marked and subsequently the Map menu function Exclude Marked Loci in Group Node can be applied. This will check the Exclude checkboxes of the loci in the map node's grandparental group node that correspond to the marked rows.
- The Invert Map function will now invert the map order on all relevant tabsheets of a map node.
New feature introduced with version 4.1
The only, but very important, enhancement of version 4.1 of JoinMap is the ability to use the multipoint maximum likelihood mapping algorithm on populations of type CP, i.e. the outbreeding species full-sib family. The new method computes dense maps for CP populations at very high speed. For instance, a linkage group of about 250 good quality SNP markers (a mix of <hkxhk>, <lmxll> and <nnxnp> segregation types) is estimated in about 8 minutes (on a regular PC). The method is described in a paper that was accepted for publication.
Projects of JoinMap version 4 can be opened and will automatically be converted into projects of version 4.1.
New features introduced with version 4
With this edition JoinMap is taking another big step in linkage analysis software! Many new features were added, some improving the user interface, others supplying more powerful methods, for instance:
- data management: copy and paste your marker data from MS-Excel into JoinMap; easily check for coding errors;
- new population types: advanced intermated families and advanced backcross families, of any given generation;
- more criteria to study the linkage group formation: linkage LOD, independence test P-value, recombination frequency;
- use existing maps (of multiple groups) or existing groupings to create the linkage groups of a new population; this is very handy when employing markers with known map positions in new populations, and also when expanding your map with an additional set of markers;
- use the so-called strongest cross link information to verify assignments of markers to groups;
- very fast computation of high density maps with the new mapping algorithm according to Jansen et al., 2001, TAG 102: 1113-1122, based on Monte Carlo Maximum Likelihood: the algorithm needs only a couple of minutes for a linkage group of 100 markers
(the new mapping algorithm and the regression mapping algorithm of JoinMap 3.0 are available side by side)
(for the outbreeder full-sib family (CP) the new mapping algorithm is limited to pseudo-testcross analyses, i.e. a map for each of the two parental meioses separately);
- get an idea of plausible map positions of markers;
- graphical genotyping;
- bar and XY charts;
- print preview.
- an intuitive MS-Windows user interface, which adds a lot of practical functionality
- all analyses are based upon just a single input file in plain text format with a flexible layout
- also imports MAPMAKER raw data format (data types: f2 intercross, f2 backcross, ri self)
- experimental population types: BC1, F2, RIL, F1-derived and F2-derived DH, outbreeder full-sib family
- powerful determination of linkage groups
- automatic determination of linkage phases for outbreeder full-sib family
- several diagnostics, before and after the actual map calculation:
- test segregation distortion
- check similarity of loci
- check similarity of individuals
- calculate genotype probabilities conditional on map and flanking genotypes to discover double recombinations
- test heterogeneity of recombination estimates between different populations
- combine ('join') data derived from several sources into an integrated map
- map charts, with many adjustable features and exportable to MS-Word ® and MS-PowerPoint ®
- copying of results to clipboard for additional use in MS-Excel ®
- print or export results, e.g. export maps for use in MapQTL ®
- no limits to the number of loci, linkage groups, etcetera, apart from the physical memory (RAM) of the computer
- manual in Acrobat ® Reader PDF file format
- easy-to-use InstallShield ® installer
Limits
Because JoinMap v5 is 64-bit software and uses a database system, it is suitable for working with thousands of loci. The software does not have a built-in fixed limit on the maximum number of loci, with the exception of the Dataset tabsheet, which can accommodate a maximum of one million loci (although that number was not tested). The software was tested to work fine with a dataset of 50,000 loci. Please note, however, that the production of a reliable high resolution linkage map does require a population of sufficient size containing the necessary segregation information.
With this version of JoinMap, the speed of the hard disk drive will now become a limiting factor in dealing with large datasets. For instance, a dataset of 50,000 loci in an F2 population will produce a table of ~1.2 billion records of pairwise data that must be stored, resulting in a database file of over 30 GB in size, which will require some time to write.
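The order of magnitude of these figures can be checked with a short calculation. The sketch assumes one record per unordered pair of loci, which matches the stated ~1.2 billion records, and takes the 30 GB lower bound as 30 × 10^9 bytes; the resulting bytes per record is therefore only a rough lower estimate, not a statement about the actual file layout.

```python
def pair_count(n_loci):
    """Number of unordered pairs of loci: n * (n - 1) / 2."""
    return n_loci * (n_loci - 1) // 2

pairs = pair_count(50_000)              # 1,249,975,000 pairs
bytes_per_record = 30e9 / pairs         # roughly 24 bytes per pairwise record

# Doubling the number of loci roughly quadruples the number of pairs.
growth = pair_count(100_000) / pairs
```

The quadratic growth is the reason that disk speed and file size, rather than the number of loci as such, become the practical constraint for very large datasets.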
Impression
Get an impression of the software with the slide show:
JoinMap ® 4 slide show (size: ~0.9 MB).
Version information
JoinMap 5 is 64-bit software for the 64-bit MS-Windows platforms 7, 8, 8.1 and 10. Other MS-Windows platforms are not supported. Previous versions are no longer available.
The original version of JoinMap was published in 1993 in The Plant Journal by Piet Stam. Version 2.0 of JoinMap was presented at the Plant Genome III Conference, January 1995, San Diego, California, USA.
- Stam, 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3: 739-744.
- Stam, 1995. JoinMap 2.0 deals with all types of plant mapping populations. Plant Genome III Abstracts.