JoinMap ® 5
Software for calculating genetic linkage maps in experimental populations of diploid species
Analyse your genetic mapping experiments with powerful software. JoinMap is easy to use, very fast, and presents the analysis results in tables and adjustable charts. The results can be exported to MS-Windows ® text processing, presentation and spreadsheet software.
What is JoinMap?
JoinMap is an MS-Windows ® program for the calculation of genetic linkage maps in experimental populations of diploid species. The software can deal with the common types of experimental population, including the full-sib family (F1) of a cross between two individuals of an outbreeding species. It provides high quality tools that allow detailed study of the experimental data. The intuitive user interface invites thorough exploration of the data. It is easy to perform various diagnostic tests, both before and after the actual map calculation. Subsequently, any locus or individual with possibly erroneous observations can be excluded from the calculations with a single mouse click, after which a new and improved map may be calculated.
The present version, v5, builds on its predecessor, v4.1. While preserving the user interface and the general workflow, v5 has many enhancements over v4.1, several of which are quite significant. Most are intended as improvements for working with very large sets of genetic markers. In brief, the major technical improvements are: (a) the change to 64-bit software, (b) the use of an embedded database system for storing all data, and (c) the parallelization of some calculations. The change to 64-bit allows access to more system memory, which is useful for very large datasets. Using an embedded database system greatly improves the responsiveness of the user interface with large datasets; because the database system is embedded, it requires no special installation or usage instructions: everything is taken care of by the program. Finally, parallelization increases the computational speed on systems with a multi-core processor, since all available cores can be utilized. In addition to these technical improvements, several aspects of the various algorithms and the user interface are enhanced.
JoinMap 5 is 64-bit software for the 64-bit MS-Windows 10 platform. Other MS-Windows platforms are not supported. Previous versions are no longer available for ordering.
Enhancements introduced with version 5:
- Identical markers
- Multipoint estimation
- Batch computations
- Dataset node
The v5 executable program is a 64-bit MS-Windows application. This means that it can access more than the 32-bit limit of 4 GB of computer memory (i.e. RAM). The program therefore must run under a 64-bit version of the MS-Windows operating system, and the computer must have more than 4 GB of memory. An amount of 16 GB of memory is recommended for common analyses. In practice, access to more memory means that computations can take place in memory without having to store intermediate results on the hard drive, which results in higher speeds.
When displaying data in tables, previous versions of JoinMap always retrieved the entire data from file. This could make the program very slow for large files, for instance when dealing with pairwise data. The various data within JoinMap v5 projects are now stored in databases. Data are displayed in tables or as plain text within the program in such a way that only the part currently visible on screen is retrieved from the database file. This approach is called database driven. Using a database system this way greatly improves the responsiveness of the user interface with large datasets. With small datasets the responsiveness can be slightly lower than with previous JoinMap versions due to the database system overhead. The database system used is the embedded database engine SQLite. It does not require any database server installation or maintenance. As a final remark, it is good to realize that the speed of the hard drive can become a limiting factor in dealing with large datasets: the enormous amount of results of computations, e.g. on all pairs of markers, must be stored in huge database files.
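The database-driven idea can be illustrated with a minimal Python sketch using the standard `sqlite3` module. The table name `pairwise`, its columns and the helper functions are hypothetical and chosen only for illustration; they are not taken from JoinMap's actual project database.

```python
import sqlite3

def build_demo_db(n_rows):
    """Create an in-memory table of hypothetical pairwise results."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE pairwise ("
        "id INTEGER PRIMARY KEY, locus1 TEXT, locus2 TEXT, rec_freq REAL)"
    )
    rows = ((i, f"M{i}", f"M{i + 1}", (i % 50) / 100.0) for i in range(n_rows))
    con.executemany("INSERT INTO pairwise VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con

def fetch_visible_rows(con, first_row, n_visible):
    """Retrieve only the rows that the table view currently shows."""
    cur = con.execute(
        "SELECT id, locus1, locus2, rec_freq FROM pairwise "
        "WHERE id >= ? ORDER BY id LIMIT ?",
        (first_row, n_visible),
    )
    return cur.fetchall()

con = build_demo_db(10_000)
# The user has scrolled to row 5000 and the window shows 25 rows.
page = fetch_visible_rows(con, first_row=5_000, n_visible=25)
print(len(page), page[0])
```

Seeking on the indexed primary key keeps the cost of a fetch roughly independent of how far the user has scrolled, which is one plausible way to keep a table view responsive for very large files.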
Some calculations have been enhanced to run in parallel. Modern processors often have multiple computation cores that can execute separate calculations simultaneously. JoinMap v5 can make use of all available cores of the processor; thus, the more cores the processor has, the faster the computations: ideally, the speed scales linearly with the number of cores, except for a small amount of overhead. However, it can be difficult or sometimes even impossible to convert algorithms to a parallel approach. In JoinMap v5 the following four algorithms run in parallel: (1) the calculation of the locus similarities and (2) of the individual similarities in population nodes, (3) the determination of the groupings in population nodes, and (4) the computation of the recombination frequencies in group nodes.
A parallel algorithm is only really useful if all required data are accessible in memory (i.e. RAM) instead of from file, as files cannot be accessed in parallel. In practice, this can be problematic for the determination of the groupings of very large datasets as the amount of pairwise data scales quadratically with the number of markers. Therefore, these computations are made in such a way that they dynamically switch to regular serial (vs parallel) computations using temporary files on the hard drive if it turns out insufficient memory is available.
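Pairwise computations parallelize well because every pair of loci can be processed independently once the genotype data are in memory. The Python sketch below distributes such pairs over a worker pool. It is an illustration only: the simple mismatch count stands in for a real recombination frequency estimator, the function names are hypothetical, and threads are used to keep the example short (CPU-bound Python code would need processes or native code for a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def mismatch_fraction(a, b):
    """Fraction of individuals whose genotypes differ between two loci.

    Individuals with a missing observation ('-') at either locus are skipped.
    """
    informative = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not informative:
        return None
    return sum(x != y for x, y in informative) / len(informative)

def all_pairs(markers, workers=4):
    """Compute the statistic for every pair of loci using a worker pool."""
    pairs = list(combinations(sorted(markers), 2))

    def work(pair):
        return mismatch_fraction(markers[pair[0]], markers[pair[1]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(work, pairs)))

markers = {"m1": "aabb", "m2": "aaba", "m3": "bbaa"}
result = all_pairs(markers)
print(result)
```

Each task reads only from the shared in-memory dictionary, which reflects the point made above: the approach works as long as all required data fit in RAM.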
The production of a reliable high resolution linkage map requires not only many markers but also a population of sufficient size containing the necessary segregation information. Unfortunately, the latter requirement is not always met. In such instances, there will be a large amount of redundancy in the marker data, with many markers having identical observations over all individuals. In most computations JoinMap v5 determines which markers (loci) have an identical segregation pattern and performs the requested computation only for a single representative marker of each set of identical markers. Subsequently, it presents the same results for the identicals next to their representative. Detecting and removing identical markers in the population nodes, as advised in previous versions of JoinMap, is therefore not necessary in v5.
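The representative-marker idea can be sketched in a few lines of Python. This is a minimal illustration under the assumption that "identical" means exact equality of the genotype strings; how JoinMap treats missing observations when comparing loci is not stated here, and the function names are hypothetical.

```python
from collections import defaultdict

def group_identical(markers):
    """Group locus names that share exactly the same segregation pattern."""
    groups = defaultdict(list)
    for name, genotypes in markers.items():
        groups[tuple(genotypes)].append(name)
    return list(groups.values())

def compute_once_per_pattern(markers, compute):
    """Run `compute` on one representative per group and copy the result."""
    results = {}
    for names in group_identical(markers):
        value = compute(markers[names[0]])  # representative locus only
        for name in names:
            results[name] = value
    return results

markers = {"m1": "aabb", "m2": "aabb", "m3": "abab", "m4": "aaab"}
groups = group_identical(markers)

def share_of_a(genotypes):
    """Toy per-locus statistic: fraction of 'a' genotypes."""
    return genotypes.count("a") / len(genotypes)

results = compute_once_per_pattern(markers, share_of_a)
print(groups, results)
```

With heavy redundancy the number of distinct patterns can be far smaller than the number of markers, so the expensive computation runs correspondingly fewer times.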
Multipoint recombination frequency estimation
In previous versions, the multipoint recombination frequencies in the maximum likelihood mapping procedure were estimated using the Gibbs sampler, a Markov chain Monte Carlo approach used within the Expectation-Maximization (EM) algorithm, which is very time consuming. In v5 the Gibbs sampler is replaced by a much faster deterministic EM algorithm (implemented using a so-called forward-backward algorithm). Besides being obtained faster, the estimates are also more accurate.
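To give an impression of what a deterministic forward-backward EM looks like, here is a minimal Python sketch. It is not JoinMap's implementation: it assumes a fixed marker order, a population with only two genotype classes per locus (coded 0 and 1, as in doubled haploids), no genotyping errors, and missing observations coded as `None`. All function names are hypothetical.

```python
def expected_recombinations(obs, r):
    """E-step for one individual: expected recombinations per interval.

    obs: genotype per ordered marker (0, 1, or None when missing).
    r:   current recombination frequency for each adjacent-marker interval.
    """
    n = len(obs)
    states = (0, 1)
    # Emission is 1 if the hidden state agrees with the observation or the
    # observation is missing, else 0 (no genotyping errors assumed).
    emit = [[1.0 if o is None or o == s else 0.0 for s in states] for o in obs]

    def trans(k, a, b):
        return r[k] if a != b else 1.0 - r[k]

    # Forward pass (unscaled; adequate for short examples only).
    alpha = [[0.5 * emit[0][s] for s in states]]
    for k in range(1, n):
        alpha.append([
            emit[k][b] * sum(alpha[k - 1][a] * trans(k - 1, a, b) for a in states)
            for b in states
        ])
    # Backward pass.
    beta = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        beta[k] = [
            sum(trans(k, a, b) * emit[k + 1][b] * beta[k + 1][b] for b in states)
            for a in states
        ]
    likelihood = sum(alpha[-1])
    expected = []
    for k in range(n - 1):
        rec = sum(
            alpha[k][a] * trans(k, a, 1 - a) * emit[k + 1][1 - a] * beta[k + 1][1 - a]
            for a in states
        )
        expected.append(rec / likelihood)
    return expected

def em_recombination(data, n_iter=50, start=0.1):
    """Iterate E- and M-steps; returns one estimate per interval."""
    n_intervals = len(data[0]) - 1
    r = [start] * n_intervals
    for _ in range(n_iter):
        totals = [0.0] * n_intervals
        for obs in data:
            for k, e in enumerate(expected_recombinations(obs, r)):
                totals[k] += e
        # M-step: average expected count, kept inside (0, 0.5].
        r = [min(max(t / len(data), 1e-9), 0.5) for t in totals]
    return r

complete = [[0, 0], [0, 1], [1, 1], [1, 1]]
r_complete = em_recombination(complete)
split = expected_recombinations([0, None, 1], [0.1, 0.2])
print(r_complete, split)
```

With complete data the estimate reduces to the observed fraction of recombinants. With a missing genotype, the single recombination that must have occurred is divided over the two flanking intervals in proportion to their current likelihoods, and no random sampling is involved.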
Often the same computations are to be executed on multiple nodes in the navigation tree, for instance computing a map for all group nodes of a population. To make this easier to do, it is now possible in v5 to have the program execute identical computations in batches. Nodes of the same type in the navigation panel can be marked to be part of the batch (by right-clicking). When subsequently the calculations for a certain selected tabsheet of a node in the batch are requested, the same calculations will be done for each node in the batch.
Any JoinMap project may grow to a point where the navigation tree contains very many nodes. At some point, certain nodes may be regarded as redundant, at least for the time being. JoinMap v5 offers the possibility to move entire tree branches to be stored under an archive node. There, the data remain available for viewing and even for computations, while at the same time the more essential part of the navigation tree remains more clearly arranged. Archived branches can be returned to the regular project tree if needed.
The dataset node functionality is renewed to be able to accommodate thousands of markers. For instance, a set of data of 50,000 markers for 100 individuals copied from an MS-Excel spreadsheet can be pasted into a JoinMap Dataset tabsheet, which takes less than half a minute on a regular PC.
In addition, the Dataset menu is extended with the function Flip Marked Genotypes. Executing this function recodes the marked genotypes as if the parents were exchanged, e.g. in an F2 exchanging A's with B's and vice versa. The recoding depends on the population type; if applicable, the segregation type, the phase type and/or the classification type are recoded correspondingly. This function is useful when the coding was inverted for some of the loci, leading to recombination frequency estimates larger than 0.5, which may show up on the Suspect Linkages tabsheet.
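For the F2 example mentioned above, the recoding amounts to a simple lookup. The Python sketch below is an illustration only: the table covers just the codes a, b, h and a missing value written as '-', which is an assumption made for this example. Dominant codes, and the recoding of segregation, phase or classification types in other population types, are deliberately left out.

```python
# Hypothetical recoding table for an F2: exchanging the parents swaps a and b,
# while the heterozygote h and a missing value '-' stay as they are.
F2_FLIP = {"a": "b", "b": "a", "h": "h", "-": "-"}

def flip_genotypes(genotypes, table=F2_FLIP):
    """Recode one locus as if the two parents had been exchanged."""
    flipped = []
    for g in genotypes:
        if g not in table:
            raise ValueError(f"genotype code {g!r} is not covered by this sketch")
        flipped.append(table[g])
    return flipped

locus = list("aabh-")
flipped = flip_genotypes(locus)
print("".join(flipped))
```

Applying the function twice returns the original coding, which is a convenient sanity check for any such recoding table.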
Another useful function was added to the Dataset menu: Create Dataset from Data Tabsheet. This function creates a new dataset node and fills it with the genotype data of a marked population, group or map node that has a Data tabsheet with genotype data. The data of loci and individuals that are checked as Excluded on the Loci and Individuals tabsheets, will not be transferred to the new dataset node, leading to a 'cleaned up' dataset.
Besides the other described enhancements, there are many smaller but quite useful improvements:
- A major effort was made to make JoinMap fault tolerant with project files. In the hopefully very rare event that a project cannot be opened properly or appears corrupted, you may have JoinMap attempt to reconstruct the project database as well as possible.
- In v5 each node has its own Notes tabsheet, useful for your administration.
- The program maintains a history of the last 50 messages that were shown on the status bar (as long as the program is active); this message history can be accessed by right-clicking on the status bar.
- The progress bar is improved to give a better representation of the progress of the executing procedure. Some database actions cannot be predicted for their duration, so that the standard progress bar growing to 100% cannot be used. To give feedback that the program really is busy in such cases, the progress bar area will show sequences of '>' symbols.
- Data tabsheets from multiple marked nodes (population, group or map nodes) can be exported as one joint loc-file, e.g. for use with MapQTL. Similarly, Map tabsheets from multiple marked map nodes can be exported as one joint map-file.
- The map building approach of the ML mapping algorithm was adjusted in order to produce more reliable results for high density maps, while at the same time speeding up the first levels of the map building.
- Map nodes resulting directly from mapping contain various tabsheets with diagnostics about the loci, such as the NN Fit and Stress. In these tabsheets, rows (i.e. loci) can be marked and subsequently the Map menu function Exclude Marked Loci in Group Node can be applied. This will check the Exclude checkboxes of the loci in the map node's grandparental group node that correspond to the marked rows.
- The Invert Map function will now invert the map order on all relevant tabsheets of a map node.
Overview of JoinMap's main features
- intuitive MS-Windows user interface;
- many experimental population types:
  - BC1 - first generation backcross;
  - RIx - recombinant inbred lines family;
  - DH1, DH - family of F1-derived doubled haploids;
  - DH2 - family of F2-derived doubled haploids;
  - HAP1, HAP - family of haploids;
  - BCpxFy - advanced backcross inbred lines family;
  - IMxFy - advanced intermated inbred lines family;
  - CP - outbreeder full-sib family;
- input in plain text files with a flexible layout;
- input also by pasting marker data copied from MS-Excel;
- also imports MAPMAKER raw data format (data types: f2 intercross, f2 backcross, ri self);
- easily check for genotype coding errors in Dataset node;
- several diagnostics, before and after the actual map calculation:
  - test segregation distortion;
  - check similarity of loci;
  - check similarity of individuals;
  - calculate genotype probabilities conditional on map and flanking genotypes to discover double recombinations;
  - test heterogeneity of recombination estimates between different populations;
- powerful determination of linkage groups;
- four criteria to study the linkage group formation: independence LOD, independence test P-value, linkage LOD, recombination frequency;
- use existing maps (of multiple groups) or existing groupings to create the linkage groups of a new population; this is very useful when employing markers with known map positions in new populations, and also when expanding a map with an additional set of markers;
- use the so-called strongest cross link information to verify assignments of markers to groups;
- automatic determination of linkage phases for outbreeder full-sib family;
- very fast computation of high density maps with the Monte Carlo (MC) maximum likelihood (ML) mapping algorithm according to Jansen et al., 2001. Constructing dense genetic linkage maps. TAG 102: 1113-1122;
- since v4.1, the ML mapping algorithm can also be used on populations of type CP (outbreeder full-sib family); method according to Van Ooijen, J.W. (2011). Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genetics Research 93 (5): 343-349;
- since v5 the Gibbs sampler for the estimation of multipoint recombination frequencies is replaced by the deterministic forward-backward EM algorithm, which is much faster and more accurate;
- the new ML mapping algorithm and the original regression mapping algorithm of JoinMap are available side by side;
- combine ('join') data derived from several sources into an integrated map;
- get an idea of plausible map positions of markers;
- graphical genotyping;
- no limits to the number of loci, linkage groups, etcetera, apart from the physical memory (RAM) of the computer;
- bar and XY charts;
- map charts with many adjustable features;
- results and charts exportable to most MS-Windows text processing, presentation and spreadsheet software;
- export loc-files and maps for use in MapQTL;
- print preview;
- manual in Adobe ® PDF format;
- easy-to-use installer.
Because JoinMap v5 is 64-bit software and uses a database system, it is suitable for working with thousands of loci. The software has no built-in fixed limit on the number of loci, with the exception of the Dataset tabsheet, which can accommodate a maximum of one million loci (although that number was not tested). The software was tested and works well with a dataset of 50,000 loci. Please note, however, that the production of a reliable high resolution linkage map does require a population of sufficient size containing the necessary segregation information.
With this version of JoinMap, the speed of the hard disk drive may become a limiting factor in dealing with large datasets. For instance, a dataset of 50,000 loci in an F2 population will produce a table of ~1.2 billion records of pairwise data that must be stored, resulting in a database file of over 30 GB in size, which will require some time to write.
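The figures above follow from the number of distinct locus pairs, n(n - 1)/2. The short Python check below reproduces the record count and derives a rough storage cost per record. The per-record figure is only an estimate that assumes 1 GB = 10^9 bytes and takes the stated 30 GB as a lower bound; the function name is hypothetical.

```python
def pair_count(n_loci):
    """Number of distinct unordered pairs among n_loci loci."""
    return n_loci * (n_loci - 1) // 2

n_pairs = pair_count(50_000)
# Assumption: 1 GB = 1e9 bytes, and the database file is taken as 30 GB.
bytes_per_record = 30e9 / n_pairs

print(f"{n_pairs:,} pairs, about {bytes_per_record:.0f} bytes per record")
```

Because the pair count grows quadratically, doubling the number of loci roughly quadruples both the number of records and the size of the database file.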
Get an impression of the software with the slide show:
JoinMap ® 5 slide show (size: 1.3 MB).
If needed, support will be given to help you get the software running and solve problems not described in the manual. This support is limited to advice by e-mail to <support(at)kyazma.nl>. A list of frequently asked questions is presented at this web site.
The original version of JoinMap was published in 1993 in The Plant Journal by Piet Stam. Version 2.0 of JoinMap was presented at the Plant Genome III Conference, January 1995, San Diego, California, USA. JoinMap 3.0 was released in 2001, JoinMap 4 in 2006, JoinMap 4.1 in 2012, and JoinMap 5 in 2019.
References
- Stam, 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3: 739-744.
- Stam, 1995. JoinMap 2.0 deals with all types of plant mapping populations. Plant Genome III Abstracts.