Abstract
A general fine-scale Bayesian quantitative trait locus (QTL) mapping method for outcrossing species is presented. It is suitable for the analysis of complete and incomplete data from experimental designs of F_{2} families or backcrosses. Both the amount of genotyping of parents and grandparents and the assumption that the QTL alleles in the crossed lines are fixed are optional. Grandparental origin indicators are used, but without discarding the original genotype or allelic origin information. The method treats the number of QTL in the analyzed chromosome as a random variable and allows some QTL effects from other chromosomes to be taken into account in a composite interval mapping manner. A block update of ordered genotypes (haplotypes) of the whole family is sampled once at each marker locus during every round of the Markov chain Monte Carlo algorithm used in the numerical estimation. As a byproduct, the method gives the posterior distributions of linkage phases in the family, and therefore it can also be used as a haplotyping algorithm. The Bayesian method is tested and compared with two frequentist methods using simulated data sets, considering two different parental crosses and three different levels of available parental information. The method is implemented as a software package and is freely available under the name Multimapper/outbred at URL http://www.rni.helsinki.fi/~mjs/.
INBRED line cross designs are routinely used for quantitative trait locus (QTL) mapping in experimental organisms, because all F_{1} individuals are then fully heterozygous and show perfect coupling between alleles at the QTL and at nearby marker loci. Furthermore, the biallelic nature of the design suits the tradition in genetics in which QTL are treated as biallelic and all different heterozygous QTL effects are considered jointly as a dominance effect. Depending on the organism, producing inbred lines is not always practical or even possible (Haley et al. 1994); methods developed for outbred designs must then be used.
Presently, there are QTL mapping methods suitable for the analysis of outbred populations (for a review see Hoeschele et al. 1997) as well as general pedigrees (e.g., Heath 1997). However, general pedigree analysis methods tend to be statistically inefficient and may not even be applicable to data arising from controlled outcrossing experiments. Therefore “design-specific” methods are needed. Their main advantages over general-purpose pedigree methods are (1) the incorporation of design-specific properties (such as control of the maximum number of possible QTL genotypes) into the analysis and (2) the control of background variation by marker covariates instead of polygenic components or unlinked QTL.
When interval mapping, in which a putative QTL is placed somewhere between markers, is applied to outbred offspring data, the linkage phases (haplotypes) of the parents must be considered. They are needed to determine whether paternally (or maternally) derived alleles at two neighboring loci are of the same grandparental origin. By comparison, the grandparental line origin of alleles found in inbred line-cross offspring is automatically known at all marker positions.
Haley et al. (1994) presented a QTL mapping method for outbred line-cross data (F_{2}) from two divergent breeding populations, assuming that different influential QTL alleles were fixed in different grandparental lines. Their method requires genotyped grandparents to establish the haplotypes of the parents. They reduce the allelic space by using grandparental origin indicators instead of the original marker alleles (see also Lander and Green 1987; Thompson 1994; Kruglyak et al. 1995, where a grandparental origin indicator is a binary digit in the inheritance vector). This work was extended to four segregating alleles by Knott et al. (1997). Somewhat earlier, Maliepaard and Van Ooijen (1994) and Jansen (1996) presented more general algorithms for outcrossing experiments. Their methods assume neither fixation of QTL in the crossed grandparental lines nor the availability of grandparental genotypes, but instead require known parental haplotypes. The method of Jansen (1996) was recently generalized by Jansen et al. (1998) to more complex populations, in which parental haplotypes need not be known in advance. In this method, the full genotypic and allelic origin information is considered in all founders, but only segregation indicators (i.e., grandparental origins) are used in nonfounders. No information is lost, because the actual allelic forms of nonfounders can be traced from the pedigree by following each gene flow backward. Moreover, this treatment was shown to lead to more efficient mixing of the sampler than methods in which the genotypes of nonfounders are also stored. The same idea was mentioned by Thompson (1994), and it was used by Sobel and Lange (1996) for descent graphs in pedigree analysis. Jansen et al. (1998) tested many different models.
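To make the role of grandparental origin indicators concrete, the following minimal Python sketch (our illustration, not code from any of the cited methods) reads off one indicator per locus from a parental gamete, assuming that the parent's two haplotypes, i.e., its linkage phase, are known. A change of indicator between two neighboring informative loci marks a recombination; loci where the parent is homozygous carry no origin information. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def origin_indicators(hap1: Sequence[str], hap2: Sequence[str],
                      gamete: Sequence[str]) -> List[Optional[int]]:
    """Grandparental origin indicator per locus for one parental gamete.

    hap1, hap2: the parent's two haplotypes (phase assumed known);
    0 means the allele came from hap1 (first grandparent), 1 from hap2.
    None marks loci where the parent is homozygous (uninformative meiosis).
    """
    indicators: List[Optional[int]] = []
    for a1, a2, g in zip(hap1, hap2, gamete):
        if a1 == a2:
            if g != a1:
                raise ValueError("gamete allele not carried by this parent")
            indicators.append(None)
        elif g == a1:
            indicators.append(0)
        elif g == a2:
            indicators.append(1)
        else:
            raise ValueError("gamete allele not carried by this parent")
    return indicators


def count_recombinations(indicators: Sequence[Optional[int]]) -> int:
    """Count indicator switches between consecutive informative loci."""
    known = [i for i in indicators if i is not None]
    return sum(1 for x, y in zip(known, known[1:]) if x != y)


hap1, hap2 = ["A", "C", "E", "G"], ["B", "D", "E", "H"]
ind = origin_indicators(hap1, hap2, ["A", "C", "E", "H"])
print(ind, count_recombinations(ind))  # [0, 0, None, 1] 1
```

In the example, the homozygous third marker is skipped and one recombination is counted between the second and fourth markers.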
Recently we presented a Bayesian method for QTL mapping from incomplete inbred line-cross data (Sillanpää and Arjas 1998). That article also contains numerous references to other Bayesian work on QTL mapping. In this framework, the number of influential QTL in the analyzed chromosome is treated as an unobserved random variable, and the algorithmic ideas of Green (1995) are applied to deal with the varying dimension of the parameter space. We used an idea similar to composite interval mapping (Jansen 1993; Zeng 1993, 1994; Jansen and Stam 1994; Kao and Zeng 1997) to account for the influence of some QTL in other chromosomes. We also advocated the use of the posterior QTL intensity as a new probabilistic summary measure for inference. Now we generalize this approach to cover backcross and F_{2} (full-sib) offspring data, as well as multiple F_{2} families, from outcrossing experiments. In the method, both the assumption concerning fixation of QTL alleles in the crossed lines and the degree to which the haplotypes or genotypes of parents or grandparents are known are optional. The assumption concerning fixation of QTL, together with the design (BC or F_{2}), determines the maximal number of QTL genotypes that can segregate in a family structure. We assume that the offspring are at least partly genotyped and that the corresponding quantitative phenotypic measurements of the trait are available. If the parents and/or grandparents are not genotyped, we use information from the progeny to impute multiple consistent random haplotypes for the parents, following a Markov chain Monte Carlo (MCMC) scheme. We also use grandparental origin indicators as in Haley et al. (1994), but the coding is redone for each haplotype arrangement (imputation) in the parents. As a byproduct, this approach produces the linkage-phase distributions for each offspring and its parents. Therefore it can also be used for haplotyping in data with at least partially genotyped parents (see DISCUSSION).
If the F_{2} family sizes in the studied plant or animal organism are relatively small, information from several families has to be combined. A complication arising from family pooling is that there will then typically be a large number of founders, and therefore of possible QTL alleles, in the data. (Note also that the applicability of marker covariates needs to be considered.) To keep the maximum number of QTL genotypes low (≤4) in the combined data, one can assume one of the following alternatives: (1) The grandparents in each family have been drawn from the same two gene pools (lines), in which case they all represent two different QTL alleles at each trait locus. (Fixation of different QTL alleles in these two lines is assumed.) (2) All families to be combined are related and share the same two grandparents, i.e., all parents belong to the same F_{1} generation. (Fixation of different QTL alleles in the two lines is again assumed.) (3) All families in the combined data are related and share the same four grandparents (numbered from 1 to 4) in such a way that one parent in each family is always progeny of grandparents 1 and 2 and the other parent is always progeny of grandparents 3 and 4; parents descending from grandparents 1 and 4, or 2 and 3, are excluded. It is assumed that different QTL alleles are fixed in all four grandparental lines and that these lines show somewhat different phenotypic values. If these assumptions are met, the resulting offspring population will have four different QTL alleles segregating at each trait locus.
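The following Python sketch enumerates the QTL genotypes that can arise when each offspring receives one QTL allele from each parent. Allele labels follow the numbering used in the text (grandparents 1 and 2 on the paternal side, 3 and 4 on the maternal side). Whether the two heterozygote orders (e.g., AB and BA) are kept apart is a modelling choice, exposed here as a hypothetical `merge_heterozygotes` flag, so the counts need not coincide with those in Table 1.

```python
from itertools import product
from typing import List


def qtl_genotypes(paternal_alleles: str, maternal_alleles: str,
                  merge_heterozygotes: bool = False) -> List[str]:
    """Enumerate offspring QTL genotypes as paternal allele + maternal allele.

    Each character of the input strings is one QTL allele a parent can transmit.
    With merge_heterozygotes=True, e.g. "AB" and "BA" count as one genotype
    (a hypothetical option; the paper's Table 1 conventions may differ).
    """
    genos = [p + m for p, m in product(paternal_alleles, maternal_alleles)]
    if merge_heterozygotes:
        genos = sorted({"".join(sorted(g)) for g in genos})
    return genos


print(qtl_genotypes("12", "34"))        # ['13', '14', '23', '24']
print(qtl_genotypes("AB", "AB"))        # ['AA', 'AB', 'BA', 'BB']
print(qtl_genotypes("AB", "AB", True))  # ['AA', 'AB', 'BB']
```

With four grandparental lines (alternative 3) this gives exactly the four genotypes 13, 14, 23, and 24 used later in the text.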
In the following, we focus mainly on data from a one-family experiment. Our model is described next, followed by the results of simulation experiments and a discussion. Parameter estimation and summary measures for statistical inference are considered in two appendixes.
MODEL
We use the notation of Sillanpää and Arjas (1998) for the following entities: phenotype vector (y), the number of offspring individuals (N_{ind}), the number of QTL (N_{qtl}), QTL location vector (l), QTL genotype matrix (χ), the number of background controls (N_{bc}), incomplete and complete background control genotype information including parents (X_{o} and X*_{o}), the number of QTL genotypes (N_{gen}), QTL genotypic effect (regression coefficient) vectors (b_{1}, b_{2},..., b_{N_{qtl}}), genotypic effects for background controls (C), residual variance (σ^{2}), fixed marker map (m), and consistency between complete and incomplete information (A* ∼ A).
Let I = (I_{i}) be the indicator vector, where element I_{i} = 1_{{y_{i} observed}} takes the value one or zero depending on whether y_{i} is observed or not. Let H* and H be the corresponding complete and incomplete (observed) haplotype information (genotype plus allelic origin information: paternal/maternal) at the marker positions. In each case, we indicate the split between paternally and maternally inherited haplotypes by writing H* = (H*^{F}, H*^{M}) and H = (H^{F}, H^{M}). Here H* and H are taken to be (N_{ind} + 2) × N matrices, where N is the number of markers in the considered chromosome. Note that incomplete haplotype information often contains complete genotypic information but lacks the allelic origin.
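One possible in-memory layout for H = (H^{F}, H^{M}) and the indicator vector I is sketched below in Python. The row order (father, mother, then offspring) and the use of None for a missing allele are assumptions made for illustration. This simple layout does not distinguish a known genotype with unknown allelic origin from a fully missing one, which a real implementation would need to track.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Allele = Optional[str]  # None marks an unobserved allele


@dataclass
class HaplotypeData:
    """Hypothetical container for H = (H^F, H^M).

    Both matrices have N_ind + 2 rows (assumed order: father, mother, then the
    N_ind offspring) and N columns, one per marker on the chromosome.
    paternal holds H^F (paternally inherited alleles), maternal holds H^M.
    """
    paternal: List[List[Allele]]
    maternal: List[List[Allele]]

    def __post_init__(self) -> None:
        widths = {len(row) for row in self.paternal + self.maternal}
        if len(self.paternal) != len(self.maternal) or len(widths) != 1:
            raise ValueError("H^F and H^M must be matrices of equal shape")

    @property
    def n_ind(self) -> int:
        return len(self.paternal) - 2

    @property
    def n_markers(self) -> int:
        return len(self.paternal[0])

    def is_complete(self) -> bool:
        """True if no allele is missing, i.e. the data qualify as H*."""
        return all(a is not None for row in self.paternal + self.maternal for a in row)


def observed_indicator(y: Sequence[Optional[float]]) -> List[int]:
    """I_i = 1 if phenotype y_i is observed, 0 otherwise."""
    return [0 if v is None else 1 for v in y]
```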
In the chosen experimental design, let α = (α_{1},..., α_{N_{gen}}) be the vector containing all possible QTL genotypes at any locus, whose actual allelic forms are unknown. These QTL genotypes correspond to combinations of QTL alleles that were present in the crossed grandparents (founders) and that were transmitted to the F_{1} parents. Let
We consider the following composite interval mapping (Kao and Zeng 1997) model for y:
We use the shorthand notation δ = (b_{1},..., b_{N_{qtl}}, σ^{2}, ρ, C) and
The ingredients of the prior density (2) are specified as follows. Denote complete haplotype information at the marker positions of the ith offspring by
Let the complete background control marker information in parents F and M be
The prior distribution of the number of QTL is assumed to be truncated Poisson (see Sillanpää and Arjas 1998). For each QTL location, we assume a uniform prior distribution on the considered chromosome. The prior for the QTL genotype coefficients is assumed to be normal with zero mean and zero correlation, the variance being a hyperparameter specified by the analyst.
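The prior structure just described can be sketched as follows. This minimal Python illustration assumes that the truncated Poisson distribution is supported on {0, 1,..., n_max} (the text does not state whether zero is included) and that the QTL genotype coefficients are independent normal variables; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def truncated_poisson_pmf(lam: float, n_max: int) -> List[float]:
    """P(N_qtl = n), n = 0..n_max: Poisson(lam) renormalized to that range.

    The common factor exp(-lam) cancels in the normalization.
    """
    weights = [lam ** n / math.factorial(n) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_prior(lam: float, n_max: int, chrom_length: float,
                 coef_var: float, n_gen: int,
                 rng: random.Random) -> Tuple[int, List[float], List[List[float]]]:
    """Draw (N_qtl, locations, genotype coefficients) from the prior."""
    pmf = truncated_poisson_pmf(lam, n_max)
    n_qtl = rng.choices(range(n_max + 1), weights=pmf)[0]
    locations = [rng.uniform(0.0, chrom_length) for _ in range(n_qtl)]
    sd = math.sqrt(coef_var)
    coefs = [[rng.gauss(0.0, sd) for _ in range(n_gen)] for _ in range(n_qtl)]
    return n_qtl, locations, coefs


# Settings reported in the simulation section: lambda = 2, at most 3 QTL,
# a 100-cM chromosome, coefficient variance 100, four QTL genotypes.
print(truncated_poisson_pmf(2.0, 3))
print(sample_prior(2.0, 3, 100.0, 100.0, 4, random.Random(1)))
```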
As in Sillanpää and Arjas (1998), we use the term object to represent any marker or QTL in the considered chromosome and the term flanking object (of the QTL q) to represent any combination of two entities [markers and/or QTL: 1,..., (q – 1)] having their loci closest to the QTL q. Now, denote by
The QTL analysis of the offspring is done in terms of parental haplotypes. The numbers of possible QTL alleles and QTL genotypes in BC and F_{2} designs are given in Table 1. Given the QTL genotype vector α = (α_{1},..., α_{N_{gen}}), the prior probabilities for s = 1,..., N_{gen} are calculated from the equation
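As a hedged illustration of the kind of computation involved, the Python sketch below uses one standard interval-mapping construction, which is not necessarily the authors' exact formula. For each parental meiosis, the probability of a grandparental origin indicator at the QTL is taken proportional to the product of transition probabilities from the left and right flanking objects, with recombination fractions obtained from the Haldane map function (an assumption). The four QTL genotypes 13, 14, 23, and 24 then combine the paternal and maternal indicators. An uninformative flank is simply ignored here, whereas a full analysis would use the next informative object.

```python
import math
from typing import Dict, List, Optional, Tuple

Flanks = Tuple[Optional[int], Optional[int]]  # (left, right) indicators; None = unknown


def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def indicator_prob(left: Optional[int], right: Optional[int],
                   d_left: float, d_right: float) -> List[float]:
    """[P(QTL indicator = 0), P(QTL indicator = 1)] for one parental meiosis,
    given the indicators of the left and right flanking objects."""
    r1, r2 = haldane_r(d_left), haldane_r(d_right)
    probs = []
    for q in (0, 1):
        p = 1.0
        if left is not None:
            p *= (1.0 - r1) if q == left else r1
        if right is not None:
            p *= (1.0 - r2) if q == right else r2
        probs.append(p)
    total = sum(probs)
    return [p / total for p in probs]


def qtl_genotype_prior(pat: Flanks, mat: Flanks,
                       d_left: float, d_right: float) -> Dict[str, float]:
    """Prior over QTL genotypes 13, 14, 23, 24 (paternal x maternal indicator)."""
    pf = indicator_prob(pat[0], pat[1], d_left, d_right)
    pm = indicator_prob(mat[0], mat[1], d_left, d_right)
    labels = {"13": (0, 0), "14": (0, 1), "23": (1, 0), "24": (1, 1)}
    return {g: pf[i] * pm[j] for g, (i, j) in labels.items()}


print(qtl_genotype_prior((0, 0), (1, 1), 5.0, 5.0))
```

For example, with paternal flanking indicators (0, 0), maternal ones (1, 1), and 5 cM on each side, genotype 14 receives almost all of the prior mass.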
SIMULATION ANALYSIS
To test the performance of this method, an outcrossing F_{2} population consisting of N_{ind} = 200 offspring was generated by a simulation program provided by J. W. Van Ooijen (Centre for Biometry Wageningen, CPRO-DLO, The Netherlands). We considered two 100-cM-long chromosomes, each having 11 evenly spaced markers, one every 10 cM. The simulated trait had a genetic (QTL) variance of 4.47 and a phenotypic variance of 6.35, resulting in a heritability of 0.7. Two sets of parental crosses were generated: in the first set the parental mating type was fully informative (AB × CD) at all marker loci, and in the second set the degree of informativeness, as well as the corresponding linkage phases, varied from locus to locus. The simulated true underlying parental cross in the second set is shown in Figure 3; it is underlying in the sense that after the simulation this information was “forgotten” and not used in the Bayesian analyses (as explained below). The genotype-specific phenotype effects and the locations of the three simulated QTL are given in Table 2. All haplotypic assignments in the offspring were assumed unknown. In the statistical analyses, three specifications regarding the amount of parental information were considered: (1) all genotypes and haplotypic assignments in the parents were assumed known; (2) all genotypes were assumed known but their phases unknown in the parents; and (3) all parental and grandparental marker information was assumed unknown (missing). The performance of our method was compared to that of “all-markers” interval mapping (IM; Maliepaard and Van Ooijen 1994) and to that of multiple QTL mapping with two background controls [MQM/02; both implemented in the MAPQTL program of Van Ooijen and Maliepaard 1996; MAPQTL (tm) version 3.0; CPRO-DLO, Wageningen, The Netherlands]. Note that the IM and MQM methods require the genotypes and linkage phases of the parents to be known.
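As a quick arithmetic check on the simulation settings (assuming, as is standard, that heritability is the ratio of genetic to phenotypic variance and that the remainder is residual variance):

```latex
h^{2} = \frac{\sigma^{2}_{G}}{\sigma^{2}_{P}} = \frac{4.47}{6.35} \approx 0.704,
\qquad
\sigma^{2}_{E} = \sigma^{2}_{P} - \sigma^{2}_{G} = 6.35 - 4.47 = 1.88,
\qquad
\frac{100\ \text{cM}}{10\ \text{cM}} + 1 = 11\ \text{markers per chromosome}.
```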
In addition, the simulated data in which each QTL had four alleles were analyzed (in cases 1 and 3) under the incorrect assumption of fixed grandparental lines (the grandfathers being assumed to originate from the same line). This was done to see how this erroneous assumption influences the results.
In all Bayesian analyses described here, our C program implementing a Metropolis-Hastings chain was run for 5,000,000 cycles on a Pentium II/266-MHz computer. No values were discarded as burn-in, but the chain was thinned so that only every fifth iteration was saved, resulting in 1,000,000 sampled values for each parameter. After a preprocessing stage (see APPENDIX A), background controls were chosen. When analyzing a real data set, they can be determined by single-marker regression or by performing several analyses. Here, however, we simply chose marker 3 in chromosome 1 and marker 4 in chromosome 2 as background controls; a few reanalyses would very likely have led to the same choice. As no covariates (age, sex, etc.) were used, there was a common intercept (ρ = a and B_{i} = 1 for all i). With practically no other load on the computer, running times were around 9 hr. The initial value for the number of QTL was three, with corresponding locations at 20.0, 50.0, and 80.0 cM. The Poisson mean (hyperparameter) was set to λ = 2 and the maximum number of QTL (in the analyzed chromosome) to three. The prior of the residual standard deviation was chosen to be uniform over the range [0.0, 2.55], the right endpoint being equal to the phenotypic standard deviation estimated from the data. The prior of the intercept was taken to be uniform on [–13, 13], the priors of the QTL genotypic regression coefficients were independent normal distributions with mean zero and variance 100, and the prior of the background control genotypic regression coefficients was uniform on [–13, 13]. Finally, the prior of the QTL locations was uniform over [0, 100]. The control parameter values used in the final analyses are given in Table 3. The proposal distribution for the genotypic effects (coefficients) was chosen to be N(0, 0.5) in cases where the addition of a new QTL to the model was proposed.
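The run settings listed above can be collected in one place. The Python sketch below does this and checks the thinning arithmetic (5,000,000 cycles, every fifth saved, no burn-in, giving 1,000,000 saved values). The field names are our own, and whether the 0.5 in the birth proposal N(0, 0.5) is a variance or a standard deviation is not stated, so it is stored under a deliberately neutral name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunSettings:
    """Settings reported for the simulated-data analyses (field names are ours)."""
    n_cycles: int = 5_000_000
    thin: int = 5                      # save every fifth iteration
    burn_in: int = 0                   # no values deleted as burn-in
    poisson_mean: float = 2.0          # lambda of the truncated Poisson prior
    max_qtl: int = 3
    initial_locations_cm: Tuple[float, ...] = (20.0, 50.0, 80.0)
    residual_sd_range: Tuple[float, float] = (0.0, 2.55)
    intercept_range: Tuple[float, float] = (-13.0, 13.0)
    qtl_coef_prior_var: float = 100.0
    bc_coef_range: Tuple[float, float] = (-13.0, 13.0)
    location_range_cm: Tuple[float, float] = (0.0, 100.0)
    birth_proposal_param: float = 0.5  # the "0.5" in N(0, 0.5); variance vs. SD not stated

    def is_saved(self, t: int) -> bool:
        """Whether iteration t (counted from 1) is stored after thinning."""
        return t > self.burn_in and (t - self.burn_in) % self.thin == 0

    def n_saved(self) -> int:
        return (self.n_cycles - self.burn_in) // self.thin


settings = RunSettings()
print(settings.n_saved())  # 1000000
```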
In the IM and MQM/02 analyses, walking speed was set to 0.5 cM, which is the smallest admissible value in the MAPQTL software. We used the same background controls in MQM/02 as in the Bayesian analyses.
RESULTS
The Bayesian posterior QTL intensities (see APPENDIX B) in chromosome 1, when all parental information was present (case 1) or when parental linkage phases were absent (case 2), are shown in Figure 4 (top) for fully informative markers and in Figure 5 (top) for marker information that varies from marker to marker. The curves consisting of the pointwise medians and the 2.5 and 97.5% quantiles of the posterior distribution of the phenotypic effects of the four genotypes, as functions of the putative QTL location, are shown in the same figures for the case where all parental information is present (left) and the case where parental linkage phases are unknown (right). Approximate posterior distributions of the number of QTL in chromosome 1, obtained from these four analyses, are shown in Table 4. The analyses in which all parental information was absent (case 3) are not summarized in figures or tables, because case 3 is in theory not fully identifiable, so that the probabilistic summary measures (the posterior QTL intensity and the posterior distribution of the number of QTL) are not unique. These problems are considered further in the DISCUSSION.
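Approximate posterior distributions such as those in Table 4 are obtained by counting how often each value of N_{qtl} occurs among the saved MCMC states. A minimal Python sketch, assuming the saved values are available as a list of integers (the function name is hypothetical):

```python
from collections import Counter
from typing import List, Sequence


def posterior_n_qtl(samples: Sequence[int], n_max: int) -> List[float]:
    """Relative frequency of N_qtl = 0..n_max among saved MCMC states."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(n, 0) / total for n in range(n_max + 1)]


print(posterior_n_qtl([2, 2, 1, 3, 2], 3))  # [0.0, 0.2, 0.6, 0.2]
```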
Table 5 gives a brief summary of our findings concerning the localization of QTL as suggested by the QTL intensities in Figures 4 and 5. The table makes direct reference to (approximate) posterior probabilities that a particular chromosomal region
In the analyses where all markers were fully informative (Figure 4, top), the two posterior QTL-intensity graphs (from cases 1 and 2) were nearly identical, regardless of whether parental linkage phase information was available. Both posterior QTL-intensity graphs were nicely concentrated around the left QTL at 32.7 cM. The graphs surrounding the right (weaker) QTL at 58 cM were much wider, and there was also some bias to the left. However, the true simulated QTL is still inside the regions [41 cM, 60 cM] and [41 cM, 63 cM] of elevated posterior QTL intensity. In this case (Figure 5, top left), the MQM analysis performed well in localizing both QTL in chromosome 1, but the IM analysis managed to localize only the left QTL. (Note that the posterior QTL-intensity graphs covering the regions [41 cM, 60 cM] and [41 cM, 63 cM] are multimodal. This is apparently the same phenomenon that is typical of LOD-score curves at marker points: because of marker genotyping, there is often more evidence against placing a putative QTL exactly at a marker locus than against placing it somewhere nearby.) It remains somewhat unclear why, of the two modes, the one farther away from the true simulated QTL at 58 cM ended up being higher in the first case.
It can be seen from Figure 5 that the nonconstant marker information analysis (case 1) results in high posterior QTL intensities surrounding both simulated QTL in chromosome 1. The IM and MQM analyses localized quite well the “left” QTL at 32.7 cM, but localization of the “right” QTL at 58 cM was poor with both methods. Somewhat surprisingly, in the Bayesian method, the left, more influential, QTL was not localized as accurately as the right QTL when linkage phases were available in parents. This may be a consequence of the fact that there is a highly informative marker very close to the right QTL, whereas this is not the case with the left QTL (see Table 6). As could be expected, the localization was somewhat less accurate when the parental genotypes or their linkage phases were not available.
Consider next the estimation of the phenotypic effects, indicated by asterisks in Figures 4 and 5. As could be expected, the estimation was most successful in the case (Figure 4, left) where marker information was complete and complete parental information was available. With nonconstant marker information, but still assuming complete knowledge of the parental genotypes and linkage phases, the estimates were somewhat less accurate, with some of the true values lying just outside the 95% credible boundaries (Figure 5, left). When analyzing real data, the true labeling [i.e., the assignment of the QTL genotypes (13, 14, 23, 24) to the true grandparental alleles] of the phenotypic effects is almost always unknown (except for QTL genes that have been positionally cloned). If parental genotype and/or linkage phase information is missing, the labeling of the genotypic effects according to the grandparental origin of the alleles also becomes nonunique in the simulated case. For this reason, when comparing the phenotypic effect estimates with the true values used in the simulation, we have to make sure that each estimate is matched correctly with a combination of two grandparental QTL alleles. Such reassignment of the QTL genotypes is indicated by circles on the right-hand side of Figures 4 and 5. Note that in chromosome 1 the genotype labels are not consistent with each other in case 2.
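In a simulation, the matching described above can be automated. The Python sketch below assumes that the only labeling ambiguity comes from swapping the two haplotypes of the father (1 ↔ 2) and/or of the mother (3 ↔ 4), giving four candidate relabelings, and picks the one whose relabeled estimates are closest in squared error to the true simulated effects. This is an illustrative procedure, not necessarily how the circles in Figures 4 and 5 were assigned.

```python
from itertools import product
from typing import Dict, List

GENOTYPES = ("13", "14", "23", "24")


def relabelings() -> List[Dict[str, str]]:
    """Label maps from swapping the father's haplotypes (1 <-> 2) and/or
    the mother's haplotypes (3 <-> 4); the identity map is included."""
    swap_f = {"1": "2", "2": "1"}
    swap_m = {"3": "4", "4": "3"}
    maps = []
    for sf, sm in product((False, True), repeat=2):
        maps.append({g: (swap_f[g[0]] if sf else g[0]) + (swap_m[g[1]] if sm else g[1])
                     for g in GENOTYPES})
    return maps


def match_to_truth(estimates: Dict[str, float], truth: Dict[str, float]) -> Dict[str, float]:
    """Relabel estimates with the map that minimizes squared error to the truth."""
    best_err, best = float("inf"), estimates
    for m in relabelings():
        relabeled = {m[g]: v for g, v in estimates.items()}
        err = sum((relabeled[g] - truth[g]) ** 2 for g in GENOTYPES)
        if err < best_err:
            best_err, best = err, relabeled
    return best


truth = {"13": 1.0, "14": 2.0, "23": 3.0, "24": 4.0}
est = {"13": 3.0, "14": 4.2, "23": 1.1, "24": 1.9}  # father's phase flipped
print(match_to_truth(est, truth))
```

Here the father-swap relabeling wins, mapping the estimate labeled 23 back to 13, 24 to 14, and so on.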
The performance of the IM and MQM methods in the estimation of the phenotypic coefficients of the putative QTL was not particularly good. Moreover, they do not provide confidence intervals for such point estimates. Confidence intervals would have to be determined separately, for example, by employing bootstrap techniques.
The point estimates of QTL locations and their support regions are summarized in Table 7 for four different analyses of chromosome 1.
When considering chromosome 2 (which was analyzed only in cases 1 and 3), the posterior QTL-intensity graphs (see Figure 6) were all nicely concentrated around the simulated true QTL at 41.2 cM, regardless of whether the markers were fully informative or not. The IM and MQM methods were also able to localize the QTL at 41.2 cM quite well.
The performance of the analyses (cases 1 and 3) in which it was incorrectly assumed that the grandparental lines are fixed (results not shown) was quite poor in chromosome 1. The only exception was the case where all parental information was available and all markers were fully informative. Then the simulated QTL at 32.7 cM was localized rather well, and there was also some indication of QTL activity around the QTL at 58 cM. When fixation was assumed in the situation where all markers were fully informative but all parental information was absent, only the latter QTL resulted in a high (but broad) QTL-intensity concentration.
DISCUSSION
We have presented here a Bayesian procedure for mapping multiple QTL from incomplete outbred offspring data, thus extending our earlier method (Sillanpää and Arjas 1998) to a more general experimental design. A test version of the software (written in C language) is available at http://www.rni.helsinki.fi/~mjs/. The method is capable of handling situations where marker information from parents and/or grandparents is missing in varying degrees, as well as cases where some of the marker information from the offspring is unavailable. In contrast to Sillanpää and Arjas (1998), the present model was not overparameterized, because this did not seem to improve the mixing properties of the sampler.
Following Sillanpää and Arjas (1998), we use the posterior QTL intensity as a probabilistic summary measure for the localization of QTL. During the MCMC sampling, we do not restrict the order of the QTL in any way in order to label them. If order-based labeling is preferred, it can be established afterward from the MCMC realizations. This is an alternative to imposing constraints on the MCMC simulation, as was done, e.g., by Satagopan et al. (1996), Satagopan and Yandell (1996), Richardson and Green (1997), and Uimari and Hoeschele (1997).
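Both summaries can be computed after sampling. The Python sketch below approximates the posterior QTL intensity of each bin as the average number of sampled QTL falling in that bin per saved cycle (dividing by the bin width would turn this into a per-cM intensity), and imposes order-based labels afterward by sorting the locations within each saved state. The input format and function names are assumptions.

```python
from typing import List, Sequence


def qtl_intensity(location_samples: Sequence[Sequence[float]],
                  chrom_length: float, n_bins: int) -> List[float]:
    """Average number of sampled QTL per bin per saved cycle."""
    width = chrom_length / n_bins
    counts = [0] * n_bins
    for locs in location_samples:
        for x in locs:
            j = min(int(x // width), n_bins - 1)  # x == chrom_length goes to last bin
            counts[j] += 1
    n = len(location_samples)
    return [c / n for c in counts]


def order_labels(location_samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Order-based labels imposed after sampling: QTL 1 is the leftmost, etc."""
    return [sorted(locs) for locs in location_samples]


samples = [[32.0, 58.0], [35.0], [57.5, 31.0]]  # toy saved states
print(qtl_intensity(samples, 100.0, 10))
print(order_labels(samples))  # [[32.0, 58.0], [35.0], [31.0, 57.5]]
```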
We tested the performance of our method on simulated F_{2} data sets (two informativeness levels) with varying degrees of parental marker information (three levels). It seems intuitively plausible, and it also became clear from our simulations, that the availability of parental linkage phase information is more important when the markers are not fully informative. The situation where some of the offspring marker genotypes are also missing was not considered in the test analyses.
Standardization of the phenotypic data is recommended before applying Bayesian QTL mapping in practice. The same proposal windows and other control parameters can then be applied to different data sets, instead of performing separate test trials for each. Another advantage is that numerical accuracy may be improved, because computers' ability to store floating-point numbers is maximal when dealing with numbers between zero and one.
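A minimal Python sketch of standardization to zero mean and unit standard deviation, using the observed phenotypes only (this particular choice of scaling is our assumption; the text does not specify one). The returned mean and standard deviation allow effect estimates to be transformed back to the original scale.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def standardize(y: Sequence[Optional[float]]) -> Tuple[List[Optional[float]], float, float]:
    """Return (z-scores, mean, sd) computed from the observed phenotypes;
    missing values (None) stay missing."""
    observed = [v for v in y if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation
    z = [None if v is None else (v - mean) / sd for v in y]
    return z, mean, sd


z, mean, sd = standardize([4.1, None, 6.3, 5.0])
# A genotypic effect b estimated on the z scale corresponds to b * sd on the original scale.
```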
The marker covariates can be chosen by an application of simple linear regression at each marker (putative QTL) position, omitting individuals whose genotype at that locus was unknown (because data augmentation would need linkage phase information). In doing so, one should pay attention to how much information a potential covariate marker carries and how many missing values there are. If an interesting region does not contain any fully informative markers, one can often find two closely linked markers such that each marker alone is informative only with respect to one (and a different) parent.
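The screening step can be sketched in Python as follows: for each candidate marker, the phenotype is regressed on genotype-class indicators (equivalently, a one-way analysis of variance), individuals with an unknown genotype or phenotype are omitted, and markers are ranked by the proportion of variance explained. These are hypothetical helpers for illustration; in practice the ranking should be weighed against marker informativeness and the number of missing values, as noted above.

```python
from typing import Dict, List, Optional, Sequence


def marker_r2(y: Sequence[Optional[float]], genotypes: Sequence[Optional[str]]) -> float:
    """R^2 of a regression of y on genotype-class indicators at one marker,
    omitting individuals with missing phenotype or genotype."""
    pairs = [(v, g) for v, g in zip(y, genotypes) if v is not None and g is not None]
    if len(pairs) < 2:
        return 0.0
    grand_mean = sum(v for v, _ in pairs) / len(pairs)
    groups: Dict[str, List[float]] = {}
    for v, g in pairs:
        groups.setdefault(g, []).append(v)
    ss_total = sum((v - grand_mean) ** 2 for v, _ in pairs)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


def rank_markers(y: Sequence[Optional[float]],
                 marker_genotypes: Dict[str, Sequence[Optional[str]]]) -> List[str]:
    """Marker names sorted by decreasing R^2."""
    scores = {name: marker_r2(y, g) for name, g in marker_genotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)


y = [1.0, 1.2, 3.0, 3.1, None]
markers = {"m1": ["13", "13", "24", "24", "14"], "m2": ["13", "24", "13", "24", None]}
print(rank_markers(y, markers))  # ['m1', 'm2']
```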
Parental mating type is usually not constant in outcrossing experiments. Thus a systematic application of some index describing the proportion of informative meioses locally present in the data will help the analyst to quantify the possibility of localizing a QTL in different areas of the considered chromosome. One such measure is displayed in Table 6. The influence of marker informativeness (cf. marker polymorphism in Kruglyak 1997) can be seen clearly from our simulation analysis (Table 6 and Figures 4 and 5) where, in the uninformative areas, intensity graphs are much more spread out, or even biased in some direction.
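As one simple index of this kind (our illustration, not necessarily the measure reported in Table 6), the Python sketch below computes, for a given phased parental mating type at a marker, the expected fraction of the two meioses per offspring whose transmitted parental haplotype can be identified from the offspring genotype alone. Missing offspring genotypes are ignored; averaging the index over neighboring markers would give a local summary along the chromosome.

```python
from itertools import product
from typing import Tuple

Genotype = Tuple[str, str]  # ordered: (allele on haplotype 0, allele on haplotype 1)


def expected_informative_fraction(father: Genotype, mother: Genotype) -> float:
    """Expected share of the two meioses per offspring whose transmitted
    parental haplotype is identifiable from the (unordered) offspring genotype."""
    transmissions = list(product((0, 1), (0, 1)))  # (father hap, mother hap), each prob 1/4
    total = 0.0
    for tf, tm in transmissions:
        child = sorted((father[tf], mother[tm]))
        compatible = [(f, m) for f, m in transmissions
                      if sorted((father[f], mother[m])) == child]
        father_ok = len({f for f, _ in compatible}) == 1
        mother_ok = len({m for _, m in compatible}) == 1
        total += (father_ok + mother_ok) / 2
    return total / len(transmissions)


for f, m in [(("A", "B"), ("C", "D")), (("A", "B"), ("A", "B")), (("A", "A"), ("B", "B"))]:
    print(f, m, expected_informative_fraction(f, m))
```

Under this index, AB × CD and AB × AC score 1.0, AB × AB and AB × AA score 0.5, and AA × BB scores 0.0.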
The phenotypic effects can be estimated reliably only in chromosomal regions in which the posterior QTL intensity is sufficiently high. As an alternative to the location-wise posterior densities for phenotypic effects shown in Figures 4 and 5, the posterior density can be constructed as an expectation over several pointwise values (of phenotypic effects), each associated with a putative QTL location within a particular region of high posterior QTL intensity. One such posterior density is shown in Figure 7.
There appear to be two possible philosophies about how the indexing of QTL genotypes should be interpreted. Considering QTL genotype 13, for example, the first interpretation says that lines 1 and 3 are names for the parental haplotypes. In this case the remaining uncertainty concerning linkage phase is in how the grandparental alleles are assigned to these haplotypes. According to the second interpretation, lines 1 and 3 are names for the grandparental lines (alleles), and uncertainty is in the assignment of the parental haplotypes to these lines. Obviously, these two ways of thinking lead to different results only when there is some uncertainty in the parental linkage phases. We have adopted here the first interpretation, even though the second one is in some sense more fundamental in the context of QTL mapping.
We stress that in situations where all parental information is missing (case 3), it will be problematic to assign unique grandparental origins to the estimated phenotypic effects. In this situation, both parents have symmetric pairs of haplotype configurations that are a posteriori equally likely to be the correct underlying mating structure. Consequently, the correspondence between the QTL genotypes (13, 14, 23, and 24; cf. Figure 1) and their grandparental alleles is not unique. In our program, the assignment can actually change from one iteration cycle to another within one MCMC run, let alone between different runs. (In practice such changes are rare because of the strong local dependence between offspring and their parents and between adjacent loci.) In case 3, the parental phase reconstruction can even change suddenly in some region of the chromosome to a symmetrical mating type. (This can be checked only with simulated data.) The resulting posterior QTL-intensity curves can also differ in such regions between MCMC runs.
In cases 1 and 2, the very strong local dependence between parents and offspring and between adjacent loci will in practice prevent such phase transitions within a single MCMC run. Therefore, to avoid problems of this kind, we strongly recommend that at least one of the parents be genotyped at several marker loci along the chromosome, as evenly spaced as possible.
Locally, of course, if there is a fully informative (reference) marker, such identifiability problems and averaging in estimation can also be avoided in case 3. This is done by fixing the assignments (segregation indicators) arbitrarily at the reference marker and then using the fact that, as long as the genetic distance from that marker is short, haplotype assignments can be made in a way that is, with high probability, consistent with the one chosen at the reference marker locus. If this informative marker is near a contemplated QTL, the technique also facilitates estimation of the corresponding phenotypic effects by keeping the four haplotypic assignments (and thus the corresponding QTL allele combinations) apart. A drawback of this technique is that it works only locally, as simultaneous haplotype assignments at two or more marker positions might not agree with the true haplotype configuration. Consequently, a new MCMC run would be needed for each such local assignment.
Acknowledgments
M.S. thanks Matti Taskinen for his advice in the programming work, and Päivi Hurme and Outi Savolainen for many useful discussions about the designs. We are grateful to Johan Van Ooijen for providing his simulation program, which was used to generate test data sets, and to Pekka Uimari and three anonymous referees for their constructive comments on the manuscript. This work was supported by a research grant (no. 38352) from the Academy of Finland, and by the ComBi Graduate School.
APPENDIX A: PREPROCESSING AND PARAMETER ESTIMATION
Before the actual statistical analysis, the data go through a preprocessing stage. In this process, we infer as much of the marker genotype and linkage phase information as is possible by direct logical deduction from known parts of the family structure. The deduction rules applied here (sequentially until there are no new assignments) are similar to the genotyping rules of Wijsman (1987). If grandparental genotypes are present, these deduction rules are first applied to the grandparents and parents, and then to the parents and offspring. In this process, sets of consistent parental mating types are determined for each marker (see below) and they are later repeatedly applied for the estimation.
Let us consider a multiallelic marker in the chromosome to be analyzed, where, after the logical deductions, the genotypes of the parents are still unknown. Further, consider the genotype or complete haplotype imputations for both parents by updating them one at a time. In such situations, when the genotype of one parent has been imputed, some offspring genotypes may in fact uniquely determine the genotype of the other parent. To avoid this and to make the sampler work more efficiently, genotypes of parents are considered jointly, and they have to form a pair that is consistent with the offspring genotypes. Therefore, we go through all possible allele combinations in parents, one at a time at each marker locus, and check whether any of them is inconsistent with the offspring genotypes. All inconsistent pairs are eliminated. In a backcross, one needs to check an additional consistency in genotypes of related parents.
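The elimination of inconsistent parental pairs can be sketched in Python as follows. Given the alleles possible at a marker and the observed offspring genotypes (None for missing), all unordered parental genotype pairs are enumerated, and a pair is kept only if every observed offspring genotype can be formed from one allele of each parent. Optional known parental genotypes restrict the search. Function and argument names are hypothetical, and the additional backcross check mentioned above is not included.

```python
from itertools import combinations_with_replacement
from typing import Iterable, List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def _can_produce(f: Genotype, m: Genotype, child: Genotype) -> bool:
    """True if one allele from f and one from m can form the unordered child."""
    target = tuple(sorted(child))
    return any(tuple(sorted((a, b))) == target for a in f for b in m)


def consistent_parent_pairs(alleles: Iterable[str],
                            offspring: Sequence[Optional[Genotype]],
                            father: Optional[Genotype] = None,
                            mother: Optional[Genotype] = None) -> List[Tuple[Genotype, Genotype]]:
    """All (father, mother) unordered genotype pairs consistent with every
    observed offspring genotype at one marker (None = missing)."""
    genos = list(combinations_with_replacement(sorted(set(alleles)), 2))
    known_f = tuple(sorted(father)) if father is not None else None
    known_m = tuple(sorted(mother)) if mother is not None else None
    pairs = []
    for f in genos:
        if known_f is not None and f != known_f:
            continue
        for m in genos:
            if known_m is not None and m != known_m:
                continue
            if all(c is None or _can_produce(f, m, c) for c in offspring):
                pairs.append((f, m))
    return pairs


kids = [("A", "C"), ("B", "D"), ("A", "D"), None]
print(consistent_parent_pairs("ABCD", kids))
# [(('A', 'B'), ('C', 'D')), (('C', 'D'), ('A', 'B'))]
```

The two surviving pairs are mirror images of each other, which illustrates why an unknown mating type leaves a symmetric ambiguity that only the data along the chromosome can resolve.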
Block updating is sometimes preferred over single-site updating in MCMC applications to pedigrees (Kong 1991; Janss et al. 1995; Heath 1997; Jensen and Kong 1997). This is because the local dependence resulting from inheritance constraints can be so strong that, with single-site updating dynamics, the sampler will in practice be reducible within the available time (see Sheehan and Thomas 1993; Lin et al. 1994; Lin 1995; Jensen and Sheehan 1998). Even biallelic loci can be practically reducible in some designs; see Janss et al. (1995). A single outbred family (F_{2} design) with many offspring is an extreme example of this kind of strong dependence structure. Therefore, the haplotypes of the entire family are updated as one block at each marker (Step 2 below). (In some cases, because of the dependence between adjacent loci, good mixing properties of the sampler may be difficult to achieve even when block updating is applied within a locus.)
In the following, we describe only those parts of the estimation algorithm that are different from those in Sillanpää and Arjas (1998; see also the graphical representation of the model therein):
Step 2. The following is repeated for each marker, j = 1,..., N: a new ordered genotype proposal (family block) at the jth position is constructed as follows:

If one or both parental genotypes are unknown, a consistent pair of genotypes is proposed. Each consistent genotype pair is considered equally likely.

If unknown, their allelic origins are also proposed, each configuration being considered equally likely.

Incomplete offspring genotypes are completed by taking one allele (with equal transmission probabilities) from each parent. These transmissions simultaneously specify the allelic origins and the grandparental origins, which are then updated accordingly.

Unknown allelic origins of known offspring genotypes are determined by using deduction. Origins of a homozygote can be assigned randomly, and an offspring allele not found in one parent must originate from the other parent. If some origins are left uncertain, they are proposed with equal probabilities.

Grandparental origins are determined for offspring alleles having a heterozygous parent, but are randomly assigned for alleles inherited from homozygotes.
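For a single marker, the proposal steps listed above can be condensed into the Python sketch below, assuming that all parental information is unknown (so both the genotype pair and the phases are proposed). Choosing uniformly among the transmissions compatible with an observed offspring genotype mirrors the rules above: a unique transmission is deduced, ambiguous allelic origins are resolved with equal probabilities, and the grandparental origin indicator of an allele from a homozygous parent is assigned at random. The indicator 0 (1) denotes the parent's first (second) proposed haplotype. The acceptance step and the proposal probabilities it requires are not shown, and all names are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def propose_offspring(father: Genotype, mother: Genotype,
                      child: Optional[Genotype], rng: random.Random):
    """Propose (paternal allele, maternal allele, paternal indicator,
    maternal indicator) for one offspring, given phased parental genotypes."""
    options = [(f, m) for f in (0, 1) for m in (0, 1)
               if child is None or sorted((father[f], mother[m])) == sorted(child)]
    if not options:
        raise ValueError("offspring genotype inconsistent with parents")
    f, m = rng.choice(options)  # equal probabilities among compatible transmissions
    return father[f], mother[m], f, m


def propose_family_block(pairs: Sequence[Tuple[Genotype, Genotype]],
                         children: Sequence[Optional[Genotype]],
                         rng: random.Random):
    """Family-block proposal at one marker when parental data are unknown."""
    father, mother = rng.choice(pairs)     # consistent pair, equally likely
    father = tuple(rng.sample(father, 2))  # random phase (allelic origins)
    mother = tuple(rng.sample(mother, 2))
    return father, mother, [propose_offspring(father, mother, c, rng) for c in children]


rng = random.Random(3)
print(propose_family_block([(("A", "B"), ("C", "D"))], [("A", "C"), None], rng))
```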
The family-block proposal
Step 3. Random walk proposals for regression parameters are generated in three different blocks: (1) mean, environmental covariates, and residual standard deviation; (2) all QTL genotypic coefficients; and (3) all background control coefficients. Denote by L_{1} (L_{2}) the likelihood and by p_{1} (p_{2}) the normal density prior for the QTL genotypic coefficients evaluated at the new (old) values. The proposals are accepted separately for each block with probability min{1, L_{1} × p_{1}/(L_{2} × p_{2})}. If the proposal is accepted, then δ^{(t)} = δ^{new}; otherwise δ^{(t)} = δ^{(t–1)}. (In block 3, the acceptance ratio is evaluated separately for each background control.)
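A single block update of Step 3 can be sketched in Python as a random-walk Metropolis step, assuming a symmetric proposal so that the acceptance probability reduces to min{1, L_{1} × p_{1}/(L_{2} × p_{2})}. The ratio is evaluated on the log scale to avoid underflow, and a prior that is zero outside its support (such as the uniform priors above) is represented by a log prior of minus infinity. The callables passed in are hypothetical placeholders for the model's likelihood and prior.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def mh_block_update(current: T,
                    propose: Callable[[T, random.Random], T],
                    log_lik: Callable[[T], float],
                    log_prior: Callable[[T], float],
                    rng: random.Random) -> Tuple[T, bool]:
    """One random-walk Metropolis step for a parameter block (symmetric proposal)."""
    new = propose(current, rng)
    log_ratio = (log_lik(new) + log_prior(new)) - (log_lik(current) + log_prior(current))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return new, True     # delta^(t) = delta^new
    return current, False    # delta^(t) = delta^(t-1)


# Toy example: one coefficient, Gaussian log-likelihood centred at 1,
# uniform prior on [-13, 13].
def log_lik(x: float) -> float:
    return -0.5 * (x - 1.0) ** 2


def log_prior(x: float) -> float:
    return 0.0 if -13.0 <= x <= 13.0 else -math.inf


rng = random.Random(42)
x, draws = 0.0, []
for _ in range(20000):
    x, _accepted = mh_block_update(x, lambda v, r: v + r.gauss(0.0, 1.0), log_lik, log_prior, rng)
    draws.append(x)
print(sum(draws) / len(draws))  # should be near 1
```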
Step 4. Imputation for the missing background control markers is done as in Sillanpää and Arjas (1998) except for the following: A consistent genotype pair is first proposed for the parents. Then all offspring with a missing genotype in the corresponding background control position are completed by sampling alleles according to Mendelian transmission probabilities.
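The two-stage imputation can be sketched as follows in Python, assuming that the consistent parental pairs have already been enumerated (for example, during the preprocessing stage described earlier in this appendix) and that each pair is chosen with equal probability, as in Step 2. Offspring with a missing background control genotype then receive one randomly chosen allele from each parent. Names are hypothetical, and the acceptance step of Sillanpää and Arjas (1998) is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def impute_background_control(pairs: Sequence[Tuple[Genotype, Genotype]],
                              children: Sequence[Optional[Genotype]],
                              rng: random.Random):
    """Propose a consistent parental pair, then complete missing offspring
    genotypes by Mendelian sampling (one allele from each parent, prob. 1/2 each)."""
    father, mother = rng.choice(pairs)
    completed: List[Genotype] = [
        c if c is not None else (rng.choice(father), rng.choice(mother))
        for c in children
    ]
    return father, mother, completed


print(impute_background_control([(("A", "B"), ("C", "D"))], [("A", "C"), None, None],
                                random.Random(7)))
```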
APPENDIX B
As in Sillanpää and Arjas (1998), we divide the chromosome into bins Δ_{1}, Δ_{2},..., Δ_{N_{bins}}, where λ is the approximate posterior QTL intensity on interval Δ_{j}, obtained from the Monte Carlo simulation of N_{cycs} iteration cycles. In a backcross or an F_{2} intercross, let
Footnotes

Communicating editor: Z.-B. Zeng
 Received July 6, 1998.
 Accepted December 28, 1998.
 Copyright © 1999 by the Genetics Society of America