WHICHLOCI
A computer program for determining relative discriminatory power among candidate genetic loci

Increased information content from highly polymorphic molecular marker types such as microsatellites has markedly improved resolving power for discrimination among closely related populations.  This, together with increased automation of techniques for resolving genetic variation, has produced a wealth of new information.  Individual-based methods for assigning the most likely population of origin to samples are among the new statistical techniques emerging to take advantage of this information (Paetkau et al. 1995; Waser and Strobeck 1998; Banks and Eichert 2000).

WHICHLOCI maximizes population assignment accuracy through empirical analysis of data drawn from “real” populations.  Trial assignments using data from one locus at a time allow loci to be ranked by their efficiency for correct population assignment and, conversely, by their propensity to cause false assignments.  Subsequent trials with increasing numbers of loci then determine the minimum number of loci, and which specific loci, are required to attain the assignment accuracy set by the program user.  Overall, WHICHLOCI assesses which combination of loci provides the greatest statistical power for population assignment.

Requirements

The program runs on Windows 95, 98, 2000 or NT (including Macintosh emulations of these operating systems) and has no specific hardware requirements.

Input File

The program requires data from the populations under consideration, listed either as genotypes per sample (in the format used for GENEPOP; Raymond and Rousset 1995, http://www.cefe.cnrs-mop.fr/) or as allele frequencies per population (in the format of the allele frequency files created in WHICHRUN; Banks and Eichert 2000).  The program is written to analyze co-dominant as well as haploid data.

 

Download WHICHLOCI 1.0

Theory and Program Outline

A resample option allows creation of test data for all populations under consideration.  Computer-generated random numbers specify sampling from an allele table created from the frequency data for each population.  This table consists of an array of the alleles observed in each population, with each allele repeated in proportion to its observed frequency in that population.  The user defines how many samples to generate in this manner and has the option to vary sample size among populations.
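The resampling idea described above can be illustrated with a minimal Python sketch.  This is not the WHICHLOCI source code: the function names, the `table_size` parameter, the diploid default and the locus name `Ots1` are all assumptions made for illustration.  The sketch builds an allele table in which each allele is repeated in proportion to its frequency, then draws random alleles from that table to create test genotypes.

```python
import random


def build_allele_table(freqs, table_size=1000):
    """Expand {allele: frequency} into a list in which each allele is
    repeated in proportion to its frequency (hypothetical helper)."""
    table = []
    for allele, freq in sorted(freqs.items()):
        # Rounding means the table length may differ slightly from table_size.
        table.extend([allele] * round(freq * table_size))
    return table


def resample_genotypes(pop_freqs, n_samples, ploidy=2, seed=None):
    """Generate n_samples test genotypes for one population.

    pop_freqs maps locus name -> {allele: frequency}.
    Returns a list of {locus: tuple of sampled alleles}.
    """
    rng = random.Random(seed)
    tables = {locus: build_allele_table(f) for locus, f in pop_freqs.items()}
    samples = []
    for _ in range(n_samples):
        genotype = {
            locus: tuple(rng.choice(table) for _ in range(ploidy))
            for locus, table in tables.items()
        }
        samples.append(genotype)
    return samples


# Illustrative frequencies only; "Ots1" and its alleles are made-up values.
example_freqs = {"Ots1": {"100": 0.25, "104": 0.75}}
test_data = resample_genotypes(example_freqs, n_samples=5, seed=1)
```

Setting `ploidy=1` in this sketch would produce haploid test data, and calling the function once per population with a different `n_samples` would vary sample size among populations.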

Optimum loci combinations that match a user-defined accuracy for population assignment are determined through two basic procedures.  First, repeated iterations of assignment of test data, using the method applied in WHICHRUN (Banks and Eichert 2000), are performed with data from each locus separately, and the number of correct assignments to the appropriate source populations is scored for each locus.  This score, divided by the total possible number of correct assignments, is then used to rank loci.  A second round of iterations draws loci from this ranking, increasing the number of loci one at a time until the assignment score matches or exceeds the accuracy criterion set by the user.

The above description covers the procedure for accuracy considered across all populations.  An alternate routine, the critical population routine, focuses on accuracy of assignment to a specific population chosen by the user.  Iterations using data from each locus separately occur as above, but loci are scored only according to how many of the trial samples from the critical population are assigned correctly.  The number of samples that originate from other populations but are falsely assigned to the critical population is also tallied.  Rank order under the critical population routine is determined by applying the following formula:
 

     LocusScore = % correctly assigned - (% incorrectly assigned * scoreMultiplier)

where:

     % correctly assigned = % of members of the critical population that were correctly assigned
     % incorrectly assigned = # from other populations assigned to critical population / # from other populations
     scoreMultiplier = (100 - User specified accuracy) / User specified inaccuracy

This allows the user to weight correct assignments or misses according to how important accuracy or inaccuracy is to the application at hand.  An allele frequency differential following the methods described in Shriver et al. (1997) can also be used as an alternate means of ranking loci.  As above, a second round of iterations determines empirically how many loci, and which ones, are required to match the accuracy criteria.
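The critical population score and the second round of iterations can be sketched in Python as follows.  This is an illustration under stated assumptions rather than the program's actual implementation: all function names are hypothetical, both percentages are assumed to be expressed on a 0 to 100 scale, and `accuracy_of` stands in for whatever trial-assignment routine returns the accuracy obtained with a given set of loci.

```python
def score_multiplier(user_accuracy, user_inaccuracy):
    """scoreMultiplier = (100 - user specified accuracy) / user specified inaccuracy."""
    return (100.0 - user_accuracy) / user_inaccuracy


def locus_score(pct_correct, pct_incorrect, multiplier):
    """LocusScore = % correctly assigned - (% incorrectly assigned * scoreMultiplier).
    Both percentages are assumed to be on a 0-100 scale."""
    return pct_correct - pct_incorrect * multiplier


def rank_loci(per_locus_results, multiplier):
    """Rank loci from highest to lowest LocusScore.

    per_locus_results maps locus -> (pct_correct, pct_incorrect).
    """
    return sorted(
        per_locus_results,
        key=lambda locus: locus_score(*per_locus_results[locus], multiplier),
        reverse=True,
    )


def select_loci(ranked_loci, accuracy_of, target_accuracy):
    """Add loci one at a time in rank order until the target is met.

    accuracy_of is a hypothetical callback that runs trial assignments
    with the given loci and returns the resulting accuracy (percent).
    Returns None if the target is not reached with all loci.
    """
    chosen = []
    for locus in ranked_loci:
        chosen.append(locus)
        if accuracy_of(chosen) >= target_accuracy:
            return chosen
    return None


# Made-up single-locus results for three hypothetical loci A, B and C.
results = {"A": (80, 10), "B": (70, 2), "C": (90, 30)}
multiplier = score_multiplier(user_accuracy=90, user_inaccuracy=5)
ranking = rank_loci(results, multiplier)
```

In this made-up example locus C has the highest rate of correct assignment but ranks last, because its false assignments to the critical population are penalized by the multiplier.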

There has been increasing interest in estimating confidence intervals for assignment results from individual-based methods.  The accuracy of this estimation is closely linked to the accuracy of allele frequency information for the populations under consideration, and is addressed by ensuring that sample sizes among baseline populations match the estimates required to provide accurate allele frequencies for polymorphic marker types (see Banks et al. 2000).  Confidence interval estimation in the context of population assignment, however, becomes multidimensional, because it involves a comparison among the alternate likelihoods that a sample comes from each of the populations under study.  The critical population method presented above provides a convenient means of summarizing these multidimensional likelihoods from the perspective of the critical population.  WHICHLOCI provides a means for creating multiple trial data sets.  Summary statistics such as variance, standard deviation and standard error across the results from each data set are determined following standard formulae (Sokal and Rohlf 1995).  A sub-routine in WHICHLOCI allows users to bypass the loci ranking routine and determine assignment accuracy, variance, standard deviation and standard error for a user-selected set of loci.
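The summary statistics across trial data sets can be illustrated with a short Python sketch.  It assumes the usual sample (n - 1) variance and the standard error of the mean; the text above does not state which variants WHICHLOCI uses, and the function name and accuracy values are hypothetical.

```python
import math
import statistics


def summarize_trials(accuracies):
    """Summarize assignment accuracy (percent) across trial data sets.

    Assumes sample variance with an n - 1 denominator and the
    standard error of the mean, sd / sqrt(n).
    """
    n = len(accuracies)
    variance = statistics.variance(accuracies)
    sd = math.sqrt(variance)
    return {
        "mean": statistics.mean(accuracies),
        "variance": variance,
        "sd": sd,
        "se": sd / math.sqrt(n),
    }


# Made-up accuracies from four trial data sets.
summary = summarize_trials([90.0, 92.0, 94.0, 96.0])
```

At least two trial data sets are needed for this sketch, since the sample variance is undefined for a single value.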

We thus present an empirical method for determining which specific combination of loci would most likely provide a defined level of population assignment power for individuals, as well as statistical bounds on the performance of any particular group of loci.  We believe that this method will allow researchers to maximize assignment power in focused population assignment contexts.

 

Authors

Michael A. Banks (1), Will Eichert (2) and J.B. Olsen (3)

(1) Marine Fisheries Genetics Laboratory, Coastal Oregon Marine Experiment Station, Hatfield Marine Science Center, Oregon State University, 2030 SE Marine Science Drive, Newport, Oregon 97365-5229
(2) The Bodega Marine Laboratory, University of California at Davis, P.O. Box 247, Bodega Bay 94923-0247
(3) US Fish and Wildlife Service, Alaska Region, Conservation Genetics Laboratory, 1011 East Tudor Road, Anchorage, Alaska 99503

Email:   Michael.Banks@oregonstate.edu
            WFeichert@ucdavis.edu
            Jeffrey_Olsen@fws.gov

Note: This program is under review for Bioinformatics under the title:
                    Which Genetic Loci have Greater Population Assignment Power?

Thanks

Research and development of WHICHLOCI was supported by funds obtained from CALFED and the California Department of Water Resources.

References

Banks, M.A., Rashbrook, V.K., Calavetta, M.J., Dean, C.A. and Hedgecock, D. (2000)  Analysis of microsatellite DNA resolves genetic structure and diversity of chinook salmon in California’s Central Valley. Can. J. Fish. Aquat. Sci. 57:915-927.

Banks, M.A. and Eichert, W. (2000)  WHICHRUN (version 3.2): A computer program for population assignment of individuals based on multilocus genotype data. J. of Hered. 91:87-89.

Raymond, M. and Rousset, F. (1995) GENEPOP (Version 1.2): Population genetics software for exact tests and ecumenicism. J. of Hered. 86:248-250. 

Paetkau, D., Calvert, W., Stirling, I. and Strobeck, C. (1995) Microsatellite analysis of population structure in polar bears. Mol Ecol 4:347-354.

Shriver, M.D., Smith, M.W., Jin, L., Marcini, A., Akey, J.M., Deka, R. and Ferrell, R.E.  (1997) Ethnic-affiliation estimation by use of population-specific DNA markers.  Amer. J. Hum. Genet. 60:957-964.

Sokal, R.R. and Rohlf, F.J. (1995) Biometry. San Francisco: W.H. Freeman.

Waser, P.M. and Strobeck, C. (1998) Genetic signatures of interpopulation dispersal. Trends Ecol. Evol. 13:43-44.

 
