GeneRecon-specific Scheme functions

Main (generecon) module

Extra modules

(generecon MS genotype)

Module containing functionality for manipulating micro-satellite genotype data.

(generecon MS haplotype)

Module containing functionality for manipulating micro-satellite haplotype data.

(generecon SNP genotype)

Module containing functionality for manipulating SNP genotype data.

(generecon SNP haplotype)

Module containing functionality for manipulating SNP haplotype data.

(generecon common)

Module containing functionality common to all input types and MCMC algorithms supported by GeneRecon.

Main (generecon) Module

affected/unaffected-genotype-set

Creates a set of genotypes, split into affected and unaffected individuals.

Prototype

(affected/unaffected-genotype-set region affected-list unaffected-list)

Example

(define reg
  (region kappa mu
          (list
           (marker 0.0 '(0.1 0.9))
           (marker 0.1 '(0.2 0.8))
           (marker 0.2 '(0.1 0.2 0.7)))))

(define affected
  (list (genotype reg (list '(0 . 1) '(0 . 0) '(1 . 0)))
        (genotype reg (list '(0 . 0) '(0 . 0) '(1 . 1)))
        (genotype reg (list '(0 . 0) '(0 . 0) '(1 . 2)))
        (genotype reg (list '(0 . 1) '(0 . 0) '(1 . 0)))
        (genotype reg (list '(0 . 1) '(0 . 0) '(1 . 1)))
        (genotype reg (list '(0 . 1) '(0 . 0) '(1 . 2)))))
(define unaffected
  (list (genotype reg (list '(0 . 1) '(1 . 1) '(1 . 0)))
        (genotype reg (list '(0 . 0) '(1 . 1) '(1 . 1)))
        (genotype reg (list '(0 . 0) '(1 . 1) '(1 . 2)))
        (genotype reg (list '(0 . 1) '(1 . 1) '(1 . 0)))
        (genotype reg (list '(0 . 1) '(1 . 1) '(1 . 1)))
        (genotype reg (list '(0 . 1) '(1 . 1) '(1 . 2)))))

(define au-genotype-set
  (affected/unaffected-genotype-set reg affected unaffected))

Details

Creates a set of genotypes, split into a set of affected genotypes and a set of unaffected genotypes. The function takes three arguments: the genomic region the genotypes are defined over, a list of affected genotypes, and a list of unaffected genotypes. The genotypes in both lists must be built over the region given as the first argument.

affected/unaffected-haplotype-set

Creates a set of haplotypes, split into affected and unaffected individuals.

Prototype

(affected/unaffected-haplotype-set region affected-list unaffected-list)

Example

(define reg
  (region kappa mu
          (list
           (marker 0.0 '(0.1 0.9))
           (marker 0.1 '(0.2 0.8))
           (marker 0.2 '(0.1 0.2 0.7)))))

(define affected
  (list (haplotype reg '(0 0 0))
        (haplotype reg '(0 0 1))
        (haplotype reg '(0 0 2))
        (haplotype reg '(1 0 0))
        (haplotype reg '(1 0 1))
        (haplotype reg '(1 0 2))))
(define unaffected
  (list (haplotype reg '(0 1 0))
        (haplotype reg '(0 1 1))
        (haplotype reg '(0 1 2))
        (haplotype reg '(1 1 0))
        (haplotype reg '(1 1 1))
        (haplotype reg '(1 1 2))))

(define au-haplotype-set
  (affected/unaffected-haplotype-set reg affected unaffected))

Details

Creates a set of haplotypes, split into a set of affected haplotypes and a set of unaffected haplotypes. The function takes three arguments: the genomic region the haplotypes are defined over, a list of affected haplotypes, and a list of unaffected haplotypes. The haplotypes in both lists must be built over the region given as the first argument.

cluster

Builds a set of haplotypes or genotypes, split into a part that is evaluated in the coalescent tree (the "mutation cluster") and a part that is assigned to a "null cluster".

Prototype

(cluster dataset mutation-cluster-size [tree-builder [position]])

Example

(define au-haplotype-set
  (affected/unaffected-haplotype-set reg affected-haplotypes unaffected-haplotypes))

(define au-genotype-set
  (affected/unaffected-genotype-set reg affected-genotypes unaffected-genotypes))

(define au-h-cluster (cluster au-haplotype-set 100))
(define au-g-cluster (cluster au-genotype-set  100))

Details

Builds a set of haplotypes or genotypes, split into a part that is evaluated in the coalescent tree (the "mutation cluster") and a part that is assigned to a "null cluster".

The first argument is the data set to be analyzed; it determines the kind of cluster that is built.

The second argument determines the size of the mutation cluster, i.e. how many individuals are included in the coalescent tree during the MCMC. Any individual not in the mutation cluster is considered to be in the null cluster and is assumed to have been drawn from the background distribution of haplotypes/genotypes rather than being related through the tree.

If the data set given to cluster is an affected/unaffected set, the mutation cluster is built from the affected individuals only. During the MCMC, only affected individuals can be moved between the two clusters; unaffected individuals are always part of the null cluster.

An optional third argument is a symbol that selects the algorithm used to build the tree in the cluster. The supported algorithms are 'random-tree (a random topology), 'distance-tree (a tree built using a distance method), and 'weighted-distance-tree (a distance method that weights differences at markers close to a locus more heavily than differences farther away).

If 'weighted-distance-tree is chosen, an additional argument is expected: the position relative to which the distances are weighted.

distance-tree

Builds a coalescent tree from a set of haplotypes, based on the distance between the haplotypes.

Prototype

(distance-tree au-set)

Example

(distance-tree au-haplotype-set)
(distance-tree au-genotype-set)

Details

Builds a coalescent tree from an affected/unaffected set, using the distances between the haplotypes to determine the topology. The set can be either a haplotype set or a genotype set, built using affected/unaffected-haplotype-set or affected/unaffected-genotype-set, respectively.

genotype

Creates a genotype object from a list of allele-pairs.

Prototype

(genotype region allele-pairs)

Example

(define reg (region kappa mu
                    (list
                     (marker 0.0 '(0.1 0.9))
                     (marker 0.1 '(0.2 0.8))
                     (marker 0.2 '(0.1 0.2 0.7)))))

(genotype reg (list '(0 . 1) '(1 . 0) '(2 . 1)))

Details

Creates a genotype object from a region and a list of allele pairs.

The allele pairs are given as a list containing one pair per marker in the region. Both alleles in a pair must be valid alleles for the corresponding marker, i.e. integers between 0 and one less than the length of that marker's frequency list.
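The validity rule above can be illustrated with a small, self-contained Guile Scheme sketch. It is not GeneRecon code: valid-allele? and valid-allele-pairs? are hypothetical helpers, and the region is modelled simply as the list of its markers' frequency lists, in position order.

```scheme
(use-modules (srfi srfi-1))   ; every

;; Hypothetical helper: an allele index is valid for a marker whose
;; frequency list is FREQS if it is an exact integer in [0, (length FREQS) - 1].
(define (valid-allele? allele freqs)
  (and (integer? allele)
       (exact? allele)
       (<= 0 allele (- (length freqs) 1))))

;; Hypothetical helper: FREQ-LISTS has one frequency list per marker,
;; PAIRS has one (a . b) allele pair per marker; the lengths must agree.
(define (valid-allele-pairs? freq-lists pairs)
  (and (= (length freq-lists) (length pairs))
       (every (lambda (freqs pair)
                (and (valid-allele? (car pair) freqs)
                     (valid-allele? (cdr pair) freqs)))
              freq-lists
              pairs)))

;; The frequency lists of the example region above.
(define example-freqs '((0.1 0.9) (0.2 0.8) (0.1 0.2 0.7)))

(valid-allele-pairs? example-freqs (list '(0 . 1) '(1 . 0) '(2 . 1)))  ; => #t
(valid-allele-pairs? example-freqs (list '(2 . 1) '(1 . 0) '(2 . 1)))  ; => #f
```

The second call fails because the first marker has only two alleles, so only indices 0 and 1 are valid there.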

haplotype

Creates a haplotype object from a list of alleles.

Prototype

(haplotype region allele-list)

Example

(define reg (region kappa mu
                    (list
                     (marker 0.0 '(0.1 0.9))
                     (marker 0.1 '(0.2 0.8))
                     (marker 0.2 '(0.1 0.2 0.7)))))

(haplotype reg '(0 1 2))

Details

Creates a haplotype object from a region and a list of alleles.

The alleles are given as a list containing one allele per marker in the region. Each allele must be valid for the corresponding marker, i.e. an integer between 0 and one less than the length of that marker's frequency list.

marker

Creates a marker object from a position and a list of frequencies.

Prototype

(marker position frequencies)

Example

(marker 0.1 '(0.2 0.8))

Details

Creates a marker object from a position and a list of frequencies. The position must be a non-negative real number, and the frequencies must be a list of real numbers in the range [0,1] that sum to 1. The frequency list must contain at least two elements.

The alleles at the created marker are afterwards referred to by indices from 0 to one less than the length of the frequency list. Allele 0 is the allele with frequency (car freq-list), i.e. the first element of the frequency list; allele 1 is the allele with frequency (car (cdr freq-list)), i.e. the second element; and so forth.
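A minimal Guile Scheme sketch of these rules, for illustration only: check-frequencies and allele-frequency are hypothetical helpers rather than GeneRecon functions, and the 1e-9 tolerance on the sum is an assumption (floating-point frequencies rarely sum to exactly 1).

```scheme
(use-modules (srfi srfi-1))   ; every

;; Hypothetical check of the documented constraints on a frequency list:
;; at least two elements, each in [0,1], summing to 1 (within a tolerance).
(define (check-frequencies freqs)
  (and (>= (length freqs) 2)
       (every (lambda (f) (<= 0 f 1)) freqs)
       (< (abs (- (apply + freqs) 1)) 1e-9)))

;; Allele i refers to the i-th element of the frequency list:
;; allele 0 -> (car freqs), allele 1 -> (car (cdr freqs)), and so on.
(define (allele-frequency freqs i)
  (list-ref freqs i))

(check-frequencies '(0.1 0.2 0.7))     ; => #t
(allele-frequency '(0.1 0.2 0.7) 2)    ; => 0.7
```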

markov-chain

Creates a Markov chain object from a list of tables.

Prototype

(markov-chain . tables)

Example

(markov-chain
 (list (list 0.1 0.4) (list 0.3 0.3))
 (list (list 0.1 0.4) (list 0.2 0.3)))

Details

FIXME

parameter-set

Build a parameter set for the MCMC calculation.

Prototype

(parameter-set parameters)
(parameter-set region position population-size data-set tree)
(parameter-set region position population-size cluster)

Example

(define au-h-tree (random-tree au-haplotype-set))
(define au-g-tree (random-tree au-genotype-set))

(define initial-pos      0.12)
(define initial-pop-size 1000)

(define h-ps
  (parameter-set reg 
		 initial-pos initial-pop-size 
		 au-haplotype-set au-h-tree))

(define g-ps
  (parameter-set reg 
		 initial-pos initial-pop-size 
		 au-genotype-set au-g-tree))

(define au-h-cluster (cluster au-haplotype-set 100))
(define au-g-cluster (cluster au-genotype-set  100))

(define h-c-ps
  (parameter-set reg initial-pos initial-pop-size au-h-cluster))
(define g-c-ps
  (parameter-set reg initial-pos initial-pop-size au-g-cluster))

Details

Builds a parameter set for the MCMC algorithm. Which kind of parameter set is built depends on the arguments to this function.

The first argument is the genomic region being analyzed, the second is the initial disease-locus position, and the third is the initial population size. The fourth argument is either a data set built using affected/unaffected-haplotype-set or affected/unaffected-genotype-set, or a cluster built with the cluster function. If a data set is given, the fifth argument is a corresponding tree.

random-tree

Builds a random coalescent tree from an affected/unaffected set.

Prototype

(random-tree au-set)

Example

(random-tree au-haplotype-set)
(random-tree au-genotype-set)

Details

Builds a random coalescent tree from an affected/unaffected set. The set can be either a haplotype set or a genotype set, built using affected/unaffected-haplotype-set or affected/unaffected-genotype-set, respectively.

region

Creates a genomic region from a list of markers.

Prototype

(region kappa mu marker-list [markov-chain])

Example

(region kappa mu
        (list
         (marker 0.0 '(0.1 0.9))
         (marker 0.1 '(0.2 0.8))
         (marker 0.2 '(0.1 0.2 0.7))))

Details

Constructs a region from a recombination rate (kappa), a mutation rate (mu), and a list of markers. The region can afterward be used to create haplotypes or genotypes.

The markers in the region must all be at distinct positions. After the region is created, the markers are referred to by their order of position, not by their order in the list given to region. This means that, for instance, when creating a haplotype, the order of the alleles given to the haplotype function must match the order of the markers in the region, sorted by position.
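The ordering rule can be sketched in plain Guile Scheme. Here a marker is modelled as a hypothetical (position . frequency-list) pair rather than a GeneRecon marker object; the sketch only shows which index each marker ends up with.

```scheme
;; Markers modelled as (position . frequency-list) pairs (an assumption
;; made for this sketch; GeneRecon uses marker objects).
(define (sort-markers markers)
  (sort markers (lambda (a b) (< (car a) (car b)))))

;; The region requires distinct positions; after sorting, any duplicate
;; positions would be adjacent.
(define (distinct-positions? markers)
  (let loop ((ms (sort-markers markers)))
    (cond ((or (null? ms) (null? (cdr ms))) #t)
          ((= (car (car ms)) (car (cadr ms))) #f)
          (else (loop (cdr ms))))))

;; Markers given out of position order.
(define input-markers
  (list (cons 0.2 '(0.1 0.2 0.7))
        (cons 0.0 '(0.1 0.9))
        (cons 0.1 '(0.2 0.8))))

(map car (sort-markers input-markers))   ; => (0.0 0.1 0.2)
```

So the first allele in a haplotype's allele list belongs to the marker at 0.0, even though that marker was given second.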

An optional final argument, a Markov chain modelling the background haplotypes, can be provided. The tables in this Markov chain must match the markers starting from index 1, with the markers taken in their sorted (position) order, not in the order in which they were given to this function.

run-mcmc

Run the MCMC algorithm on a parameter set.

Prototype

(run-mcmc parameter-set sampler no-iterations)

Example

(define ps
  (parameter-set reg
                 initial-pos initial-pop-size
                 au-genotype-set au-g-tree))

(define s (sampler (list '(disease-locus 10)
			 '(likelihood 10))))

(run-mcmc ps s 100000)

Details

Run the MCMC algorithm on a parameter set. The parameter set is any parameter set created by the parameter-set function. The appropriate MCMC algorithm will be selected based on the parameter set.

The first argument to the function is the parameter set for the MCMC run. The second argument is a sampler object, determining how parameters should be sampled. The third argument determines the number of iterations to run.

sampler

Construct the sampling hooks for the MCMC iteration.

Prototype

(sampler hook-list)

Example

(sampler (list '(disease-locus 100)
               '(likelihood 100 "likelihood.out")
               '(population-size 100 "population-size.out")
               '(coalescent-tree 1000 "tree.out")))

Details

Construct the sampling hooks for the MCMC iteration.

The sampler is constructed from a list of hooks. Each hook is a list whose first element is a symbol that determines what is sampled and whose second element determines how often that value is sampled. An optional third element specifies the name of the file the sampled values are written to.

The supported values to sample are:

likelihood
The likelihood of the current parameters.
likelihood-curve
The likelihood over the region (the likelihood when varying the locus but keeping all other parameters fixed).
tree-prior
The contribution to the likelihood from the tree prior.
tree-likelihood
The contribution to the likelihood from the tree.
background-likelihood
The contribution to the likelihood from the haplotypes without parents in the tree.
disease-locus
The current disease locus.
population-size
The current population size.
coalescent-tree
The current coalescent tree.
coalescent-tree-height
The height of the current coalescent tree.
coalescent-tree-connectedness
The connectedness of the current coalescent tree: the number of nodes with the parent bit set divided by the number of nodes with the parent bit not set.
mutation-cluster
The haplotypes or genotypes in the mutation cluster.
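A hypothetical check of the hook format described above, written in plain Guile Scheme. It encodes only what is stated here (a supported symbol, a sampling interval, and an optional filename); requiring the interval to be a positive exact integer is an assumption, and valid-hook? is not the sampler's own validation.

```scheme
(use-modules (srfi srfi-1))   ; every

;; The sampleable values listed above.
(define supported-hooks
  '(likelihood likelihood-curve tree-prior tree-likelihood
    background-likelihood disease-locus population-size
    coalescent-tree coalescent-tree-height
    coalescent-tree-connectedness mutation-cluster))

;; Hypothetical check: a hook is (symbol interval) or (symbol interval filename).
(define (valid-hook? hook)
  (and (list? hook)
       (memv (length hook) '(2 3))
       (memq (car hook) supported-hooks)
       (integer? (cadr hook))
       (exact? (cadr hook))
       (positive? (cadr hook))
       (or (null? (cddr hook)) (string? (caddr hook)))
       #t))

(every valid-hook?
       (list '(disease-locus 100)
             '(likelihood 100 "likelihood.out")
             '(coalescent-tree 1000 "tree.out")))   ; => #t
```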

set-mcmc-option

Set a global MCMC option.

Prototype

(set-mcmc-option option value)

Example

(set-mcmc-option 'max-pop-size-change 50)
(set-mcmc-option 'max-allele-freq-change 0.25)

Details

Sets an MCMC option. These options have reasonable default values and need not be set explicitly in most cases, so they are not parameters of the MCMC functions; instead, they can be set using this function.

The supported options are:

max-locus-change

Maximum change of the disease locus.

The maximal distance the locus is allowed to move in one step, given as a fraction of the total region size; e.g. if the region has size 2 and max-locus-change is 0.5, the locus can move at most 1.0 in each step.

The default value is 0.3.

min-locus-marker-dist

How close the disease locus is allowed to be to a marker.

The minimal distance allowed between the disease locus and any marker. If this is zero, the disease locus can be placed on a marker; if it is 0.1, it can never be closer than 0.1 to a marker; and so on.

By default, 10% of the total region size is removed by creating areas around the markers where the locus cannot be placed.

max-allele-freq-change

The maximal change of allele frequencies allowed.

The maximal amount the frequency for a single allele at a marker can change in a single step.

By default 100%, i.e. the frequencies can change arbitrarily (within the constraints that they remain frequencies and sum to 1).

min-pop-size

Minimal population size.

The minimal value the population size is allowed to reach. By default 500.

max-pop-size

Maximal population size.

The maximal value the population size is allowed to reach. By default 100000.

max-pop-size-change

Maximal population size change.

The maximal amount the population size is allowed to change in a single step. By default 1000.

max-waiting-time-change

Maximal waiting-time change.

The maximal amount the waiting time in a coalescent tree can change. By default 0.4.

mcmc-temperature

Temperature for accepting more proposed changes.

The acceptance probability is normally exp(L'-L), where L' is the log-likelihood of the proposed locus and L the log-likelihood of the current position. With a temperature, it is instead exp((L'-L)/temp). To get a "flatter" curve, use temp > 1.

By default this is 1.0, corresponding to no temperature.
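The tempered acceptance rule can be written out as a small Guile Scheme function. Capping the probability at 1 follows the usual Metropolis convention and is an assumption about how GeneRecon applies it; acceptance-probability is a hypothetical name.

```scheme
;; Hypothetical sketch of the tempered acceptance probability.
;; L-NEW and L-CUR are log-likelihoods; TEMP = 1.0 means no temperature.
;; The cap at 1 is the standard Metropolis convention (assumed here).
(define* (acceptance-probability l-new l-cur #:optional (temp 1.0))
  (min 1.0 (exp (/ (- l-new l-cur) temp))))

(acceptance-probability -12.0 -10.0)       ; exp(-2), about 0.135
(acceptance-probability -12.0 -10.0 2.0)   ; exp(-1), about 0.368
```

With temp = 2.0 the same worse proposal is accepted more often, which is what flattens the curve.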

min-recomb-dist

The smallest interval that can contain a recombination.

The likelihood for the locus can get very low around markers (for some choices of recombination and mutation rates), which results in poor mixing. This option sets a minimal distance used when calculating the probability of no recombination between two points; shorter intervals are treated as having this length. The default value is 0; setting it to a positive value smooths out the likelihood around markers.

set-mcmc-weight

Set the weight of a change proposal.

Prototype

(set-mcmc-weight change weight)

Example

(set-mcmc-weight 'locus-change 10)

Details

Explicitly sets the weight of a proposed change. By default, the weights are initialized to values taken mainly from Morris et al. 2002.

The supported changes are:

locus-change

Weight for changing the disease locus

The default value is 1.

allele-frequency-change

Weight for changing the allele frequency of markers.

The default value is the number of markers.

population-size-change

Weight for changing the effective population size.

The default value is 1.

missing-allele-change

Weight for changing the allele at a site where the allele is unknown.

The default value is the number of such sites.

genotype-phase-change

Weight for changing the phase of a genotype.

The default value is the number of heterozygous sites.

branch-position-change

Weight for changing the tree topology by moving a branch.

Default is the number of branches: 2*(no-leaves - 1).

parent-bit-change

Weight for changing the parent bit.

Default is the number of branches: 2*(no-leaves - 1).

event-ordering-change

Weight for changing the ordering of events.

Default is no-leaves - 2.

waiting-time-change

Weight for changing a waiting time.

Default is the number of waiting times: no-leaves - 1.

clustering-change

Weight for changing the mutation cluster.

Default is the size of the cluster.
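The default weights above that depend on the size of the tree can be collected in one place. This is a sketch in plain Guile Scheme; the function name default-tree-weights and the association-list output are hypothetical, and only the tree-dependent defaults listed above are included.

```scheme
;; Hypothetical helper collecting the documented default weights that
;; depend on the number of leaves in the coalescent tree.
(define (default-tree-weights no-leaves)
  (let ((branches (* 2 (- no-leaves 1))))   ; number of branches
    (list (cons 'branch-position-change branches)
          (cons 'parent-bit-change branches)
          (cons 'event-ordering-change (- no-leaves 2))
          (cons 'waiting-time-change (- no-leaves 1)))))

(default-tree-weights 10)
;; => ((branch-position-change . 18) (parent-bit-change . 18)
;;     (event-ordering-change . 8) (waiting-time-change . 9))
```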

weighted-distance-tree

Builds a coalescent tree from a set of haplotypes, based on the distances between the haplotypes, weighting alleles at markers close to the disease locus more heavily than those at markers farther away.

Prototype

(weighted-distance-tree haplotype-set locus)

Example

(weighted-distance-tree au-haplotype-set locus)
(weighted-distance-tree au-genotype-set locus)

Details

Builds a coalescent tree from an affected/unaffected set, using the distances between the haplotypes to determine the topology and weighting alleles at markers close to the disease locus more heavily than those at markers farther away. The first argument is a set of haplotypes or genotypes, built using affected/unaffected-haplotype-set or affected/unaffected-genotype-set, respectively. The second argument is the locus used to weight the distances.

(generecon MS genotype)

Module containing functionality for manipulating micro-satellite genotype data.

calc-frequencies

Calculates the allele frequencies for each marker in the parameter lists.

Prototype

 (calc-frequencies known-allele-lists . list-of-list-of-allele-lists)

Example

 (use-modules (generecon MS genotype))
 (define known-allele-lists (collect-alleles-lists genotypes-1 genotypes-2))
 (calc-frequencies known-allele-lists genotypes-1 genotypes-2)

Details

Calculates the allele frequencies for each marker in the parameter lists.

Takes as input any number of lists of lists of micro-satellite alleles and calculates, for each position in the lists, the frequency of each allele, discarding missing data (value -1).

The first argument gives, for each marker, the list of allowed alleles at that marker, and is used to order the frequencies: each frequency list follows the order in which the alleles appear in known-allele-lists.

The input need not consist of a single list of alleles; it can be any number of such lists, e.g. (calc-frequencies known-allele-lists genotypes-1 genotypes-2).
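A simplified Guile Scheme sketch of the calculation for a single marker. marker-frequencies is a hypothetical helper, not the module's calc-frequencies; it assumes each individual contributes both alleles of its pair to the counts and that -1 marks a single missing allele. Exact rationals are returned to keep the arithmetic visible.

```scheme
(use-modules (srfi srfi-1))   ; append-map, count

;; KNOWN: the allowed alleles at this marker, in the order the result follows.
;; PAIRS: one (a . b) allele pair per individual at this marker.
;; Assumes at least one non-missing allele (otherwise division by zero).
(define (marker-frequencies known pairs)
  (let* ((alleles  (append-map (lambda (p) (list (car p) (cdr p))) pairs))
         (observed (filter (lambda (a) (not (= a -1))) alleles))
         (total    (length observed)))
    (map (lambda (k) (/ (count (lambda (a) (= a k)) observed) total))
         known)))

(marker-frequencies '(1 23 34) '((1 . 23) (1 . -1) (34 . 1)))
;; => (3/5 1/5 1/5)
```

The missing allele in the second pair is dropped, so the frequencies are computed over the five observed alleles.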

collect-alleles

Collect the distinct alleles from a list of allele pairs.

Prototype

 (collect-alleles allele-pairs)

Example

 (use-modules (generecon MS genotype))
 (define distinct-alleles (collect-alleles '((1 . 23) (43 . 2) (1 . 34) (23 . 43)))) 

Details

Collect the distinct alleles from a list of allele-pairs and return them sorted.
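The behaviour can be mimicked in a few lines of Guile Scheme; distinct-sorted-alleles is a hypothetical name used for illustration, not the module's implementation, and ascending numerical order is assumed for "sorted".

```scheme
(use-modules (srfi srfi-1))   ; append-map, delete-duplicates

;; Flatten the pairs into one list of alleles, drop duplicates, sort ascending.
(define (distinct-sorted-alleles allele-pairs)
  (sort (delete-duplicates
         (append-map (lambda (p) (list (car p) (cdr p))) allele-pairs))
        <))

(distinct-sorted-alleles '((1 . 23) (43 . 2) (1 . 34) (23 . 43)))
;; => (1 2 23 34 43)
```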

collect-alleles-lists

Collect a list of the distinct alleles in each column of the input.

Prototype

 (collect-alleles-lists . list-of-list-of-allele-pairs)

Example

 (use-modules (generecon MS genotype))
 (define distinct-alleles-lists 
    (collect-alleles-lists (list '((1 . 1) (34 . 23))
                                 '((1 . 3) (14 . 23))
                                 '((2 . 4) ( 2 . 65))))) 

Details

Collect a list of the distinct alleles in each column of the input. That is, the input lists must consist of lists of equal length, and the distinct alleles at each index are collected and listed.

The input need not consist of a single list but can be any number of lists.
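A column-wise variant can be sketched the same way in Guile Scheme. The names are hypothetical; each row is assumed to be a list of allele pairs, one per marker, as in the example above.

```scheme
(use-modules (srfi srfi-1))   ; append-map, delete-duplicates

(define (distinct-sorted-alleles allele-pairs)
  (sort (delete-duplicates
         (append-map (lambda (p) (list (car p) (cdr p))) allele-pairs))
        <))

;; ROW-LISTS: one or more lists of rows; all rows must have the same length
;; and there must be at least one row.  (apply map ...) walks the rows in
;; parallel, so the lambda receives one column at a time.
(define (column-alleles . row-lists)
  (apply map
         (lambda column (distinct-sorted-alleles column))
         (apply append row-lists)))

(column-alleles (list '((1 . 1) (34 . 23))
                      '((1 . 3) (14 . 23))
                      '((2 . 4) (2 . 65))))
;; => ((1 2 3 4) (2 14 23 34 65))
```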

genotype-list->genotype-list

Translates a list of lists of alleles into a list of genotype objects.

Prototype

 (genotype-list->genotype-list region index-tables genotype-list) 

Example

 (use-modules (generecon MS genotype))
 (define reg (region kappa mu markers))
 (define genotype-list (read-genotype-data file))
 (define index-tables (make-index-tables genotype-list))
 (genotype-list->genotype-list reg index-tables genotype-list) 

Details

Translates a list of lists of alleles into a list of genotype objects.

This function takes a region, `reg', a list of mappings from alleles to frequency indices in the region, `index-tables', and a list of lists of alleles, `genotype-list', and translates the allele lists into genotype objects over the region.

make-index-table

Make a table of alleles to indices.

Prototype

 (make-index-table allele-pairs)

Example

 (use-modules (generecon MS genotype))
 (define table (make-index-table '((1 . 23) (43 . 2) (1 . 34) (23 . 43)))) 

Details

Make a table of alleles to indices, such that each distinct allele is mapped to its position in the sorted list of distinct alleles.
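A minimal Guile Scheme sketch of such a table. The table is modelled here as an association list, which is an assumption about the representation; make-allele-index-table is a hypothetical name.

```scheme
(use-modules (srfi srfi-1))   ; append-map, delete-duplicates

;; Hypothetical sketch: map each distinct allele to its index in the
;; ascending list of distinct alleles, as an association list.
(define (make-allele-index-table allele-pairs)
  (let ((sorted (sort (delete-duplicates
                       (append-map (lambda (p) (list (car p) (cdr p)))
                                   allele-pairs))
                      <)))
    (map cons sorted (iota (length sorted)))))

(define table
  (make-allele-index-table '((1 . 23) (43 . 2) (1 . 34) (23 . 43))))
;; table => ((1 . 0) (2 . 1) (23 . 2) (34 . 3) (43 . 4))
(cdr (assv 34 table))   ; => 3
```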

make-index-tables

Makes a list of alleles->indices tables.

Prototype

 (make-index-tables list-of-list-of-allele-lists)

Example

 (use-modules (generecon MS genotype))
 (define genotype-list (read-genotype-data file))
 (define tables (make-index-tables genotype-list)) 

Details

Makes a list of alleles->index mappings.

From a list of allele-lists, this function makes a table for each marker, mapping the alleles at this marker to indices, such that the numerically least allele is at the first index (0) and the numerically largest allele at the largest index.

The input need not consist of a single list of alleles; it can be any number of such lists, e.g. (make-index-tables allele-list-1 allele-list-2).

read-affected/unaffected-data

Read affected and unaffected genotypes and calculate their allele frequencies.

Prototype

 (read-affected/unaffected-data affected-file unaffected-file) 

Example

 (use-modules (generecon MS genotype))
 (read-affected/unaffected-data af-filename uaf-filename) 

Details

Read affected and unaffected genotypes and calculate their allele frequencies.

Read a list of affected genotypes from `affected-file' and a list of unaffected genotypes from `unaffected-file' and return them (as lists of lists of alleles) together with a list of the frequencies of each allele at each position. For the frequency list, only the unaffected genotypes are considered. The frequencies in the frequency list are sorted with relation to the numerical value of the alleles (least allele first) and missing data (-1) is ignored in the frequency calculations.

read-distances-affected/unaffected-markers

Read marker distances and affected and unaffected genotypes, calculate their allele frequencies, and build markers from them.

Prototype

 (read-distances-affected/unaffected-markers kappa mu
                                             position-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon MS genotype))
 (read-distances-affected/unaffected-markers 0.01 1e-5 dist-file af-filename uaf-filename) 

Details

Read marker distances and affected and unaffected genotypes, calculate their allele frequencies, and build markers from them.

Read a list of marker distances from `position-file', a list of affected genotypes from `affected-file', and a list of unaffected genotypes from `unaffected-file'; calculate the allele frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The frequencies in the frequency list are sorted with relation to the numerical value of the alleles (least allele first) and missing data (-1) is ignored in the frequency calculations. The alleles in the created genotypes are re-mapped as indices into the frequency lists.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

read-genotype-data

Reads a list of genotypes from a file.

Prototype

 (read-genotype-data file)

Example

 (use-modules (generecon MS genotype))
 (read-genotype-data filename)

Details

Reads a list of genotypes from a file.

The file must contain lines of whitespace-separated micro-satellite alleles (non-negative numbers, or -1 to indicate missing data). Each line is interpreted as one genotype, and all lines must have the same number of alleles.

The function returns the parsed genotypes as a list of lists of alleles. This list can be translated into a list of genotype objects using `genotype-list->genotype-list'.
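The file format can be illustrated with a small Guile Scheme parser. This is a hypothetical sketch, not the module's reader: parse-genotype-lines is an illustrative name, and skipping blank lines is an assumption.

```scheme
(use-modules (ice-9 rdelim)   ; read-line
             (srfi srfi-1))   ; every

;; Hypothetical parser for the described format: one genotype per line,
;; alleles separated by whitespace, -1 for missing data.  Blank lines are
;; skipped (an assumption).  Returns a list of lists of numbers.
(define (parse-genotype-lines port)
  (let loop ((acc '()))
    (let ((line (read-line port)))
      (cond
       ((eof-object? line)
        (let ((rows (reverse acc)))
          (if (not (or (null? rows)
                       (every (lambda (r) (= (length r) (length (car rows))))
                              rows)))
              (error "all lines must have the same number of alleles"))
          rows))
       ((string-null? (string-trim-both line))
        (loop acc))
       (else
        (loop (cons (map string->number (string-tokenize line)) acc)))))))

(call-with-input-string "12 14 -1 3\n12 12 5 3\n" parse-genotype-lines)
;; => ((12 14 -1 3) (12 12 5 3))
```

For a real file, (call-with-input-file "genotypes.txt" parse-genotype-lines) would be used instead, where the file name is only a placeholder.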

read-positions-affected/unaffected-markers

Read marker positions and affected and unaffected genotypes, calculate their allele frequencies, and build markers from them.

Prototype

 (read-positions-affected/unaffected-markers kappa mu
                                             position-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon MS genotype))
 (read-positions-affected/unaffected-markers 0.01 1e-5 pos-file af-filename uaf-filename) 

Details

Read marker positions and affected and unaffected genotypes, calculate their allele frequencies, and build markers from them.

Read a list of marker positions from `position-file', a list of affected genotypes from `affected-file', and a list of unaffected genotypes from `unaffected-file'; calculate the allele frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The frequencies in the frequency list are sorted with relation to the numerical value of the alleles (least allele first) and missing data (-1) is ignored in the frequency calculations. The alleles in the created genotypes are re-mapped as indices into the frequency lists.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

remap-genotype

Remaps the alleles in a genotype into indices in a frequency list.

Prototype

 (remap-genotype index-tables genotype)

Example

 (use-modules (generecon MS genotype))
 (define genotype-list (read-genotype-data file))
 (define tables (make-index-tables genotype-list))
 (map (lambda (g) (remap-genotype tables g)) genotype-list)

Details

Remaps the alleles in a genotype into indices in a frequency list.

Given `index-tables', one table per marker mapping that marker's alleles to indices (as built by make-index-tables), this function replaces each allele in `genotype' by its index, so that the alleles become indices into the markers' frequency lists.
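A minimal Guile Scheme sketch of the remapping. It is hypothetical: remap-pairs is an illustrative name, the tables are modelled as association lists from allele to index, a genotype is modelled as one (a . b) pair per marker, and leaving -1 (missing data) unchanged is an assumption, since the real function's handling of missing data is not described here.

```scheme
;; Hypothetical sketch.  TABLES: one association list per marker mapping
;; allele -> index; GENOTYPE: one (a . b) allele pair per marker.
;; Missing data (-1) is left unchanged (an assumption).
(define (remap-pairs tables genotype)
  (define (lookup allele table)
    (if (= allele -1)
        -1
        (cdr (assv allele table))))
  (map (lambda (table pair)
         (cons (lookup (car pair) table)
               (lookup (cdr pair) table)))
       tables
       genotype))

(define tables
  (list '((1 . 0) (2 . 1) (23 . 2))
        '((14 . 0) (65 . 1))))

(remap-pairs tables '((23 . 1) (65 . -1)))   ; => ((2 . 0) (1 . -1))
```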

(generecon MS haplotype)

Module containing functionality for manipulating micro-satellite haplotype data.

calc-frequencies

Calculates the allele frequencies for each marker in the parameter lists.

Prototype

 (calc-frequencies known-allele-lists . list-of-list-of-allele-lists)

Example

 (use-modules (generecon MS haplotype))
 (define known-allele-lists (collect-alleles-lists haplotypes-1 haplotypes-2))
 (calc-frequencies known-allele-lists haplotypes-1 haplotypes-2)

Details

Calculates the allele frequencies for each marker in the parameter lists.

Takes as input any number of lists of lists of micro-satellite alleles and calculates, for each position in the lists, the frequency of each allele, discarding missing data (value -1).

The first argument gives, for each marker, the list of allowed alleles at that marker, and is used to order the frequencies: each frequency list follows the order in which the alleles appear in known-allele-lists.

The input need not consist of a single list of alleles; it can be any number of such lists, e.g. (calc-frequencies known-allele-lists haplotypes-1 haplotypes-2).

collect-alleles

Collect the distinct alleles from a list of alleles.

Prototype

 (collect-alleles alleles)

Example

 (use-modules (generecon MS haplotype))
 (define distinct-alleles (collect-alleles '(1 23 43 2 1 34 23 43))) 

Details

Collect the distinct alleles from a list of alleles and return them sorted.

collect-alleles-lists

Collect a list of the distinct alleles in each column of the input.

Prototype

 (collect-alleles-lists . list-of-list-of-allele-lists)

Example

 (use-modules (generecon MS haplotype))
 (define distinct-alleles-lists 
    (collect-alleles-lists (list '(1 1 34 23 43)
                                 '(1 3 14 23  3)
                                 '(2 4  2 65  0)))) 

Details

Collect a list of the distinct alleles in each column of the input. That is, the input lists must consist of lists of equal length, and the distinct alleles at each index are collected and listed.

The input need not consist of a single list but can be any number of lists.

haplotype-list->haplotype-list

Translates a list of lists of alleles into a list of haplotype objects.

Prototype

 (haplotype-list->haplotype-list region index-tables haplotype-list) 

Example

 (use-modules (generecon MS haplotype))
 (define reg (region kappa mu markers))
 (define haplotype-list (read-haplotype-data file))
 (define index-tables (make-index-tables haplotype-list))
 (haplotype-list->haplotype-list reg index-tables haplotype-list) 

Details

Translates a list of lists of alleles into a list of haplotype objects.

This function takes a region, `reg', a list of mappings from alleles to frequency indices in the region, `index-tables', and a list of lists of alleles, `haplotype-list', and translates the allele lists into haplotype objects over the region.

make-index-table

Make a table of alleles to indices.

Prototype

 (make-index-table alleles)

Example

 (use-modules (generecon MS haplotype))
 (define table (make-index-table '(1 23 43 2 1 34 23 43))) 

Details

Make a table of alleles to indices, such that each distinct allele is mapped to its position in the sorted list of distinct alleles.

make-index-tables

Makes a list of alleles->indices tables.

Prototype

 (make-index-tables list-of-list-of-allele-lists)

Example

 (use-modules (generecon MS haplotype))
 (define haplotype-list (read-haplotype-data file))
 (define tables (make-index-tables haplotype-list)) 

Details

Makes a list of alleles->index mappings.

From a list of allele-lists, this function makes a table for each marker, mapping the alleles at this marker to indices, such that the numerically least allele is at the first index (0) and the numerically largest allele at the largest index.

The input need not consist of a single list of alleles; it can be any number of such lists, e.g. (make-index-tables allele-list-1 allele-list-2).

read-affected/unaffected-data

Read affected and unaffected haplotypes and calculate their allele frequencies.

Prototype

 (read-affected/unaffected-data affected-file unaffected-file) 

Example

 (use-modules (generecon MS haplotype))
 (read-affected/unaffected-data af-filename uaf-filename) 

Details

Read affected and unaffected haplotypes and calculate their allele frequencies.

Read a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file' and return them (as lists of lists of alleles) together with a list of the frequencies of each allele at each position. Only the unaffected haplotypes are used to compute the frequency list. At each position the frequencies are ordered by the numerical value of the alleles (least allele first), and missing data (-1) is ignored in the frequency calculations.
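The frequency rule can be sketched as follows. This is not GeneRecon's code; the names are hypothetical, and taking the set of distinct alleles from the unaffected haplotypes only is an assumption. Frequencies come out as exact rationals here.

```scheme
(use-modules (srfi srfi-1))   ; delete-duplicates, count

;; Frequencies of the alleles at one marker, ordered by allele value,
;; ignoring missing data (-1).
(define (marker-frequencies alleles)
  (let* ((observed (filter (lambda (a) (not (= a -1))) alleles))
         (distinct (sort (delete-duplicates observed) <))
         (total (length observed)))
    (map (lambda (a) (/ (count (lambda (b) (= a b)) observed) total))
         distinct)))

;; Hypothetical sketch: one frequency list per marker, computed from
;; the unaffected haplotypes only.
(define (ms-frequencies-sketch unaffected-haplotypes)
  (map marker-frequencies (apply map list unaffected-haplotypes)))

(ms-frequencies-sketch '((3 10) (1 -1) (3 12)))
;; => ((1/3 2/3) (1/2 1/2))
```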

read-distances-affected/unaffected-markers

Read marker distances and affected and unaffected haplotypes, calculate their allele frequencies, and build markers from them.

Prototype

 (read-distances-affected/unaffected-markers kappa mu
                                             distances-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon MS haplotype))
 (read-distances-affected/unaffected-markers 0.01 1e-5 dist-file af-filename uaf-filename) 

Details

Read marker distances and affected and unaffected haplotypes, calculate their allele frequencies, and build markers from them.

Read a list of marker distances from `distances-file', a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file'; calculate the allele frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The frequencies in the frequency list are ordered by the numerical value of the alleles (least allele first), and missing data (-1) is ignored in the frequency calculations. The alleles in the created haplotypes are remapped to indices into the frequency lists.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

read-haplotype-data

Reads a list of haplotypes from a file.

Prototype

 (read-haplotype-data file)

Example

 (use-modules (generecon MS haplotype))
 (read-haplotype-data filename)

Details

Reads a list of haplotypes from a file.

The file must contain lines of whitespace-separated micro-satellite alleles (non-negative numbers, or -1 to indicate missing data). Each line is interpreted as one haplotype, and all lines must have the same number of alleles.

The function returns the parsed haplotypes as a list of lists of alleles. This list can be translated into a list of haplotype objects using `haplotype-list->haplotype-list'.
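A sketch of parsing this format from a port, using Guile's `read-line' and `string-tokenize'. It is not GeneRecon's read-haplotype-data; the name is hypothetical, and skipping blank lines is an assumption.

```scheme
(use-modules (ice-9 rdelim)    ; read-line
             (srfi srfi-1))    ; every

;; Hypothetical parser for the haplotype file format described above.
(define (read-haplotypes-from-port-sketch port)
  (let loop ((line (read-line port)) (haplotypes '()))
    (if (eof-object? line)
        (let ((result (reverse haplotypes)))
          ;; Every haplotype must have the same number of alleles.
          (unless (or (null? result)
                      (every (lambda (h) (= (length h) (length (car result))))
                             result))
            (error "haplotypes have different numbers of alleles"))
          result)
        ;; string-tokenize splits the line on whitespace by default.
        (let ((alleles (map string->number (string-tokenize line))))
          (loop (read-line port)
                (if (null? alleles) haplotypes (cons alleles haplotypes)))))))

(call-with-input-string "1 23 -1\n2 43 34\n" read-haplotypes-from-port-sketch)
;; => ((1 23 -1) (2 43 34))
```

To read from a file, the same procedure can be passed to `call-with-input-file' together with a filename.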

read-positions-affected/unaffected-markers

Read marker positions and affected and unaffected haplotypes, calculate their allele frequencies, and build markers from them.

Prototype

 (read-positions-affected/unaffected-markers kappa mu
                                             position-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon MS haplotype))
 (read-positions-affected/unaffected-markers 0.01 1e-5 pos-file af-filename uaf-filename) 

Details

Read marker positions and affected and unaffected haplotypes, calculate their allele frequencies, and build markers from them.

Read a list of marker positions from `position-file', a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file'; calculate the allele frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The frequencies in the frequency list are ordered by the numerical value of the alleles (least allele first), and missing data (-1) is ignored in the frequency calculations. The alleles in the created haplotypes are remapped to indices into the frequency lists.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

remap-haplotype

Remaps the alleles in a haplotype into indices in a frequency list.

Prototype

 (remap-haplotype index-tables haplotype)

Example

 (use-modules (generecon MS haplotype))
 (define haplotype-list (read-haplotype-data file))
 (define tables (make-index-tables haplotype-list))
 (map (lambda (h) (remap-haplotype tables h)) haplotype-list) 

Details

Remaps the alleles in a haplotype into indices in a frequency list.

Each allele in `haplotype' is replaced by the index that the corresponding marker's table in `index-tables' assigns to it, so that the numerically least allele at a marker becomes index 0 and the numerically largest becomes the largest index. The tables are those built by `make-index-tables'.
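The remapping amounts to one table lookup per marker position. The sketch below assumes one association-list table per marker (allele to index); it is not GeneRecon's code and the name is hypothetical. An allele absent from its table maps to #f here; how the real function handles that case (for example missing data, -1) is not specified.

```scheme
;; Hypothetical sketch of remap-haplotype: look up each allele in the
;; table belonging to its own marker position.
(define (remap-haplotype-sketch index-tables haplotype)
  (map (lambda (table allele) (assv-ref table allele))
       index-tables haplotype))

;; Two markers: alleles {1, 3} and {10, 11, 12}.
(define tables '(((1 . 0) (3 . 1)) ((10 . 0) (11 . 1) (12 . 2))))

(remap-haplotype-sketch tables '(3 12))   ; => (1 2)
```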

(generecon SNP genotype)

Module containing functionality for manipulating SNP genotype data.

calc-frequencies

Calculates the allele frequencies for each marker in the parameter lists.

Prototype

 (calc-frequencies . list-of-list-of-allele-lists)

Example

 (use-modules (generecon SNP genotype))
 (calc-frequencies affected unaffected)

Details

Calculates the allele frequencies for each marker in the parameter lists.

Takes as input a number of lists of lists of SNP genotypes and calculates, for each position in the lists, the frequencies of the alleles 0 and 1, discarding missing data (value -1). Each genotype contributes two alleles: 0 counts as two 0s, 1 as two 1s, and 2 as one 0 and one 1.

For example, (calc-frequencies (list '(0 1 2) '(0 1 1) '(-1 0 0))) evaluates to a list of three lists, one for each position in the input set: ((1 0) (0.33 0.67) (0.5 0.5)) (shown rounded). At the first position all alleles are 0 (the -1 is not counted); at the second position one third are 0 and two thirds are 1; and at the third position half are 0 and half are 1.

The input need not be a single list of genotypes; any number of such lists can be passed, e.g. (calc-frequencies affected-genotypes unaffected-genotypes).
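The counting rule can be sketched in plain Guile Scheme. This is an illustrative re-implementation, not GeneRecon's calc-frequencies; the names are hypothetical and the results are exact rationals. A position with only missing data would divide by zero here; the behaviour of the real function in that case is not documented.

```scheme
;; Alleles contributed by one genotype code, as (zeros . ones).
(define (genotype-allele-counts code)
  (case code
    ((0) '(2 . 0))     ; homozygote 0
    ((1) '(0 . 2))     ; homozygote 1
    ((2) '(1 . 1))     ; heterozygote 0/1
    ((-1) '(0 . 0))    ; missing data: not counted
    (else (error "invalid SNP genotype code" code))))

;; Hypothetical sketch: accepts any number of lists of genotype lists.
(define (snp-genotype-frequencies-sketch . lists-of-genotypes)
  (let ((columns (apply map list (apply append lists-of-genotypes))))
    (map (lambda (column)
           (let* ((counts (map genotype-allele-counts column))
                  (zeros (apply + (map car counts)))
                  (ones (apply + (map cdr counts)))
                  (total (+ zeros ones)))
             (list (/ zeros total) (/ ones total))))
         columns)))

(snp-genotype-frequencies-sketch (list '(0 1 2) '(0 1 1) '(-1 0 0)))
;; => ((1 0) (1/3 2/3) (1/2 1/2))
```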

genotype-list->genotype-list

Translates a list of lists of SNPs into a list of genotype objects.

Prototype

 (genotype-list->genotype-list region genotype-list) 

Example

 (use-modules (generecon SNP genotype))
 (define reg (region kappa mu markers))
 (define genotype-list (read-genotype-data file))
 (genotype-list->genotype-list reg genotype-list) 

Details

Translates a list of lists of SNPs into a list of genotype objects.

This function takes a region, `region', and a list of lists of SNPs, `genotype-list', and translates the SNP lists into genotype objects over the region.

read-affected/unaffected-data

Read affected and unaffected genotypes and calculate their SNP frequencies.

Prototype

 (read-affected/unaffected-data affected-file unaffected-file) 

Example

 (use-modules (generecon SNP genotype))
 (read-affected/unaffected-data af-filename uaf-filename) 

Details

Read affected and unaffected genotypes and calculate their SNP frequencies.

Read a list of affected genotypes from `affected-file' and a list of unaffected genotypes from `unaffected-file' and return them (as lists of lists of SNPs) together with a list of the frequencies of each SNP allele at each position. For the frequency list, only the unaffected genotypes are considered.

read-distances-affected/unaffected-markers

Read marker distances and affected and unaffected genotypes, calculate their SNP frequencies, and build markers from them.

Prototype

 (read-distances-affected/unaffected-markers kappa mu
                                             distances-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon SNP genotype))
 (read-distances-affected/unaffected-markers 0.01 1e-5 dist-file af-filename uaf-filename) 

Details

Read marker distances and affected and unaffected genotypes, calculate their SNP frequencies, and build markers from them.

Read a list of marker distances from `distances-file', a list of affected genotypes from `affected-file' and a list of unaffected genotypes from `unaffected-file'; calculate the SNP frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

read-genotype-data

Reads a list of genotypes from a file.

Prototype

 (read-genotype-data file)

Example

 (use-modules (generecon SNP genotype))
 (read-genotype-data filename)

Details

Reads a list of genotypes from a file.

The file must contain lines of whitespace-separated SNP genotypes (0, 1, 2, and -1, where 0 indicates homozygote 0, 1 indicates homozygote 1, 2 indicates heterozygote 0/1, and -1 indicates missing data). Each line is interpreted as one genotype, and all lines must have the same number of SNPs.

The function returns the parsed genotypes as a list of lists of SNPs. This list can be translated into a list of genotype objects using `genotype-list->genotype-list'.
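The meaning of the four genotype codes can be made concrete with a small decoder. This helper is hypothetical and not part of GeneRecon; representing a genotype as an unordered pair of SNP alleles, and missing data as #f, are assumptions made for illustration.

```scheme
;; Hypothetical helper: decode one genotype code from the file format
;; above into a pair of SNP alleles, or #f for missing data.
(define (snp-code->allele-pair code)
  (case code
    ((0) '(0 . 0))   ; homozygote 0
    ((1) '(1 . 1))   ; homozygote 1
    ((2) '(0 . 1))   ; heterozygote 0/1
    ((-1) #f)        ; missing data
    (else (error "invalid SNP genotype code" code))))

(map snp-code->allele-pair '(0 2 -1 1))
;; => ((0 . 0) (0 . 1) #f (1 . 1))
```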

read-positions-affected/unaffected-markers

Read marker positions and affected and unaffected genotypes, calculate their SNP frequencies, and build markers from them.

Prototype

 (read-positions-affected/unaffected-markers kappa mu
                                             position-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon SNP genotype))
 (read-positions-affected/unaffected-markers 0.01 1e-5 pos-file af-filename uaf-filename) 

Details

Read marker positions and affected and unaffected genotypes, calculate their SNP frequencies, and build markers from them.

Read a list of marker positions from `position-file', a list of affected genotypes from `affected-file' and a list of unaffected genotypes from `unaffected-file'; calculate the SNP frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

(generecon SNP haplotype)

Module containing functionality for manipulating SNP haplotype data.

calc-frequencies

Calculates the allele frequencies for each marker in the parameter lists.

Prototype

 (calc-frequencies . list-of-list-of-allele-lists)

Example

 (use-modules (generecon SNP haplotype))
 (calc-frequencies affected unaffected)

Details

Calculates the allele frequencies for each marker in the parameter lists.

Takes as input a number of lists of lists of SNPs and calculates, for each position in the lists, the frequencies of the alleles 0 and 1, discarding missing data (value -1).

For example, (calc-frequencies (list '(0 1 0) '(0 1 1) '(-1 0 0))) evaluates to a list of three lists, one for each position in the input set: ((1 0) (0.33 0.67) (0.67 0.33)) (shown rounded). At the first position all SNPs are 0 (the -1 is not counted); at the second position one third are 0 and two thirds are 1; and at the third position two thirds are 0 and one third is 1.

The input need not consist of a single list of SNPs but can be any number of such lists, e.g. (calc-frequencies affected-haplotypes unaffected-haplotypes).
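For haplotypes each SNP is a single allele, so the frequencies are plain counts of 0s and 1s per position. A hypothetical sketch, not GeneRecon's calc-frequencies, returning exact rationals:

```scheme
(use-modules (srfi srfi-1))   ; count

;; Hypothetical sketch: accepts any number of lists of haplotypes.
;; Values other than 0 and 1 (i.e. -1, missing data) are not counted.
(define (snp-haplotype-frequencies-sketch . lists-of-haplotypes)
  (let ((columns (apply map list (apply append lists-of-haplotypes))))
    (map (lambda (column)
           (let* ((zeros (count zero? column))
                  (ones (count (lambda (x) (= x 1)) column))
                  (total (+ zeros ones)))
             (list (/ zeros total) (/ ones total))))
         columns)))

(snp-haplotype-frequencies-sketch (list '(0 1 0) '(0 1 1) '(-1 0 0)))
;; => ((1 0) (1/3 2/3) (2/3 1/3))
```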

haplotype-list->haplotype-list

Translates a list of lists of SNPs into a list of haplotype objects.

Prototype

 (haplotype-list->haplotype-list region haplotype-list) 

Example

 (use-modules (generecon SNP haplotype))
 (define reg (region kappa mu markers))
 (define haplotype-list (read-haplotype-data file))
 (haplotype-list->haplotype-list reg haplotype-list) 

Details

Translates a list of lists of SNPs into a list of haplotype objects.

This function takes a region, `region', and a list of lists of SNPs, `haplotype-list', and translates the SNP lists into haplotype objects over the region.

read-affected/unaffected-data

Read affected and unaffected haplotypes and calculate their SNP frequencies.

Prototype

 (read-affected/unaffected-data affected-file unaffected-file) 

Example

 (use-modules (generecon SNP haplotype))
 (read-affected/unaffected-data af-filename uaf-filename) 

Details

Read affected and unaffected haplotypes and calculate their SNP frequencies.

Read a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file' and return them (as lists of lists of SNPs) together with a list of the frequencies of each SNP at each position. For the frequency list, only the unaffected haplotypes are considered.

read-distances-affected/unaffected-markers

Read marker distances and affected and unaffected haplotypes, calculate their SNP frequencies, and build markers from them.

Prototype

 (read-distances-affected/unaffected-markers kappa mu
                                             distances-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon SNP haplotype))
 (read-distances-affected/unaffected-markers 0.01 1e-5 dist-file af-filename uaf-filename) 

Details

Read marker distances and affected and unaffected haplotypes, calculate their SNP frequencies, and build markers from them.

Read a list of marker distances from `distances-file', a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file'; calculate the SNP frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

read-haplotype-data

Reads a list of haplotypes from a file.

Prototype

 (read-haplotype-data file)

Example

 (use-modules (generecon SNP haplotype))
 (read-haplotype-data filename)

Details

Reads a list of haplotypes from a file.

The file must contain lines of whitespace-separated SNPs (0, 1, and -1, where -1 indicates missing data). Each line is interpreted as one haplotype, and all lines must have the same number of SNPs.

The function returns the parsed haplotypes as a list of lists of SNPs. This list can be translated into a list of haplotype objects using `haplotype-list->haplotype-list'.

read-positions-affected/unaffected-markers

Read marker positions and affected and unaffected haplotypes, calculate their SNP frequencies, and build markers from them.

Prototype

 (read-positions-affected/unaffected-markers kappa mu
                                             position-file
                                             affected-file
                                             unaffected-file) 

Example

 (use-modules (generecon SNP haplotype))
 (read-positions-affected/unaffected-markers 0.01 1e-5 pos-file af-filename uaf-filename) 

Details

Read marker positions and affected and unaffected haplotypes, calculate their SNP frequencies, and build markers from them.

Read a list of marker positions from `position-file', a list of affected haplotypes from `affected-file' and a list of unaffected haplotypes from `unaffected-file'; calculate the SNP frequencies in the controls and use these frequencies to construct a list of affected and a list of unaffected markers. The constructed region and the pair of marker sets are returned as a list.

The first two parameters, kappa and mu, are the recombination rate and mutation rate, respectively. These are used for making the region for the markers.

(generecon common)

Module containing functionality common to all input types and MCMC algorithms supported by GeneRecon.

distances->positions

Creates a list of positions from a list of distances.

Prototype

 (distances->positions distances)

Example

 (use-modules (generecon common))
 (define distances (read-distances distances-file))
 (define positions (distances->positions distances)) 

Details

Creates a list of positions from a list of distances between the positions. The first position is placed at 0.
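The conversion is a running sum starting at 0. The sketch below assumes each distance is the gap between two consecutive markers, so n distances yield n + 1 positions; it is a hypothetical re-implementation, not GeneRecon's distances->positions.

```scheme
;; Hypothetical sketch: positions as cumulative sums of the distances,
;; with the first marker placed at 0.
(define (distances->positions-sketch distances)
  (let loop ((ds distances) (pos 0) (acc '(0)))
    (if (null? ds)
        (reverse acc)
        (let ((next (+ pos (car ds))))
          (loop (cdr ds) next (cons next acc))))))

(distances->positions-sketch '(0.5 0.25 1.0))   ; => (0 0.5 0.75 1.75)
```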

make-markers

Translates a list of positions and a list of frequency-lists into a list of markers.

Prototype

 (make-markers positions frequency-lists)

Example

 (use-modules (generecon common))
 (make-markers '(0.2 0.5) (list '(0.4 0.6) '(0.2 0.8))) 

Details

Translates a list of positions and a list of frequency-lists into a list of markers.

The `positions' list should be a list of numbers, and the `frequency-lists' list should be a list of the same length as `positions', where each element is, in turn, a list of frequencies summing to 1.

The markers are created by mapping each position to the corresponding list of frequencies, making the frequencies the allele frequencies for the marker at the given position.
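The pairing of positions with frequency lists can be sketched as below. To keep the example runnable without GeneRecon, `make-marker-stub' is a hypothetical stand-in for GeneRecon's marker constructor. The sketch also checks the two preconditions stated above; whether the real make-markers checks them is not documented.

```scheme
;; Hypothetical stand-in for GeneRecon's marker constructor.
(define (make-marker-stub position frequencies)
  (list 'marker position frequencies))

;; Hypothetical sketch of make-markers with precondition checks.
(define (make-markers-sketch positions frequency-lists)
  (unless (= (length positions) (length frequency-lists))
    (error "positions and frequency-lists differ in length"))
  (for-each (lambda (fs)
              ;; Allow for floating-point rounding in the sum.
              (unless (< (abs (- (apply + fs) 1)) 1e-9)
                (error "frequencies do not sum to 1" fs)))
            frequency-lists)
  (map make-marker-stub positions frequency-lists))

(make-markers-sketch '(0.2 0.5) (list '(0.4 0.6) '(0.2 0.8)))
;; => ((marker 0.2 (0.4 0.6)) (marker 0.5 (0.2 0.8)))
```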

read-distances

Read the distances between marker positions from a file and return them as a list.

Prototype

 (read-distances filename)

Example

 (use-modules (generecon common))
 (read-distances distances-file-name) 

Details

Read the distances between marker positions from a file and return them as a list.

read-distances->positions

Read the distances between marker positions from a file and return them as a list of positions (placing the first marker at 0).

Prototype

 (read-distances->positions filename)

Example

 (use-modules (generecon common))
 (read-distances->positions distances-filename) 

Details

Read the distances between marker positions from a file and return them as a list of positions (placing the first marker at 0).

read-distances->positions-from-port

Read the distances between marker positions from a port and return them as a list of positions (placing the first marker at 0).

Prototype

 (read-distances->positions-from-port port)

Example

 (use-modules (generecon common))
 (read-distances->positions-from-port (current-input-port)) 

Details

Read the distances between marker positions from a port and return them as a list of positions (placing the first marker at 0).

read-distances-from-port

Read the distances between marker positions from a port and return them as a list.

Prototype

 (read-distances-from-port port)

Example

 (use-modules (generecon common))
 (read-distances-from-port (current-input-port)) 

Details

Read the distances between marker positions from a port and return them as a list.

read-numbers

Read a list of space or newline separated numbers from a file.

Prototype

 (read-numbers filename)

Example

 (use-modules (generecon common))
 (read-numbers numbers-file-name) 

Details

Read a list of space or newline separated numbers from a file.

read-numbers-from-port

Read a list of space or newline separated numbers from a port.

Prototype

 (read-numbers-from-port port)

Example

 (use-modules (generecon common))
 (read-numbers-from-port (current-input-port)) 

Details

Read a list of space or newline separated numbers from a port.
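Because Guile's `read' skips spaces and newlines, repeated reads return the numbers in order until end of file. A hypothetical sketch, not GeneRecon's read-numbers-from-port; rejecting non-numeric tokens is an assumption.

```scheme
;; Hypothetical sketch: read whitespace-separated numbers until EOF.
(define (read-numbers-from-port-sketch port)
  (let loop ((acc '()))
    (let ((x (read port)))
      (cond ((eof-object? x) (reverse acc))
            ((number? x) (loop (cons x acc)))
            (else (error "not a number" x))))))

(call-with-input-string "0.1 0.25\n3\n" read-numbers-from-port-sketch)
;; => (0.1 0.25 3)
```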

read-positions

Read the marker positions from a file and return them as a list.

Prototype

 (read-positions filename)

Example

 (use-modules (generecon common))
 (read-positions positions-file-name) 

Details

Read the marker positions from a file and return them as a list. The list of positions will be sorted.

read-positions-from-port

Read the marker positions from a port and return them as a list.

Prototype

 (read-positions-from-port port)

Example

 (use-modules (generecon common))
 (read-positions-from-port (current-input-port)) 

Details

Read the marker positions from a port and return them as a list. The list of positions will be sorted.
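Reading positions differs from reading plain numbers only in the final sort. A hypothetical sketch, not GeneRecon's read-positions-from-port, using Guile's `sort':

```scheme
;; Hypothetical sketch: read whitespace-separated positions until EOF
;; and return them in increasing order.
(define (read-positions-from-port-sketch port)
  (let loop ((acc '()))
    (let ((x (read port)))
      (if (eof-object? x)
          (sort acc <)
          (loop (cons x acc))))))

(call-with-input-string "0.5 0.1\n0.3\n" read-positions-from-port-sketch)
;; => (0.1 0.3 0.5)
```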

run-in-subprocess

Run the function `program' in `no-calls' parallel processes.

Prototype

 (run-in-subprocess no-calls program)

Example

 (use-modules (generecon common))

 (define (run-markov-chain id)
   (let* ((au-set (affected/unaffected-haplotype-set
                   reg affected-haplotypes unaffected-haplotypes))
          (au-tree (distance-tree au-set))
          (s (sampler (list (list 'disease-locus 1 locus-file)
                            (list 'likelihood    1 likelihood-file)))))
     (run-mcmc ps s 10000)))

 (run-in-subprocesses 10 run-markov-chain) 

Details

Creates `no-calls' processes that run in parallel and executes the supplied function in each of them. The function is called with a single argument, a number between 0 and no-calls - 1, that can be used to uniquely identify the process.
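On a POSIX system this behaviour can be sketched with Guile's `primitive-fork' and `waitpid'. This is a minimal sketch, not GeneRecon's implementation: the name is hypothetical, and returning the children's exit codes (1 if `program' raised an error) is a design choice made here for illustration.

```scheme
;; Hypothetical sketch: fork `no-calls' children, call `program' with the
;; child's id (0 .. no-calls - 1) in each, and wait for all of them.
;; Returns the children's exit codes in id order.
(define (run-in-subprocess-sketch no-calls program)
  (let ((pids
         (map (lambda (id)
                (let ((pid (primitive-fork)))
                  (if (zero? pid)
                      ;; Child: run the program and exit without returning
                      ;; into the parent's code; exit code 1 on error.
                      (primitive-exit
                       (catch #t
                         (lambda () (program id) 0)
                         (lambda args 1)))
                      pid)))
              (iota no-calls))))
    ;; Parent: reap each child and collect its exit status.
    (map (lambda (pid) (status:exit-val (cdr (waitpid pid)))) pids)))
```

Because each child is a separate process, results computed by `program' are not returned to the parent; in the example above the chains write their samples to files instead.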