Sunday 22 December 2013

Updated Europe Analysis

(Updated 26/12-2013) This is the first updated Europe analysis using the "superindividual" approach, where all individuals not of interest are grouped into the "other" group. Just as in the Fennoscandia analysis, it appears to give added resolution to the results.

The chunkcount heatmap shows much the same as in the previous analysis, but the new cluster seen in the Fennoscandia analysis also appears here. Chunkcounts are the total number of identical and related segments of DNA shared between any two individuals. The intensity of the colors on the heatmap and the closeness of the tree branches indicate relatedness to other individuals or groups.
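The relationship between a raw (per-individual) and an aggregated (per-group) chunkcount heatmap can be sketched in a few lines. This is only a minimal illustration: the individual IDs, group labels and values are made up, and averaging over all pairs is an assumption about how the aggregation could work, not a description of what Finestructure does internally.

```python
from collections import defaultdict


def aggregate_chunkcounts(raw, groups):
    """Average a raw individual-level chunkcount table to group level.

    raw    : dict mapping (recipient, donor) -> chunkcount
    groups : dict mapping individual id -> group label
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (recipient, donor), value in raw.items():
        key = (groups[recipient], groups[donor])
        sums[key] += value
        counts[key] += 1
    # Mean chunkcount for every (recipient group, donor group) pair
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical toy data, not taken from the project
groups = {"FI1": "Finn", "FI2": "Finn", "SA1": "Saami"}
raw = {
    ("FI1", "FI2"): 40.0, ("FI2", "FI1"): 44.0,
    ("FI1", "SA1"): 30.0, ("FI2", "SA1"): 34.0,
    ("SA1", "FI1"): 28.0, ("SA1", "FI2"): 32.0,
}
aggregated = aggregate_chunkcounts(raw, groups)
```

The aggregated table has one cell per pair of groups, which is why the aggregated heatmap is easier to read at population level while the raw heatmap keeps individual outliers visible.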

CC Aggregated Europe linked 289k

 CC Raw Europe linked 289k 

The PCA overview of Europe appears to be better than in the previous analysis when using superindividual grouping for "others". We can now, for example, see a clear differentiation between Finns and Saamis in dimension 3 (horizontal) and dimension 4 (vertical). The differentiation in dimension 4 appears to be that the Saamis pull to the same level as the Basque and Sardinians on this axis.

CC Europe PCA Overview linked 289k

CC Europe PCA Individuals linked 289k

The PCA components presented in geographical heatmaps:

Dimension 1 - external peaks in Finns. Opposite Spanish, Sardinians and Romanians. Which source within "Others" this reflects is not known, but it appears to be close.

Dimension 2 - external peaks in Albanians, Mordovians, Vologda Russians and Saamis. Opposite Scandinavians. May suggest a non-European influence that has arrived far north and far south in Europe and/or a founder effect among Scandinavians.

Dimension 3 - internal peaks in Finns and Saamis. Opposite Basque and Sardinians. May reflect expansion of agriculture to Europe.

Dimension 4 - internal peaks in Lithuanians, Mordovians, Belarusians, Ukrainians. Opposite Basque, Sardinians and Saamis. May reflect Eastern European expansion and/or a founder effect. The ancestry sharing along the Western European shores from Iberians to the Saamis may suggest old coastal migrations from the south to the north.

Dimension 5 - internal peaks among Scandinavians and Saamis. Opposite Basque, Sardinians, Spanish and Finns. This may show an agricultural expansion from South-East Europe to Finland.

Dimension 6 - internal peaks among Basque and secondary Saami. Opposite Sardinians. May suggest a common autosomal link between Saamis and Basque.

Dimension 7 - internal peaks among Saamis and secondary Sardinians. Opposite British and Vologda Russians. Appears specific to Saamis, as the rest of Europe appears quite uniform except for the Sardinians.

(Analysis in progress)

Thursday 19 December 2013

Updated Fennoscandia Analysis

(Updated 26/12-13) It's finally time for an updated Fennoscandia analysis. It's been over half a year since the last update. We have more than 53 new participants, most of them Swedes.

As usual, we look first at the heatmap, which captures the main structure of the data. CC is ChunkCounts, the total number of shared segments, including identical and related segments. CL is ChunkLength, the total length of the shared (both identical and related) segments in cM (centimorgans).
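The difference between the two measures can be shown with a toy calculation. This is a simplified sketch with made-up segments: it counts discrete segments, whereas Chromopainter derives its values from a statistical painting model, so the function and data below are illustrative assumptions only.

```python
def chunk_summary(segments):
    """Summarise the segments one recipient shares with each donor.

    segments : list of (donor_id, length_in_cM) tuples
    returns  : dict donor_id -> (chunkcount, chunklength_in_cM)
    """
    summary = {}
    for donor, length_cm in segments:
        count, total = summary.get(donor, (0, 0.0))
        # CC goes up by one per segment, CL by the segment's length
        summary[donor] = (count + 1, total + length_cm)
    return summary


# Hypothetical segments for a single recipient
segments = [("SWE7", 2.5), ("SA3", 1.0), ("SWE7", 0.5), ("FI18", 3.0)]
summary = chunk_summary(segments)
```

In this toy case SWE7 contributes more chunks than FI18 (2 against 1) but the same total length (3.0 cM), which is why the CC and CL heatmaps can emphasise different relationships.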

CC Linked Aggregated 289k

CC Linked Raw 289k

CL Linked Aggregated 289k

CL Linked Raw 289k

The biggest difference from the previous Fennoscandia-level analysis is the huge influx of Swedes, which creates a bigger cluster, especially for the northern part of the country (Norrländingar). This cluster of Swedes in the middle of the heatmap appears to show affiliation to the Southern Saami cluster consisting of SWE7 and SA3. So it appears that there must have been some kind of founder effect, and maybe drift, in this process that has made them look more related to each other than to other Swedes and other Scandinavians from further south. This founder effect appears to have included a Saami or Saami-like population.

There is also a secondary cluster consisting mostly of Swedes, but also some Norwegians, in the upper part of the heatmap below the Saami. These appear to show affiliation to the Northern Swedish cluster, Finns and individuals of mixed Scandinavian and Finnish ancestry.

Externally influenced variation

As seen before, the PCA plots can be divided into externally influenced variation and internal variation. Dimensions 1 and 2 appear to be very similar external variations that peak in "Others" and among Danes and Scandinavians, and are lowest among Finns. That this variation peaks among Danes suggests a central European origin of this external influence. These two dimensions effectively divide Scandinavians and Finns.

 CC Linked PCA D1-D2 289k 

Dimension 3, on the other hand, reflects an external variation from "Others" that peaks among Finns and to a large extent Saami, and is lowest among Scandinavians, suggesting an Eastern European continental origin for this influence. As we can see, this dimension also appears to be effective at differentiating Finns and Saamis from Scandinavians, but it doesn't appear as clear-cut as dimensions 1 and 2 above.

 CC Linked PCA D1-D3 289k 

Internally influenced variation

As the dimension 1 horizontal axis appears to differentiate Scandinavians and Finns so well, with each at one extreme, we will continue to use this dimension. All the dimensions above 3 appear to show internal variation. The dimension 4 vertical axis appears to show Scandinavians and Finns at one extreme and Saami at the other.

CC Linked PCA D1-D4 289k

We clearly see among the new participants that NO25, SWE56, SWE62, SWE53 and SWE52 very clearly pull out of the Scandinavian cluster towards the Saamis, and to a lesser degree also SWE59 and SWE55. Most of them appear to belong to the "Norrländingar" cluster.

Of individuals moving away from the Finnish cluster towards the Scandinavians we can mention FI35 and FI40. Of Scandinavians moving towards Finns we can mention NO21, SWE65, SWE54, NO26, SWE45, SWE55, SWE59, SWE50 and SWE68.

As we can see from the heatmap, the members of the north-Swedish cluster appear to be much more closely related to each other than the typical Swede is. This also shows in the PCA plot D1-D5, with dimension 1 on the horizontal and dimension 5 on the vertical axis.

 CC Linked PCA D1-D5 289k  

The "Norrländing" cluster appears to peak in the Norrbotten area. As we can see, the South-Saamis (SWE7+SA3) appear to attract most of the other groups towards this cluster.

Dimension 6 (plot D1-D6) appears difficult to explain. It appears to be Finnish variation, as it is more dispersed among Finns than among others.

CC Linked PCA D1-D6 289k

I will need to do some more research on this dimension. It doesn't seem to reflect the structure of the Finnish subgroups in the heatmaps.

CC Linked PCA D1-D7 289k

In this plot the horizontal dimension 1 is the same as above, while dimension 7 appears to peak among the South-Saami individuals SWE7 and SA3. As we can see, both some Finns and some Scandinavians climb towards these individuals. However, the other individuals closest to these South-Saamis appear to be from the Västergötland and Scania area. On the Finnish side I am not sure, but maybe from coastal areas of Finland or from Karelia.

This clustering makes me unsure, and it may not be properly cleared up with higher marker resolution, something I have been planning to do, at least for the Fennoscandia analysis, to get higher power of differentiation.

I have also been looking at dimension 6, and it doesn't currently make sense to me. This adds further uncertainty to the interpretation of dimension 7, since higher dimensions mean more uncertainty: individual variation becomes more visible in the data than in the lower dimensions, which capture more of the group-level structure.

(Work in progress)

Wednesday 13 November 2013

Ajv70 and modern European variation II

Updated: 20 Nov 2013. This is an updated analysis of the previous Ajv70 analysis. See the previous blog post about the improvements. This analysis is based on 444k SNPs after all filtering (mostly imputed for all the modern populations; for Ajv70 all SNPs are actual, non-imputed calls).

Ajv70 appears different from Ajv52, who in the previous analysis appeared to be of mixed ancestry between what appeared to be a Saami-like and a Baltic-like or Eastern European-like population. Ajv70, on the other hand, is in a large cluster with Saamis, Vologda Russians and Mordovians, and in particular with Saamis.

CC Euro Overview 444k SNP

As we can see, the heatmap ancestry profiles for Ajv70 and for the other closely clustering groups seem to resemble each other. Note that Mordovians and Vologda Russians don't cluster with Saamis and Finns in Chromopainter-Finestructure linked mode on the heatmaps, but with Eastern Europeans. This is likely because the linked mode reflects more recent ancestry while the unlinked mode shows more ancient ancestry; Vologda Russians and Mordovians do seem to have more recent Eastern European admixture while in the past having been part of the Finnic and Volga Finnic language area. This is seen in both linked and unlinked mode, but probably weighs more in linked mode, making them cluster with Eastern Europeans there. The unlinked mode also gives lower resolution, as it is based only on allele frequencies, which adds to the uncertainty.

The low affiliation of Ajv70 to the Mediterranean populations on the heatmaps is matched only by Saamis and Finns among the modern populations. Finns appear more influenced by Eastern European populations and Scandinavians than Ajv70, while Vologda Russians and Mordovians appear more influenced by Eastern European populations. Saamis show less affiliation to Scandinavians and Eastern Europeans than both Finns and Ajv70, but the Ajv70 heatmap profile does seem to resemble the Saami most.

PCA dimensions 4-5, showing internal European variation, do not at first glance support the common heatmap clustering of Ajv70 with Saamis, Vologda Russians and Mordovians, as Ajv70 appears among Scandinavians on the PCA, and the Saami/Finns and Vologda Russians/Mordovians appear at very different locations on the PCA. However, the heatmaps do not indicate any Scandinavian-like ancestry for Ajv70 either, even though the individual's variation is plotted in this cluster. What could explain this seemingly contradictory result, that Ajv70 resembles Saamis on the heatmap but not on the PCA, is that Ajv70 is not close in genealogy to the populations it clusters closest with on the heatmap, and that the Saamis, Finns, Mordovians and Vologda Russians are both more dissimilar and more similar to "Others" than the other groups are, although what this dissimilarity and similarity consists of can differ.

CC Euro PCA 444k D4-D5

First, about "not close in genealogy". This can best be observed if we include related Orcadians and unrelated British on a Europe PCA. The Orcadians would form a clear outgroup from the British on the PCA, even though their ancestry profile on the heatmaps wouldn't look much different from that of the British, except for their internal sharing due to close relatedness, and they would otherwise branch close to the unrelated British.

For the same reason, Ajv70 would not cluster with Saamis and Finns on the PCA plot, even though they do cluster on the heatmap and Ajv70 otherwise shows a very similar profile to Finns and especially Saami. The explanation would therefore be that Ajv70 was not part of the founder effect that made modern-day Saamis and Finns cluster through closer relatedness to each other. Ajv70 therefore appears on the PCA plot among the next best group it could cluster with, the Scandinavians (in earlier linked analyses Scandinavians show both Saami and Finnish admixture), rather than with continental Europeans, so it would make sense.

Second, about "dissimilarity" and "similarity" relative to external influences, or "Others". In dimensions 1 and 2 versus "Others" (the rest of the world panel) we seem to catch variation that could also explain why the Ajv70 heatmap clustering includes Mordovians and Vologda Russians, even though European PCA dimensions 4-5 do not support this clustering.

CC Euro PCA 444k D1-D2

This PCA plot describes the degree of similarity and dissimilarity relative to "Others". As we can see, Lithuanians, Scandinavians and many more form the upper left extreme and the middle of the plot, showing variation closest to "Others", while the Saamis, in the lower right, show the variation most distant from "Others". Ajv70 is within the range of the Saamis in this plot. There is also one Vologda Russian within this range; otherwise the Vologda Russians and Mordovians are the next to follow, to the left of the Saamis and Ajv70. In PCA dimension 3 we also have a dimension where the Saamis, Vologda Russians, Mordovians and to some extent Finns are closest to "Others" (not shown). The position on these dimensions can have multiple explanations, depending on whether and to what degree each individual has been influenced by the category "Others" and the very diverse panel included in it. This means the dissimilarity for Saamis, Mordovians and Vologda Russians is not necessarily the same dissimilarity, just more dissimilarity relative to the "Others" category. For example, the Saamis probably pull strongly towards "Others" in dimension 3 because of what appears to be Siberian-like minority ancestry, while Ajv70, who shows no similar pull, probably pulls towards "Others" because of what may be minority African-like ancestry (which may also be an erroneous affiliation due to contamination). This could affect the Finestructure tree clustering in unlinked mode, adding Vologda Russians and Mordovians to the Saami, Finn and Ajv70 cluster.

Conclusion: Ajv70 appears to have the heatmap profile most similar to the Saamis, but this individual does not seem to have been part of the founder effect that produced the higher genetic sharing between modern Saamis and Finns. Ajv70 therefore does not cluster with the Finns and the Saamis on the PCA, and instead clusters "incorrectly" with their closest neighbors, the Scandinavians. Nor does Ajv70 show any significant degree of influence from Baltic or Eastern European populations, unlike Ajv52, where this influence made Ajv52 appear to shift away from Ajv70's position among Scandinavians into open space closer to the Eastern European populations. Ajv52's clustering with Scandinavian-Saami mixed individuals once some Baltic-like admixture is added seems to further support that Ajv70 mostly resembles a Saami-like population. Next it would make sense to check whether Ire8 may have been part of the common modern Saami-Finn founder effect. Earlier analysis may suggest this to a certain degree.

Edit 19/11-13: broader overview with more European and Middle Eastern populations. It is again very clear that Ajv70 clusters with Uralic or formerly Uralic populations, and especially the Saami.

 CC Euro Overview extended 444k SNP 

Individual results:

CC Euro haploid 444k

CC Euro diploid 444k

Friday 8 November 2013

Ajv52 and European variation II

This is an updated analysis of the previous Ajv52 analysis. In the previous analysis it was difficult to find any proper affiliation for the Ajv52 individual. The reason now seems to be that I only did the mapping- and base-quality filtering but not the additional filtering needed to remove what could be contamination. As suggested to me by the author of Skoglund 2012, this would make the ancient genomes appear more African-like than they really were. So this time I followed the remaining contamination procedure, except for gap filtering, as this wasn't described in detail in the supplementary material, and I removed all positions with multiple reads (the author randomly chose one read if there were multiple). This time I also didn't do any LD pruning in PLINK, as the authors of Chromopainter-Finestructure commented that LD would be taken into consideration in the Chromopainter unlinked mode. I also grouped all individuals not of interest into superindividuals ("others") in Finestructure, as also recommended by the Chromopainter-Finestructure authors. This analysis is therefore based on 261k SNPs after contamination filtering.
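The two ways of handling positions covered by more than one read can be sketched as follows. This is a hedged illustration only: the pileup structure, the function name `filter_pileup` and the positions are hypothetical, and it covers just this single step, not the full contamination procedure.

```python
import random


def filter_pileup(pileup, mode="drop", seed=None):
    """Reduce a pileup to at most one allele per position.

    pileup : dict mapping position -> list of observed alleles (one per read)
    mode   : "drop"   removes every position covered by more than one read
             "sample" keeps such positions but picks one read at random
    """
    if mode not in ("drop", "sample"):
        raise ValueError("mode must be 'drop' or 'sample'")
    rng = random.Random(seed)
    calls = {}
    for position, alleles in pileup.items():
        if len(alleles) == 1:
            calls[position] = alleles[0]
        elif len(alleles) > 1 and mode == "sample":
            calls[position] = rng.choice(alleles)
        # in "drop" mode, multi-read positions are simply left out
    return calls


# Hypothetical pileup: position 202 is covered by two disagreeing reads
pileup = {101: ["A"], 202: ["C", "T"], 303: ["G"]}
dropped = filter_pileup(pileup, mode="drop")
sampled = filter_pileup(pileup, mode="sample", seed=1)
```

Dropping is the stricter choice and costs SNPs, while sampling keeps more positions but lets a possibly contaminating read through some of the time.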

As we can see below, Ajv52 forms an interesting cluster at large together with other groups of mixed Scandinavian and Saami or Finnish ancestry, and in particular with the group of mixed Scandinavian-Saami ancestry (NOR-SAM). This is interesting in light of my earlier Ire8 analysis (Ire8 will be reanalyzed using a similar approach later), where Ire8 very clearly clustered with the Saami-Finnish group. Looking at the heatmap, we can see that Ajv52 shows an increased affiliation to the Baltic populations compared to others in the larger group, and at the same time an increased affiliation to Saamis and Finns that the Baltic populations, continental Europeans and even Scandinavians don't have.

CC Europe Overview 261k SNP 

This suggests that Ire8 may have represented the receiving population of ancient Gotland, which had similarity to modern-day Saamis and Finns, while Ajv52 may represent a mix of this population and a population migrating across from the Baltic region.

EDIT 9 Nov 13: I have added individual results. I have added heatmap results in both haploid and diploid mode, as the Ajv52 analysis in Chromopainter has been done in haploid mode (Ajv52 has only "homologus" data). Diploid mode has been made by using superindividuals.

In diploid mode we see that Ajv52 clusters with NO6, NO7 and SWE40. In haploid mode we see that Ajv52 clusters with SWE40_A, SWE40_B, NO7_A, NO7_B, NO6_A, NO6_B, SWE7_A and SWE11_A. All these individuals are of mixed Scandinavian-Saami ancestry.

CC Europe Haploid 261k SNP

 CC Europe Diploid 261k SNP 

Edit 11 Nov 13: The PCA plot for dimensions 4 and 5 clearly shows that Ajv52 is shifted toward Eastern Europe compared to Ajv70, who is in the Scandinavian cluster. See the discussion about Ajv70's position on the PCA vs the heatmap in the Ajv70 post, as it relates to Ajv52 as well.

CC Europe diploid D4-D5 261k SNP

Edit 19 Nov 13: Extended overview of Ajv52's clustering with European populations. We still see that Ajv52 clusters with individuals of mixed Scandinavian-Saami ancestry.

 CC Europe extended Overview 261k SNP  

Tuesday 15 October 2013

Digging deeper in Fennoscandian ancestry II

This is a continuation of the previous analysis (not a new analysis run!), and this time we look at the possibilities of differentiating Scandinavians (Swedes, Norwegians, Danes). This time all individuals who showed considerable Saami (either North-Saami-like or South-Saami-like) or Finnish admixture in the previous analysis were removed from further analysis, to keep outside influence to a minimum, and lumped into the "other" group containing the rest of the world.

The data was run through Finestructure's clustering, but it was unable to differentiate the remaining Scandinavians, meaning that to the software they appeared to be one population. However, in the PCA plot it was possible to infer structure. As we saw in the previous analysis, the first two dimensions of the PCA plot appear to reflect the level of external influence in Scandinavians. In both dimensions the Norwegians clustered in the upper left corner, closest to the "others", while Swedes clustered mostly in the lower right corner but with a huge spread over a large part of the plot. The single Danish individual appears to cluster with the Norwegians.

PCA dimension 1 (horizontal) and 2 (vertical)

Dimension 3, on the other hand, appears to be internal between Norwegians and Swedes. It is shown below together with dimension 2 and seems to give a better clustering. The Danish individual is still in the Norwegian cluster, but closer to the Swedes.

PCA dimension 2 (H) and 3 (V)

So the Chromopainter-Finestructure pipeline appears to be able to differentiate Norwegian and Swedish ancestry even when using only 289k SNPs. The division isn't entirely clear-cut, but there have been population movements between these countries for many centuries, so the classification labels may not be entirely correct, and some individuals also have mixed backgrounds.

Saturday 5 October 2013

Digging deeper in Fennoscandian ancestry I

Over the last few months the authors of Chromopainter-Finestructure have made me more aware that I have not been doing the analysis completely "according to the book", even though the software manuals have not said so explicitly.

From the start I have had the practice of running the Chromopainter-Finestructure analysis with the whole world panel and then later extracting the subdata from each run's output file into a European panel and a Fennoscandian panel for more detailed local analysis. However, you should actually use the "superindividual" and "continent" functionality in Finestructure to get the analysis right, with no file editing necessary, and you also need to do a new ChromoCombine run with the forcefiles containing these superindividuals.

I have been testing the difference by applying these new settings to the data from the last standard run (this is NOT a new run!), which used my earlier practice, to compare the results. It appears that, following the advice from the authors, it's possible to extract even more information from the run at the current resolution than I was aware of, even at Fennoscandia level. In this reanalysis I have grouped everyone outside Fennoscandia into the superindividual "Others".
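Conceptually, grouping everyone outside the region of interest into one superindividual amounts to merging their rows and columns in the coancestry matrix. The sketch below illustrates that idea on a tiny made-up matrix. It is not Finestructure's own code: summing the merged cells is an assumption chosen for simplicity, and the function name and IDs are hypothetical.

```python
def collapse_to_superindividual(matrix, names, keep, label="Others"):
    """Merge every individual not in `keep` into one superindividual.

    matrix : list of lists, matrix[i][j] = chunks recipient i copies from donor j
    names  : individual ids in matrix order
    keep   : set of ids that remain separate individuals
    """
    new_names = [name for name in names if name in keep] + [label]
    index = {name: i for i, name in enumerate(new_names)}
    size = len(new_names)
    collapsed = [[0.0] * size for _ in range(size)]
    for i, recipient in enumerate(names):
        row = index[recipient if recipient in keep else label]
        for j, donor in enumerate(names):
            col = index[donor if donor in keep else label]
            # Cells of merged individuals are summed (illustrative assumption)
            collapsed[row][col] += matrix[i][j]
    return new_names, collapsed


# Hypothetical 4-individual panel: two Fennoscandians, two from elsewhere
names = ["NO1", "FI1", "YRI1", "CHB1"]
matrix = [
    [0, 5, 1, 2],
    [6, 0, 1, 3],
    [1, 1, 0, 4],
    [2, 2, 4, 0],
]
new_names, collapsed = collapse_to_superindividual(matrix, names, {"NO1", "FI1"})
```

After the merge the matrix has one row and column per Fennoscandian plus a single "Others" entry, so any downstream clustering or PCA spends its resolution on the individuals of interest.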

First the heatmaps. As before, the heatmap identifies a range of main clusters; some may notice that the number of identified clusters seems to be lower than in previous runs.

CC Fennoscandia 289k Aggregated

CC Fennoscandia 289k Raw

We then turn our attention to the PCA plots to see if we can infer more resolution than from the standard heatmaps. As the superindividual "Others" contains much of the rest of the world, it appears quite distant in both the heatmap tree and the PCA. The "Others" group dominates the first 3 PCA dimensions, as it appears far away from the Fennoscandia group as a whole. The direction of "Others" is indicated on the plot.

PCA D1-D2 Fenno vs Others

PCA D1-D3 Fenno vs Others 

As I understand it, these PCA dimensions D1-D3 reflect the level of external influence on Fennoscandians, possibly from continental Europe, as DK1 is closest to "Others" in both dimensions. Scandinavians are closest to "Others" on both dimensions 1 and 2, while Finns and Saamis are closest to "Others" in dimension 3, especially FI18. In any case, all 3 of these dimensions can be used to differentiate Scandinavians from Finns and Saamis. We had a similar dimension in the earlier local Fennoscandia analysis.

In dimension 4 we get a dimension that is able to differentiate Saamis from Finns and Scandinavians. We see North-Saami individuals at the extreme. If we combine this dimension with any other two dimensions we would get a plot differentiating Scandinavians, Finns and Saamis. Notice that we now also get a better grouping of the Finns.

PCA D1-D4 Fenno vs Others

At this point in previous analyses we would not get any more information using my earlier method; however, using these new settings we can dig even deeper in the PCA plot. This is shown below as we move to dimension 5.

As we can see below, SWE7, a South-Saami individual, clearly stands out in a dimension of its own, separate from dimension 4, which peaks in North-Saami. What this means is that, using this new method, the project can explicitly differentiate South-Saami ancestry from North-Saami ancestry.

In earlier analyses the South-Saami ancestry appeared to blend in somewhere between North-Saami and Scandinavians, but as this shows, they are in a dimension of their own, even though they do share ancestry with North-Saami in dimension 4. However, the North-Saami share very little of this South-Saami-specific component, which appears far more common among other Fennoscandians, both Scandinavians and even some Finns, than the North-Saami-specific component is.

PCA D1-D5 Fenno vs Others 

This means that we can combine dimensions 4 and 5 to map Saami ancestry, both North and South, more explicitly in Finns and Scandinavians.

PCA D4-D5 Fenno vs Others

There are more dimensions after dimension 5 but they appear to become more unclear and increasingly reflect individual variation. However dimension 6 (vertical axis) may be something worth looking at in the future as it appears populated with what seem to be western Finns, central Swedes and two Saamis depending on where you set the borderline.

 PCA D1-D6 Fenno vs Others 

CONCLUSION: This study shows that using superindividuals one can extract even more detailed ancestry information from autosomal genetic data within Fennoscandia. This new knowledge will be used in future project updates.

Wednesday 18 September 2013

Saami ancestry and the MDLP Oracle-x Population Fitting

Update 08.10.2014: The MDLP calculators have been updated and no longer give the results shown below. This post is therefore considered outdated and should not be used as a guide to the MDLP calculators.

This is a short test of the ability of MDLP Oracle-x Population Fitting at Gedmatch to catch Saami ancestry. Only the MDLP calculators have been tested, as these are the only ones with a Saami reference population.

As the test shows, finding Saami ancestry using these calculators may give very different and even erroneous results (no Saami ancestry or minor Saami ancestry when there is actually major Saami ancestry), but one clearly stands out as the preferred choice. The "test subject" (with consent) is a North-Saami individual participating in the project with mostly Saami ancestry.

Absent = Saami ancestry not detected.
Minor = Saami ancestry detected but as minority ancestry.
Top but minor = Saami ancestry detected as top population but with less than 50%.
Top majority = Saami ancestry detected as top population with more than 50%.
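The four categories above can be written down as a small classification rule. This is a sketch under assumptions: the input format (a list of population and percentage pairs), the label "Saami" and the function name are hypothetical, and a result of exactly 50% is treated here as "Top but minor", a case the definitions above leave open.

```python
def classify_saami_result(fits, saami_label="Saami"):
    """Classify one calculator result into the four categories.

    fits : list of (population, percent) pairs reported by a calculator
    """
    totals = {}
    for population, percent in fits:
        totals[population] = totals.get(population, 0.0) + percent
    saami_pct = totals.get(saami_label, 0.0)
    if saami_pct <= 0:
        return "Absent"
    if saami_pct < max(totals.values()):
        return "Minor"  # detected, but another population ranks higher
    if saami_pct > 50:
        return "Top majority"
    return "Top but minor"


# Hypothetical calculator outputs, one per category
examples = {
    "Top majority": [("Saami", 62.0), ("Finn", 38.0)],
    "Top but minor": [("Saami", 45.0), ("Finn", 30.0), ("Swede", 25.0)],
    "Minor": [("Finn", 70.0), ("Saami", 30.0)],
    "Absent": [("Finn", 100.0)],
}
results = {name: classify_saami_result(fits) for name, fits in examples.items()}
```

For a test subject with mostly Saami ancestry, only a "Top majority" result counts as a correct detection; the other three categories are progressively worse misses.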

As the result shows using MDLP K=5 Oracle X "Pct. Calc. Option 1" appears to be the preferred choice to detect Saami ancestry.

EDIT 20/9-13

Please note that none of the other functionalities of the different versions of the MDLP calculators, like Oracle and Oracle-4, were able to find that the Saami individual had majority Saami ancestry. Some were able to infer Saami minority ancestry and some didn't detect anything at all. The exception again is MDLP K=5, which managed in Oracle to get a Saami population as number two, and 3 of 4 in Oracle-4. The MDLP 27 calculator (not in Gedmatch) managed, using Gaussian method 1 population mode, to infer the correct population but failed in the 2, 3 and 4 approximations.

Wednesday 11 September 2013

La Braña 2 and modern European variation

This is a reanalysis of the La Braña individuals, but this time separately. La Braña 2 matched the 1000 Genomes reference panel at 56k SNPs. These SNPs were used together with the 288k SNPs from the standard population that match the 1000 Genomes reference SNPs to impute the missing 56k SNPs from the La Braña as described earlier. These SNPs were then further LD pruned in PLINK to 26k SNPs and then run through the Chromopainter-Finestructure unlinked pipeline using the world panel. The European panel was then later extracted from the Chromopainter output files and run through Finestructure using 21k SNPs.

The heatmap, tree structure and PCA plot below show a somewhat different result than for La Braña 1, as La Braña 2 appears to have a position that clusters with the Scandinavian-Saamis (individuals with both Scandinavian and Saami background).

This means that the original analysis of the composite La Braña needs to be adjusted after the findings here. La Braña 2 appears most similar to individuals of mixed Scandinavian and Saami ancestry.

CC Euro unlinked 21k

 CC Euro unlinked 21k detailed 

 CC Euro unlinked 21k D1-D2

  CC Euro unlinked 21k D1-D3

EDIT: 20/9-13