We sampled the petrous part of the temporal bone of samples POP23 and POP39 and obtained calibrated radiocarbon dates from the Oxford Radiocarbon Accelerator Unit for POP23 (OxA-378000; OxA-37801, ORAU) and POP39 (OxA-37999, ORAU) using IntCal13 calibration curve45 and OxCal version 4.3.246. We also report the following radiocarbon dates for individuals included in this study: from the Ruđer Bošković Institute for POP33 (Z-5732, IRB) with IntCal13 calibration curve45 and OxCal version 4.2.447, from the Penn State AMS 14C Facility for POP35 (PSUAMS-4444, PSU) with IntCal13 calibration curve45 and OxCal version 4.3.248; from University of California Irvine Keck-Carbon Cycle AMS facility for JAG34 (UCIAMS-233509, UCI KCCAMS) with IntCal20 calibration curve49 and OxCal version 4.448 (Table 1, Fig. 1b, Supplementary Table S1).
Group labels generally follow the format “Country_SiteName/Region_TimePeriod”, with the site name contracted to a short form, for example Croatia_Pop_MN; Pop Popova zemlja, Jag Jagodnjak, Dal Dalmatia. Time periods are labelled as: BA Bronze Age, CA Copper Age, EBA Early Bronze Age, EN Early Neolithic, IA Iron Age, LBA Late Bronze Age, LCA Late Copper Age, MBA Middle Bronze Age, MN Middle Neolithic, N Neolithic and RomanP Roman Period.
We processed all samples in dedicated ancient DNA laboratories at the University College Dublin, Ireland. Petrous bones were UV irradiated for 10 to 15 min on each side followed by light sandblasting of the outer surface to remove loose debris. The cochlea was then excavated from the petrous bone using a sandblaster, and UV irradiated on each side for 10 min before it was finely powdered in a mixer-mill (Retsch).
DNA was extracted from about 50 to 70 mg of bone powder following a modified silica column based method optimised for ancient DNA samples50. In a pre-digestion step aimed at reducing contamination51, the bone powders were digested for one hour at 56 °C without rotation in 1 ml of extraction buffer containing 0.45 M EDTA and 0.25 mg/ml Proteinase K. The bone powder was spun down to a pellet by centrifugation and re-suspended in fresh extraction buffer. Samples were digested for one hour at 56 °C followed by 18 h at 37 °C with rotation. Samples were centrifuged at 13,000 rpm, and the 1 ml supernatant added to the reservoir of a Roche High Pure extender assembly tube containing 13 ml of binding buffer. Binding buffer consisted of 5 M Guanidine Hydrochloride, 40% isopropanol, 90 mM sodium acetate and 0.05% Tween-20. Tubes were centrifuged for four minutes at 1500 rcf, the spin column detached and placed in a collection tube and dry spun at 6000 rpm for one minute to remove remaining binding buffer. After placing the spin column in a fresh collection tube, 650 μl of PE wash buffer was added, centrifuged for one minute at 6000 rpm and the flow-through discarded. This step was repeated once followed by dry spinning at 13,000 rpm to remove remaining wash buffer. The column was placed in a clean Eppendorf tube and the sample eluted with 25 μl TET which had been incubated at 37–56 °C. Samples were then incubated at 37 °C for ten minutes followed by centrifugation for 30 s at maximum speed. The elution step was repeated, resulting in a total volume of 50 μl DNA extract. One negative control containing only extraction buffer was processed for every seven samples.
Library preparation and sequencing
Non-UDG-treated, double-stranded libraries were constructed following52. Blunt-end repair was carried out by adding NEBNext End- Repair module (New England Biolabs) to 12.5 μl of each DNA extract, which was incubated at 25 °C for 15 min, followed by 12 °C for 5 min. Samples were then incubated at 25 °C for 30 min for adapter ligation with T4 DNA Ligase (ThermoFisher Scientific). Adapter fill-in was performed with Bst Polymerase (New England Biolabs) with an incubation at 37 °C for 30 min and 80 °C for 20 min to inactivate the enzyme. Purification steps following blunt-end repair and adapter ligation was performed with the MinElute PCR purification kit from Qiagen. A negative control was processed for every seven samples, and a final library volume of 40 μl obtained.
Single indexing PCR was performed by adding a unique seven-base-pair index to 3 μl of each library using Accuprime Pfx Supermix (Life Technology) and IS4 primer to a total reaction volume of 25 μl. The PCR temperature profile consisted of initial denaturation at 95ºC for five minutes, a further 15 s of denaturation at 95ºC, twelve cycles of annealing at 60 ºC for 30 s, elongation at 68 ºC for 30 s and a final extension at 68 ºC for five minutes. A negative PCR control was included with every batch. The PCR amplification and subsequent clean-up steps were performed in a separate lab in a different part of the building from the clean labs. MinElute PCR purification kit spin columns were used for purification of amplified libraries following Qiagen instructions. Quantification of amplified product was performed using Qubit 2.0 Fluorometer (Thermo Fischer Scientific) and Agilent 2100 Bioanalyser DNA 1000 assay. Single-end shotgun sequencing was performed by pooling samples in equimolar amounts onto an Illumina NextSeq500 platform using 75-cycle kits for 1 × 76 cycles and 1 × 7 cycles for de-multiplexing.
Adapter sequences were trimmed from reads using Cutadapt (version 1.15)53 discarding reads under 17 bp (-m 17) and allowing an overlap of 1 bp between the read and adapter (-O 1). Reads were then mapped to the UCSC genome browser human reference hg19 (GRCh37) to produce BAM files with BWA aln/samse (version 0.7.15-r1140)54, replacing the mitochondrial genome with the revised Cambridge Reference Sequence (rCRS, Gen bank accession no. NC_012920.1)55. Seed length was disabled (–l 1000) and the default number of differences (-n 0.04) and minimum Phred scale mapping quality of 30 (-q 30) were used. PCR duplicates were removed with SAMtools rmdup (v.0.1.19-96b5f2294a)56.
Sites overlapping the ~ 1240 k SNP capture array were used to generate a pileup file for each individual using SAMtools mpileup (version 1.3)56 with the quality flags –q30, -Q30 and –B. This file was used to genotype individuals whereby a single base call was chosen at random from each SNP site to produce pseudo-haploid calls with transversion SNPs only using the flag –t SkipTransitions in pileupCaller (https://github.com/stschiff/sequenceTools) in order to remove variants affected by post-mortem damage in non-UDG treated samples. A second genotype dataset was produced that included all SNPs for use in functional SNP and ROH analyses (see below) (Supplementary Table S1).
We used mergeit (version 2450) from the package ADMIXTOOLS57, to merge the new genotype data to reference datasets2,30,31,35,38,39,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,101, Supplementary Table S2) containing 1250 ancient individua l s genotyped at 1,233,013 SNP sites, of which 140,159–192,648 transversion-only SNP positions are covered by the newly-reported individuals (Supplementary Table S1). This was also merged with diploid genotypes of 26 present-day individuals58,59,60,61, and, for tests involving present-day comparisons, a panel of worldwide present-day populations containing 1311 individuals genotyped at 597,573 nuclear SNP positions on the Affymetrix Human Origins (HO) array30, of which 66,880–94,317 transversion-only SNP positions are covered by the newly-reported individuals (Supplementary Table S1).
We produced a pileup from BAM files using SAMtools mpileup (version 1.3)56 with a minimum mapping and base quality of 30 (-Q 30, -q 30) and –B to turn off base alignment quality for five phenotypically informative SNPs that are included in the 1240 K panel. This included lactase persistence39,40 and skin and eye pigmentation35,36,37. Numbers of reads supporting each allele are reported in Supplementary Table S11.
Coverage on the autosomal and sex chromosomes was calculated using a script available at https://github.com/TCLamnidis/Sex.DetERRmine to determine genetic sex of each individual with standard errors. Males should have an x-rate of 0.5 and y-rate of 0.5. Females should have an x-rate of one and y-rate of zero (Table 1, Supplementary Fig. S1, Supplementary Table S1).
Ancient DNA authentication
X chromosome contamination in males
As males have only one copy of the X chromosome, we measured contamination in males by estimating polymorphism on the X chromosome using ANGSD (version 0.910)63. Results based on new Method1 are reported for a minimum of 200 SNPs on the X chromosome that are covered at least twice (Table 1, Supplementary Table S1).
Estimates of contamination based on comparison of the mitochondrial genome with a database of potential present-day contaminant human mtDNA sequences were obtained with Schmutzi64. To do this, we used EAGER (version 1.92.55)65 to remap all reads for each individual to the mitochondrial rCRS (Gen bank accession no. NC_012920.1)55 with CircularMapper (version 1.93.4)65, filtering on a minimum mapping quality of 30 and removing duplicates. Schmutzi confidence intervals are given as est.high and est.low (Table 1, Supplementary Table S1).
Mitochondrial haplogroup assignment
We used Schmutzi64 to reconstruct consensus mtDNA sequences for each individual from the remapped reads, which we then imported into Haplogrep266 (https://haplogrep.i-med.ac.at/) for automated mitochondrial haplogroup assignment based on phylotree mtDNA tree build 17 (http://www.phylotree.org/) (Table 1, Fig. 4a, Supplementary Table S1). We manually checked aligned mtDNA sequences for individuals possessing the same haplogroup, which revealed that the pair of first degree relatives, JAG58 and JAG06, possessed identical mutations for the T2b branch. Variants at unreliable polyC stretch positions were disregarded: 518, 309.1C(C), 315.1C, AC indels at 515–522, 16093C, 16182C, 16183C, 16,193.1C(C) and 16,519.
Y chromosomal haplogroup assignment
We used Yleaf (version 1.0)67 to infer the Y chromosomal haplogroup in males in an automated way based on haplogroup-defining SNP positions in the ISOGG 2016 nomenclature. We filtered results on derived alleles and transversion-only SNPs, and the most downstream haplogroup was selected (Table 1, Fig. 4a, Supplementary Table S1). Haplogroups were also inferred manually using SAMtools mpileup –q30 –Q30 with concordant results (version 1.3)56. We used Integrative Genomics Viewer (Broad Institute)68 to visually inspected reads in order to verify if defining variants were in the middle or at the end of a read to assess reliability, and to confirm mutations among individuals with shared haplotypes. Four of the Jagodnjak males share the same mutations (G2a2a-Z31430) while no reads covered the defining position for this haplotype in the fifth male, JAG82.
Consanguinity up to two degrees of relatedness was assessed by calculating pairwise mismatch rates from autosomal pseudo-haploid genotype data filtered on transversion-only SNP sites from the 1240 K SNP panel, using pMMRCalculator (https://github.com/TCLamnidis/pMMRCalculator) and READ69 with default parameters. READ provides an upper and lower Z score to help assess the certainty of the results, with the upper Z score indicating the distance to a lower degree of relationship, and a lower Z score indicating the distance to a higher degree of relationship. There was a minimum overlap of 92,000 SNPs between pairs of individuals, and both methods produced comparable results (Fig. 4a, Supplementary Fig. S8, Supplementary Tables S1 and S9), although READ produces slightly lower mismatch rates than pMMRCalculator, meaning individuals are estimated to be slightly more closely related than pMMRCalculator estimates. READ’s binned approach with sliding windows may contribute to this discrepancy. A pair of first degree relatives is expected to have a pairwise mismatch rate that is halfway between the baseline for unrelated and identical individuals. Coefficients of relatedness can be somewhat inflated for individuals with high inbreeding coefficients however. No relatedness was found between POP39 and a previously published individual I3499 from the same site and similar radiocarbon date. Where first degree relatives were identified, one individual from the pair was excluded from population-wide analyses, in this case JAG06. The same analyses were performed on a genotype dataset that included all SNP sites in order to assess the effects of damage on kinship estimates (Supplementary Fig. S8). Transversion-only genotypes shift the pairwise mismatch rates downwards, which means individuals are estimated as more closely related than when using all SNPs. This data is produced from non-UDG-treated DNA libraries, therefore this could indicate that ancient DNA damage can lead to an under-estimate of true relatedness.
Runs of homozygosity
Runs of Homozygosity (ROH) greater than four centimorgans (cM) were identified with the Python package hapROH34 (https://test.pypi.org/project/hapsburg/) using default parameters for the dataset containing all SNPs. A global dataset of 5008 haplotypes were used as a reference panel taken from the 1000 Genomes Project. The total sum ROH is reported for each length category of 4-8 cM, 8-12 cM, 12-20 cM and > 20 cM (Fig. 4a,c, Supplementary Table S10).
Principle components analysis
We used smartpca (version 16,000) in the EIGENSOFT package (version 6.0.1)70 with a set of 59 present-day west Eurasian populations from the Human Origins dataset30 to construct the first two principal components, and projected the ancient genomes with options lsqproject:YES, shrinkmode:YES and outliermode:2 (Fig. 2). Present-day populations in the HO dataset used for computing the principal components included: Abkhasian, Adygei, Albanian, Armenian, Balkar, Basque, BedouinA, BedouinB, Belarusian, Bulgarian, Canary_Islander, Chechen, Chuvash, Croatian, Cypriot, Czech, Druze, English, Estonian, Finnish, French, Georgian, Greek, Hungarian, Icelandic, Iranian, Italian_North, Italian_South, Jew_Ashkenazi, Jew_Georgian, Jew_Iranian, Jew_Iraqi, Jew_Libyan, Jew_Moroccan, Jew_Tunisian, Jew_Turkish, Jew_Yemenite, Jordanian, Kumyk, Lebanese, Lezgin, Lithuanian, Maltese, Mordovian, North_Ossetian, Norwegian, Orcadian, Palestinian, Polish, Russian, Sardinian, Saudi, Scottish, Sicilian, Spanish, Spanish_North, Syrian, Turkish, Ukrainian.
UMAP was run with the R package umap (version 0.2.3.1)71 using default parameters (Fig. 3c). Input was provided from the first ten principal components computed by PCA for 128 individuals from 13 present-day HO popula tions (Albanian, Romanian, Bulgarian, Cypriot, Greek, Italian_North, Italian_South, Maltese, Sicilian, Czech, Hungarian, German, Croatian) and 47 ancient individuals (Germany_Untetice_EBA, Hungary_BA, Montenegro_LBA, Romania_CA, Bulgaria_BA, Bulgaria_IA, Croatia_Dal_BA and the newly-sequenced individuals.
We performed unsupervised admixture analysis with ADMIXTURE (version 1.3.0)72 (Supplementary Fig. S2) on 2,361 ancient and present-day individuals (see Datasets section in Methods) by first using PLINK (version 1.90b5.3)73 to remove variants that had a minor allele frequency below 0.01, and to prune the dataset for strong linkage disequilibrium with parameters –indep-pairwise 200 25 0.4. We then ran five replicates for K4 to K17 with a random seed and cross-validation (Supplementary Fig. S2), and the highest likelihood replicate was chosen.
We used a set of packages in ADMIXTOOLS57 for performing f-based statistics. Outgroup f3-statistics was calculated with qp3Pop (version 435) (Supplementary Fig. S3a-b, Supplementary Fig. S7, Supplementary Table S3), qpDstat (version 751) was used to calculate f4-statistics with the option f4Mode: YES (Supplementary Fig. S6, Supplementary Table S7), and qpWave (version 410) and qpAdm (version 810) were used with option allsnps: YES for estimating mixture proportions (Fig. 3a-b, Fig. 4b, Supplementary Fig. S4, Supplementary Table S4). The option Chr: 23 was added to qpAdm for computing results based on the X chromosome in analyses testing for sex-bias (Supplementary Table S8). Following the method outlined in2, we calculated a Z score for each ancestry component to measure the difference in admixture proportions between the autosomes and X chromosome, where a positive Z score indicates more admixture on the autosomes and therefore male-biased ancestry. Mbuti.DG was used as an outgroup for all statistics. For qpAdm, right populations included Mbuti.DG, Ust_Ishim_HG_published.DG, Ethiopia_4500BP.SG, Russia_MA1_HG.SG, Italy_Villabruna, Papuan.DG, Onge.DG, Han.DG. qpWave was used to check the outgroup populations could successfully distinguish the ancestries present in the sources. Rather than identifying the specific source populations and admixture events that occurred, qpAdm models help to ascertain the type ancestry that would have contributed to the gene pool of the target population via admixture.
We used DATES (https://github.com/priyamoorjani/DATES)74 to estimate the age of past population admixture events between two source populations by inferring time since mixture from the average size of ancestry blocks, assuming a generation time of 29 years (Supplementary Table S5). Decay curves are reported in Supplementary Fig. S5. Estimates can contain some noise due to later admixture events, and this model does not take into account multiple admixture events or admixture of already admixed populations.