Genomic insights into the formation of human populations in East Asia

Abstract
The deep population history of East Asia remains poorly understood owing to a lack of ancient DNA data and sparse sampling of present-day people [1,2]. Here we report genome-wide data from 166 East Asian individuals dating to between 6000 BC and AD 1000, and from 46 present-day groups. Hunter-gatherers from Japan and the Amur River Basin, and people of Neolithic and Iron Age Taiwan and the Tibetan Plateau, are linked by a deeply splitting lineage that probably reflects a coastal migration during the Late Pleistocene epoch. We also follow expansions during the subsequent Holocene epoch from four regions. First, hunter-gatherers from Mongolia and the Amur River Basin have ancestry shared by individuals who speak Mongolic and Tungusic languages, but do not carry ancestry characteristic of farmers from the West Liao River region (around 3000 BC), which contradicts theories that the expansion of these farmers spread the Mongolic and Tungusic proto-languages. Second, farmers from the Yellow River Basin (around 3000 BC) probably spread Sino-Tibetan languages, as their ancestry dispersed both to Tibet (where it forms approximately 84% of the gene pool in some groups) and to the Central Plain, where it has contributed around 59–84% to modern Han Chinese groups. Third, people from Taiwan from around 1300 BC to AD 800 derived approximately 75% of their ancestry from a lineage that is widespread in modern individuals who speak Austronesian, Tai–Kadai and Austroasiatic languages, and that we hypothesize derives from farmers of the Yangtze River Valley. Ancient people from Taiwan also derived about 25% of their ancestry from a northern lineage that is related to, but different from, farmers of the Yellow River Basin, which suggests an additional north-to-south expansion.
Fourth, ancestry from Yamnaya Steppe pastoralists arrived in western Mongolia after around 3000 BC but was displaced by previously established lineages even while it persisted in western China, as would be expected if this ancestry was associated with the spread of proto-Tocharian Indo-European languages. Two later gene flows affected western Mongolia: migrants after around 2000 BC with Yamnaya and European farmer ancestry, and episodic influences of later groups with ancestry from Turan.