There seems to be quite a lot of demand for a list of the top 10 languages of the world, but in scouring the internet I have discovered that such top 10 lists are wont to disagree on the precise rankings, although they generally agree on which languages should be included. Aside from the internet I did find one top 10 list in something called a book which I found in my bookcase: The Cambridge Factfinder, edited by linguist David Crystal, the 4th Edition of which gives mother tongue speakers in millions from the "early 1990s" as follows:
However, I think it's far more useful and interesting to look at the total number of speakers, rather than just the mother tongue (Muttersprache or L1) speakers, since it gives you a better idea of the number of people you could potentially talk to in a particular language. That said, there are several general problems with compiling such a list, many of which also apply to L1 numbers:
Some of this can, of course, be ironed out by using data on population growth rates, but actually, some languages are getting more and more popular while the use of other languages is declining, and so you are on shaky ground if you resort to this measure.
Below is my own top 10 list, with links to comments about how I arrived at the value (which is in millions). I have consulted a number of sources. When I refer to "Weber", I mean George Weber's article on the 10 Most Influential Languages from 1997. This, as well as the Encarta 1998 data that I refer to, can be found on this nice little page. When I refer to "Leclerc", I mean Jacques Leclerc's website, which I have found very useful. I have also used data from the CIA World Factbook. When I refer to the "1989 source", I mean a list that appears near the end of this book by Barry Farber. I also make reference to the Eurobarometer survey on the language skills of people in the EU.
I have put Arabic, Bengali and Portuguese in that order because that's the order in which they appear in the list by native speaker populations.
German and Japanese would come next, but I'm not sure which has the higher figure out of these two! I will leave that as an exercise for the reader.
Getting data for the Muttersprache population is quite easy - it's about 420 million - but getting estimates for the number of non-native speakers is extremely difficult. The estimates vary wildly from about 300 to 1100 million. The above-mentioned general problems are all represented in English: lack of reliable, up-to-date information for non-native speakers, no way of knowing which varieties of English are included or should be included (most notably, Nigerian Pidgin English - include it, or not?), and no way of knowing just how well someone can speak the language. After all, a large number of people in tourist areas will have learnt some sort of tourist English that allows them to sell things, but these people might be out of their depth in a different situation. So should they be counted?
After doing a lot of research, I think it's very likely that the total number of English speakers is at least in the region of 1000 million, but the uncertainties are great. In Africa, English may be the official language of several countries, but it's surprising how much the take-up of English varies. The literacy rates and school drop-out rates vary considerably, and a great number of creoles and pidgins complicate the matter entirely. The Eurobarometer survey reports that 38% of the EU25 region can speak English well enough to have a conversation (excluding the native speakers) - so that makes about 175 million, but it also says that 30% of respondents said that their knowledge of English was "basic". Obviously some people will have underestimated their own ability, and some of them may have overestimated their ability, so the water just gets muddier and muddier. A poster from the Goethe-Institut claims that 75% of the world speaks no English, so that would give 1675 million English speakers. Let's go with 1500 million and be done with it.
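As a rough sanity check, the percentages quoted above can be turned back into the population bases they imply, and the 300 to 1100 million range for non-native speakers can be added to the native figure. This is only a sketch: the names `world_base` and `eu25_base` are illustrative, and the bases they hold are back-calculated from the quoted numbers rather than taken from the Goethe-Institut poster or the Eurobarometer survey.

```python
# Figures quoted in the text (speaker counts in millions).
goethe_speakers = 1675       # implied by "75% of the world speaks no English"
goethe_share = 1 - 0.75      # share of the world that does speak English
eu_speakers = 175            # non-native conversational speakers in the EU25
eu_share = 0.38              # Eurobarometer share for the EU25

# Back-calculated population bases (an inference, not quoted in any source).
world_base = goethe_speakers / goethe_share
eu25_base = eu_speakers / eu_share

# Non-native estimates range from about 300 to 1100 million; adding the
# roughly 420 million native speakers gives the range for total speakers.
native = 420
low_total = native + 300
high_total = native + 1100

print(round(world_base), round(eu25_base), low_total, high_total)
```

On these numbers the upper end of the range sits close to the 1500 million settled on above, while the lower end is less than half of it, which is a fair picture of how uncertain the figure is.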
The problem here is similar. We must realise that "Chinese" refers to a so-called "macrolanguage" - that is to say, a family of languages descended from one earlier language. One of these languages is Mandarin Chinese, and it happens to be the largest and most prestigious of the Chinese languages, although it is by no means true that everyone in China speaks it. The Chinese languages are also not the only languages of China - others, such as Zhuang, Hmong, Yi, Uyghur and Mongolian, exist - but Chinese languages do form a majority. Anyway, we must be sure only to accept estimates that refer to Mandarin, and not to "Chinese", which is ambiguous.
As for the figure that we're actually going to use, we can observe that the number of speakers will certainly be more than the 885 million given in the Crystal estimate above. The population of China according to the CIA July 2008 estimate is 1330 million. The Singapore, Taiwan and diaspora populations are probably not likely to have a big impact on the final number. Weber claims 1.1 billion first language and 20 million second language speakers, roughly corroborated by this, but that doesn't have a definitive date for the estimate. But if Weber's estimate is for 1997, it's only 210 million short of the 2008 estimate for the total population of China, which seems a bit suspect. Even so, let's go for a figure of 1100 million.
The complication here is already apparent from the section title. Hindi and Urdu are the same language at their core, but there are a few distinctions: (i) in higher registers, Hindi borrows terminology mainly from Sanskrit; Urdu, from Persian/Arabic, (ii) Hindi is written in the Devanagari script; Urdu, in the Perso-Arabic script, (iii) Hindi is used chiefly by Hindus; Urdu, by Muslims, (iv) they have different names. But the problems don't end there, because there is also a problem with dialects of Hindi, some of which are wildly different from standard Hindi. The problem is probably worse here than in other such cases because the number of people speaking a particular dialect is usually huge, so its inclusion or exclusion does have a significant effect on the final figure. The Census of India, although commendable for its delightful inclusion of a state-by-state breakdown of the number of people who speak a given language, still lumps together quite a lot of Hindi dialects - notably Rajasthani dialects and Bihari dialects, which are purportedly very different. This is compounded, however, by the fact that many people in Rajasthan, Bihar and elsewhere are bilingual in their native dialect and in standard Hindi (diglossia again). Now, going by the wonderful census, it seems that Urdu is used by Muslims all over India, but Urdu is mostly recognised as the language of Pakistan, and here we come to another complication, because it turns out that Urdu is a minority Muttersprache in Pakistan (Punjabi being the largest Muttersprache [45%]), but it is known as a second language. But how many people know it as a second language? Such information is very difficult to obtain, and the answer could be quite crucial, since Pakistan's population is enormous.
And the number we'll use? Well, the Indian census of 2001 reported 422 million Hindi speakers and 51.6 million Urdu speakers. It seems that Rajasthani and Bihari account for around 90 million of the 422 million, but let's assume that they all speak Hindi as a second language anyway. Ethnologue does estimate 120 million second language speakers, citing the 1999 World Almanac. The Encarta 1998 estimate for native Hindi speakers is 333 million, which probably fits, and the total speaker estimates for Hindi and Urdu in the 1989 source are 352 and 92 million respectively. Strangely, in 1997, Weber gives a combined estimate of 250 million Hindi and Urdu speakers and simply doesn't know how many second language speakers there are. If we take 90 million away from 422 and then add on 120, assuming that the Bihari and Rajasthani bilinguals are therein included, and then add on a very conservative 100 for Urdu (given the population growth), we come to 552 million, so let's round that up to 560 million and call it a day.
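The tally in the last sentence can be written out step by step. This is a minimal sketch using only the figures quoted above; the variable names are illustrative.

```python
# All figures in millions, as quoted in the text.
census_hindi = 422        # 2001 Census of India, "Hindi" including dialects
rajasthani_bihari = 90    # rough part of the 422 that is Rajasthani/Bihari
second_language = 120     # Ethnologue, citing the 1999 World Almanac
urdu = 100                # the deliberately conservative Urdu figure

# Remove the dialect speakers from the census count, then add second
# language speakers (assumed to include those bilinguals) and Urdu.
total = census_hindi - rajasthani_bihari + second_language + urdu

print(total)  # 552, which the text then rounds up to 560
```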
Again, Spain is very much overshadowed by Hispanic non-Spain in terms of population. When it comes to it, however, there are few complications here, but the numbers still vary. The 1989 total estimate is 341 million; Encarta 1998 gives 332 million for native speakers; Weber gives 300 million native and another 20 million second language speakers. At first, it's tempting to think that all of South and Central America except for Brazil is Spanish-speaking, but actually there's French Guiana, Guyana (English), Belize (slight English majority over Spanish) and Suriname (no majority really), and Leclerc cites 1992 statistics purporting that only 55.2% of Paraguay speaks Spanish at all. The situation in Bolivia is not clear, but it does have sizeable Quechua and Aymara populations. There are also about 30 million Spanish speakers in the USA, and Eurobarometer counts about 30 million able-to-hold-a-conversation-ers in the EU, which leads me to think that we are looking at at least 400 million total speakers nowadays. This article from 2007 even goes as far as to say "...dentro de diez años serán más de 30 millones de brasileños los que hablen español y se sumen así a los actuales 500 millones de hispanohablantes en Latinoamérica y España. Lo cual lo que lo sitúa como cuarta lengua del mundo después del chino, el inglés y el indio." (Roughly: "...within ten years there will be more than 30 million Brazilians who speak Spanish, joining the current 500 million Spanish speakers in Latin America and Spain. This places it as the fourth language of the world after Chinese, English and Indian.") So let's welcome Brazilian Hispanophones to the party and call it 430 million.
It's not too difficult to count the number of Russian speakers in the region that was formerly the USSR. There are some speakers outside this region, and that's where the problems start, although they are not too significant. The number of Russian speakers is apparently declining overall, even in Russia (which has a shrinking population).
Weber gives 160 million native speakers and 125 million second language speakers. The 1989 source gives 293 million total, which is OK, because we expect a decline since the Union's collapse. This article gives 163.8 native and 114 million second, which makes 277.8 million, so let's say 275 million and call it quits.
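To compare the three sources on an equal footing, the native and second language figures can be summed for each one. This is a small sketch using only the numbers quoted above; the dictionary keys are just labels, and `round` is there to keep floating-point noise out of the decimal figure.

```python
# Totals in millions of speakers, built from the figures quoted in the text.
totals = {
    "1989 source": 293,                       # given as a total only
    "Weber (1997)": 160 + 125,                # native + second language
    "article": round(163.8 + 114, 1),         # native + second language
}

chosen = 275  # the figure settled on in the text

for source, total in totals.items():
    # A positive difference means the source is above the chosen figure.
    print(f"{source}: {total} (chosen figure is {total - chosen:+.1f} away)")
```

All three totals sit above 275 million, which fits the expectation of a continuing decline.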
This one doesn't appear in top 10s or even in top 20s if they're going by native speakers, because we have another instance of diglossia. For someone growing up in a bilingual environment, though, it is debatable whether Indonesian counts as a native language, as one of two native languages, or as a second language. There is no definitive source for the information, but it seems that just about everyone in Indonesia does speak the language, which is, as it happens, practically the same as the Malay language. Going by CIA 2008 estimates again, Indonesia has a population of 237.5 million; Malaysia, 25 million; Brunei, 0.38 million; and there are 0.65 million Malay speakers in Singapore. If we assume that they all know Malay/Indonesian, this makes about 263 million. Let's round that down to 250 million, since it's probably an overestimate.
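The upper bound here is just the sum of the four populations. This is a minimal sketch, assuming (as the text does) that everyone counted knows Malay/Indonesian; the dictionary labels are illustrative.

```python
# Populations in millions (CIA 2008 estimates as quoted in the text).
populations = {
    "Indonesia": 237.5,
    "Malaysia": 25.0,
    "Brunei": 0.38,
    "Singapore (Malay speakers only)": 0.65,
}

# Upper bound: assumes every person counted knows Malay/Indonesian.
upper_bound = sum(populations.values())
chosen = 250  # the rounded-down figure used in the text

print(f"upper bound: {upper_bound:.2f} million")
print(f"margin removed by rounding down: {upper_bound - chosen:.2f} million")
```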
The problem here is that there are many varieties of Arabic. As an official language, "Arabic" refers to "Modern Standard Arabic" (MSA), which is a Zweitsprache (second language) for anybody who knows it. In the Arab world there is a situation known as diglossia, where the people speak one language in their day-to-day affairs, but are bilingual in MSA, which is used in literature and the media and for communication between people who natively speak different versions of Arabic. (Imagine if everyone in Francophone countries, Spain, Portugal, Italy and Romania learnt Latin: that would be a similar situation, not least because French, Spanish, Portuguese, Italian and Romanian are different languages that all descend from Latin.) Note, however, that speakers of different Arabic languages generally consider these languages to be dialects of Arabic, but we need to be as consistent as possible when selecting what counts as a single item in the list, so we will be sure to count only people who know MSA, something that often isn't made clear in other lists.
Weber claims that there are 200 million speakers of "Arabic" and 21 million second language speakers. Now, if we look at the CIA World Factbook's population estimates for the relevant countries, we can see that there are about 330 million people living in the major Arabic-speaking countries. Unfortunately, not all of them do know MSA, this being dependent on their education, since you do have to learn it specifically. The Ethnologue claims that 100.5 million people in Arab states do not know MSA. Ordinarily we'd ignore this, but if we take 100 away from 330 we get 230, which meshes nicely with Weber's estimates and with most other figures that you generally find for Arabic speakers.
The only slight complication here is the presence of Portuguese-based creoles, such as Crioulo spoken in Guinea-Bissau, but with a total population of about 1.5 million, it's hardly going to make a big difference. As it stands, Brazil is the major contributor to the figure for Portuguese (or maybe we should start calling it Brazilian?!), but, together, Portugal, Angola and Mozambique do make a sizeable contribution, while Guinea-Bissau, São Tomé e Príncipe, Cape Verde and East Timor are possibly not worth counting. This site is not alone in quoting 240 million as the total figure. However, if we add up the total populations for Brazil, Portugal, Angola, Mozambique and Guinea-Bissau, going by the CIA 2008 estimates, we get to around 240 million, but this is an overestimate; yes, Portuguese accounts for practically 100% in the first two, and very, very high in Angola, but I've seen estimates of 27% (CIA) and 40% (National Institute of Statistics, via Wikipedia) for Mozambique, and 10% (Leclerc) for Guinea-Bissau. But the estimates from Ethnologue, Encarta 1998 and Weber are all below the current population of Brazil, so we can't use them reliably at all. Eurobarometer gives us 0% for knowledge of Portuguese amongst non-natives in EU25, and I don't think Galician is included somehow. The result of this debate is crucial, of course, because if we use 240 million, then Portuguese just about wins over Bengali. If we go by the population data and use 40% for Mozambique, then we get to about 230 million, so let's use that for now.
Bengali is the national language of Bangladesh and the state language of West Bengal, in India. Statistics for both aren't too difficult to come by, and so the only complication is with two "dialects" of Bengali called Chittagonian and Sylheti. Should we include them? And if not, then how are we supposed to know how many people to subtract, since we cannot guarantee that they are all bilingual in Bengali?
Weber estimated 185 million (1997) and the Encarta 1998 estimate is 189 million. Ethnologue again cites the 1999 World Almanac with a figure of 211 million but this time it includes second language speakers. Now, the Indian census tells us that 8.1% of the population of India spoke Bengali in 2001, and if we assume that the percentage remained the same, we can use CIA's 2008 population estimate to get 93 million Bengali-speaking Indians (the population of West Bengal is about 80.5 million though). It also estimates that Bangladesh has a population of 153.5mil. Adding those together gives 246.5 million, which suggests that the estimate of 230 million provided here is probably about right. Given that the assumption that 100% of Bangladesh speaks Bengali is probably wishful thinking, let's go with 230 million.
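The arithmetic behind the upper bound, and what the chosen figure implies for Bangladesh, can be sketched as follows. The last step (`implied_bangladesh_share`) is my own back-calculation from the quoted numbers and does not come from any of the sources.

```python
# All figures in millions, as quoted in the text.
indian_bengali = 93     # 8.1% of India's population (CIA 2008 estimate)
bangladesh = 153.5      # total population of Bangladesh (CIA 2008 estimate)

# Upper bound: assumes 100% of Bangladesh speaks Bengali.
upper_bound = indian_bengali + bangladesh

# Settling on 230 million while keeping the Indian figure fixed implies
# this share of Bangladesh's population (an inference, not a sourced figure).
chosen = 230
implied_bangladesh_share = (chosen - indian_bengali) / bangladesh

print(upper_bound)
print(f"{implied_bangladesh_share:.1%}")
```

In other words, going with 230 million amounts to assuming that a little under nine in ten people in Bangladesh speak Bengali.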
French gets promoted from its rather shameful 14th-ish place going by native speakers, and arrives in the top 10 by total speakers by virtue of its second language speakers. Again, knowledge of French in African countries where it is an official language varies quite considerably: Cameroon (~78%), Morocco and the Democratic Republic of the Congo (~24 million) are quite good, whereas Burkina Faso, Senegal and Guinea have pretty small numbers of Francophones. It's still difficult to get reliable estimates, for the usual reasons. Leclerc gives a rather ambiguous account of the situation in Côte d'Ivoire: "On estime qu'environ les deux tiers de la population âgée de six ans et plus pratique «une forme de français»" (roughly: "it is estimated that about two thirds of the population aged six and over use 'some form of French'"). Incidentally, it also appears that men are more likely to learn a colonial language as a vehicular tongue than women; Leclerc reports, for example, 15-20% of men in Senegal knowing French, but only 1-2% of women. Anyway, the 2006-07 estimate of the Haut Conseil de la Francophonie gives 200 million as the total number of French speakers, stating that this includes 72 million "francophones partiels" (partial French speakers), which seems to indicate French speakers with limited language skills. There also seems to be a vague estimate of "francisants et d’apprenants de français" (roughly, students and learners of French) in the region of 100 million. The Eurobarometer survey says that 14% of the EU25 - i.e. about 64 million people - know French well enough for a conversation (about what though? Nuclear physics?) and 46% said their knowledge of French was basic. Now, I'd sooner expect an overestimate from La Francophonie than an underestimate, but their figures are admirable for distinguishing between different classes of francophones, even though this makes the task of selecting a definitive figure quite difficult. Let's just say 200 million and leave it at that.