HTTP_ACCEPT_LANGUAGE, Accept Language, php and the parsing

14 March 2013

You are about to implement, or need to understand, a PHP web site that supports multiple languages (multilingual support), but you still have open questions about how to interpret or parse the accepted languages that the browser sends.

According to the page on reserved variables on php.net, the variable HTTP_ACCEPT_LANGUAGE holds the contents of the “Accept-Language:” header of the current HTTP request, if there is one.
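Because the header is optional, reading it defensively avoids an "undefined index" notice. A minimal sketch; the function name raw_accept_language is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: return the raw Accept-Language header value,
// or an empty string when the client did not send one.
function raw_accept_language(array $server): string
{
    return isset($server['HTTP_ACCEPT_LANGUAGE'])
        ? (string) $server['HTTP_ACCEPT_LANGUAGE']
        : '';
}

// In a real request you would pass $_SERVER; the arrays here are sample input.
var_dump(raw_accept_language(['HTTP_ACCEPT_LANGUAGE' => 'es-us'])); // string(5) "es-us"
var_dump(raw_accept_language([]));                                  // string(0) ""
```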

Let’s say you wrote an application that echoes the variable $_SERVER['HTTP_ACCEPT_LANGUAGE']. You will see different results from different browsers. Some browsers send a string like this:

de,en-us;q=0.7,en;q=0.3

Other browsers will deliver other results such as:

de-DE,de;q=0.9,en;q=0.8

Or even:

es-us

But what does it all mean?

Let’s take a deeper look. Sure, the two-letter language code ‘de’ stands for German, ‘en’ for English and ‘es’ for Spanish. But we still don’t know what combinations such as ‘de-DE’ and ‘es-us’ or the weird number q=0.9 mean, or why the same language sometimes appears twice, apparently redundantly, once as ‘de’ and once as ‘de-DE’. And why are there commas and semicolons, and what do they mean?

I did a little research and, hidden in the jungle of RFCs, there is an answer.

As you might expect, it is in RFC 2616, titled “Hypertext Transfer Protocol -- HTTP/1.1”, section 14.4:

“The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request.” The same section gives the following definition:

       Accept-Language = "Accept-Language" ":"
                         1#( language-range [ ";" "q" "=" qvalue ] )
       language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
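The language-range rule reads: one to eight letters, optionally followed by more groups of one to eight letters joined by “-”, or a single “*”. It translates almost directly into a regular expression. A rough sketch; the helper name is_language_range is an assumption for illustration:

```php
<?php
// Hypothetical helper mirroring the language-range rule quoted above:
//   ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*"
function is_language_range(string $range): bool
{
    // The D modifier makes "$" match only at the very end of the string.
    return preg_match('/^(?:[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)$/D', $range) === 1;
}

var_dump(is_language_range('de'));    // bool(true)
var_dump(is_language_range('en-us')); // bool(true)
var_dump(is_language_range('*'));     // bool(true)
var_dump(is_language_range('de_DE')); // bool(false), underscore is not allowed
```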

The RFC gives this example:

da, en-gb;q=0.8, en;q=0.7

which would mean: “I prefer Danish, but will accept British English and other types of English.” So a language can appear once in a general form (‘en’) and once in a regional form (‘en-gb’). The comma “,” separates the individual language ranges.

So by using:

explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE'])

we get each language range and can then parse them one by one.
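Browsers may or may not put spaces after the commas, so it helps to trim each entry and drop empty ones. A small sketch; the function name split_accept_language is hypothetical:

```php
<?php
// Hypothetical helper: split a header value into its comma-separated entries.
function split_accept_language(string $header): array
{
    // Trim the whitespace that may follow each comma.
    $parts = array_map('trim', explode(',', $header));
    // Drop empty entries (for example from an empty header or a trailing comma).
    return array_values(array_filter($parts, 'strlen'));
}

print_r(split_accept_language('da, en-gb;q=0.8, en;q=0.7'));
// Entries: "da", "en-gb;q=0.8", "en;q=0.7"
```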

But what about the q? In the definition above, ‘q’ is followed by a ‘qvalue’ and is separated from the language-range by a ‘;’. The answer lies in the following:

“Each language-range MAY be given an associated quality value which represents an estimate of the user’s preference for the languages specified by that range. The quality value defaults to "q=1".”
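Putting the comma, the semicolon and the default together, a header can be turned into a list of language ranges ordered by preference. This is only a sketch under simple assumptions (a well-formed header, language ranges lowercased for easier comparison); parse_accept_language_q is a made-up name:

```php
<?php
// Hypothetical helper: map each language range to its q-value, highest first.
function parse_accept_language_q(string $header): array
{
    $result = [];
    foreach (explode(',', $header) as $entry) {
        $pieces = explode(';', $entry);
        $range = strtolower(trim($pieces[0]));
        if ($range === '') {
            continue; // skip empty entries
        }
        $q = 1.0; // no q given: the quality value defaults to q=1
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $result[$range] = $q;
    }
    arsort($result); // sort by q-value, descending, keeping the keys
    return $result;
}

print_r(array_keys(parse_accept_language_q('en;q=0.7, da, en-gb;q=0.8')));
// Keys in order of preference: da, en-gb, en
```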

In other words, the quality value (qvalue) q=1 is the strongest and most important one, and it is the default when no q is given. All values smaller than 1 are less preferred but still accepted by the browser. So we “could” say that in the RFC example above (da, en-gb;q=0.8, en;q=0.7), British English is considered 80% right for the user and general English 70%, although strictly speaking the values are relative weights for ranking rather than literal percentages.

Since the browser settings deliver these values, they are a good starting point for your application. But the user might never have seen or touched those settings. Moreover, a user who speaks fluent French and Spanish could have borrowed the computer from a Portuguese friend, or be sitting in an internet café in Turkey; then those values wouldn’t give us any valuable information at all. So better give the user the chance to select another language on your website.
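One way to combine both ideas is to treat the header as a hint and let an explicit choice by the user win. A rough sketch, not a complete content negotiation: it matches only the primary subtag (so ‘de-DE’ counts as ‘de’), and the function name, the parameters and the list of supported languages are all assumptions for illustration:

```php
<?php
// Hypothetical helper: pick a site language from (1) the user's explicit choice,
// (2) the Accept-Language header, (3) a default language.
function choose_site_language(array $supported, string $header, ?string $userChoice, string $default): string
{
    // An explicit choice (e.g. from a language switcher) beats browser settings.
    if ($userChoice !== null && in_array($userChoice, $supported, true)) {
        return $userChoice;
    }
    $weights = [];
    foreach (explode(',', $header) as $entry) {
        $pieces = explode(';', $entry);
        $range = strtolower(trim($pieces[0]));
        if ($range === '') {
            continue;
        }
        $q = 1.0; // default quality value
        if (isset($pieces[1]) && preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $pieces[1], $m)) {
            $q = (float) $m[1];
        }
        $weights[$range] = $q;
    }
    arsort($weights); // highest q-value first
    foreach ($weights as $range => $q) {
        if ($q <= 0.0) {
            continue; // q=0 marks a range as not acceptable
        }
        // Simplification: compare only the primary subtag ("de-de" becomes "de").
        $primary = explode('-', (string) $range)[0];
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;
}

$supported = ['en', 'de']; // sample list of languages the site offers
echo choose_site_language($supported, 'de-DE,de;q=0.9,en;q=0.8', null, 'en'), PHP_EOL; // de
echo choose_site_language($supported, 'es-us', null, 'en'), PHP_EOL;                   // en
echo choose_site_language($supported, 'de', 'en', 'en'), PHP_EOL;                      // en
```

The third call shows the override: the browser asks for German, but the user has picked English on the site, so English is used.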

I hope this is useful. I wrote it because I could not find a good explanation on the internet, specifically for PHP, of what the ‘q’ value means; people were mostly just guessing.
