C++: Swedish characters don't compare correctly
For some reason, if/else statements aren't working correctly for me in C++.
The problem is that when the variable equals right (höger), the program won't take the if branch; it goes on to the else branch instead. If I replace the letter 'ö' with 'o', so the word becomes 'hoger', the if statement works. Whenever I type the word 'höger', it skips the if branch and goes to the else branch. If I make the variable equal 'hoger' and then type 'hoger', it works. How can I make it possible to type 'höger' and have the if statement recognize it? It's as if Swedish letters don't work.
My code is this:
#include <clocale>
#include <iostream>
#include <string>
using namespace std;

int main() {
    setlocale(LC_ALL, "");
    string test; // define the variable
    cout << "höger eller vänster" << endl; // right or left
    cin >> test;
    if (test == "höger") { // if right, output this
        cout << "du valde höger" << endl; // "you chose right"
    } else if (test == "vänster") { // if left, output this
        cout << "du valde vänster" << endl; // "you chose left"
    } else {
        // anything else
    }
}
The problem is encodings.
The C/C++ language specs do not automatically handle anything other than 7-bit ASCII. The o-umlaut character is outside that range, so the exact behaviour depends on the encoding of the source code file.
The likely possibilities are ISO 8859-1, Windows ANSI-1252, UTF-8 or Windows OEM 850. The first two encode this character the same way; each of the others encodes it differently.
With a bit more information about the encoding and the tool set you are using, it may be possible to provide a more specific diagnosis and advice.
[And by the way, if/else statements in C/C++ work just fine, thank you.]
Let's assume for the moment that it's Windows and Visual C++ you're dealing with.
- Source code written inside Visual Studio: code page 1252. The code point for the o-umlaut character is 0xF6.
- Keyboard input read from the console: code page 850. The code point for the o-umlaut character is 0x94.
These obviously do not match. However, Visual Studio can quite happily edit source code files in many encodings, including UTF-8 (with byte order mark), UTF-16 (wide characters) and code page 850. So:
- Source code written inside Visual Studio in code page 850: the code point for the o-umlaut character is 0x94. It works.
You can also change the code page of the console using the chcp command.
- Change the console with chcp 1252 and it works.
The standard obliges the compiler's behaviour when reading source code to be consistent with the execution character set. See N3797 §2.2, translation phase 5:
Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set.
§2.3/3:
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
N3797 §2.14.3/1:
A character literal that does not begin with u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
N3797 §2.14.5/6:
A string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
The execution character set is implementation-defined. Microsoft's statement regarding implementation-defined behaviour in the C compiler is here: http://msdn.microsoft.com/en-us/library/hx3yt8af.aspx. [I can't find a separate one for C++, so I assume it applies to both.]
The source character set is the set of legal characters that can appear in source files. For Microsoft C, the source character set is the standard ASCII character set.
Sorry for the language-lawyer stuff, but it says the MSVC compiler is independent of locale/encoding and implements 8-bit ASCII, with the code page unspecified. Standard library functions may need to know the encoding for various purposes, but that is a whole other story.
As a final point, the Microsoft C compiler dates back around 30 years, to before Windows. It has always been possible to write source code in code page 850 and have it run correctly on the console, subject to careful handling of extended (8-bit) characters, and many people still do. The problem here is that the source code is written in Windows ANSI or Unicode, while the keyboard input from the console is OEM (CP850). Change either one and it will work correctly.