"Plan" for unicode

Z-Man
God & Project Admin
Posts: 11255
Joined: Sun Jan 23, 2005 6:01 pm
Location: Cologne, Jabber: [email protected]

Post by Z-Man »

Using wchars is the wrong way :) One, wchars are system dependent and often not wide enough to hold all characters, and two, the apparent benefit that one wchar corresponds to one printed character does not exist: there are still composed characters (base letter plus accent). Wasting memory when storing ASCII strings is just a minor detail. Working with UTF-8 internally is the right way; it is also the least amount of work. In case the font library can't handle UTF-8, we should convert to wchars or whatever the library demands at the last possible moment.
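A minimal sketch of the "convert at the last possible moment" idea, assuming a font library that wants one value per code point. The function name `DecodeUtf8` is hypothetical and is not taken from the Armagetron source; malformed input is mapped to U+FFFD rather than rejected.

```cpp
#include <cstddef>
#include <string>

// Decode a UTF-8 byte string into Unicode code points.
// Malformed bytes, overlong forms, surrogates and out-of-range values
// are replaced by U+FFFD instead of aborting.
std::u32string DecodeUtf8(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const char32_t kMinForExtra[4] = {0x0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else
        {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        std::size_t k = 0;
        for (; k < extra; ++k)
        {
            if (i + k >= in.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += k; // consume only the continuation bytes that were valid

        if (!ok || cp < kMinForExtra[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
        {
            cp = 0xFFFD;
        }
        out.push_back(cp);
    }
    return out;
}
```

Note that `DecodeUtf8("e\xCC\x81")` (an "e" followed by a combining acute accent) yields two code points for a single printed character, which is the composed-character caveat mentioned above: even a full-width representation does not give one element per glyph.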

Yes, all the current string transfer quirks can be fixed in a backwards compatible way.

wrtlprnft
Reverse Outside Corner Grinder
Posts: 1679
Joined: Wed Jan 04, 2006 4:42 am
Location: 0x08048000

Post by wrtlprnft »

*bump*

The utf8 branch is in /armagetronad/armagetronad/branches/utf8/.
The font system now treats all incoming text as utf-8 and displays it correctly. The menu system converts input to utf-8 too (though SDL returns 0 as the codepoint for chars not in latin-1), and the language files have been converted to utf-8.

Still missing: the color codes and the network translation.
There's no place like ::1

Luke-Jr
Dr Z Level
Posts: 2246
Joined: Sun Mar 20, 2005 4:03 pm
Location: IM: [email protected]

Post by Luke-Jr »

Please don't introduce yet another dependency when it is totally unnecessary. The UNIX98 standard defines the iconv(3) API for translation between different character encodings.

wrtlprnft
Reverse Outside Corner Grinder
Posts: 1679
Joined: Wed Jan 04, 2006 4:42 am
Location: 0x08048000

Post by wrtlprnft »

It isn't a dependency. It's a 20K header file in src/thirdparty, that's it. It has some nice C++ goodness to make dealing with utf-8 easier; it's NOT just for conversion.
To use iconv to convert from utf-16 to utf-8, I'd basically need to allocate a string three times the length of the original string, which would probably be way too big in most cases... We can use iconv for translating between utf-8 and latin-1 though, I guess.
There's no place like ::1

Luke-Jr
Dr Z Level
Posts: 2246
Joined: Sun Mar 20, 2005 4:03 pm
Location: IM: [email protected]

Post by Luke-Jr »

wrtlprnft wrote:It isn't a dependency. It's a 20K header file in src/thirdparty, that's it.
Yes, but how much does the binary size grow? Our binaries are huge, and make certain desirable targets (OpenWrt, handheld gaming devices) impractical.
wrtlprnft wrote:It has some nice c++ goodness to make dealing with utf-8 easier, it's NOT just for conversion.
I have a bunch of really simple dealing-with-utf8 code anyway used for MOO. IIRC, some is already being used.
wrtlprnft wrote:To use iconv to convert from utf-16 to utf-8 I'd basically need to allocate a string three times the length of the original string which would probably be way too big in most cases...
Why? iconv converts one character at a time. For conversion to utf-8, you could use either a char[8] output buffer, or malloc a char buffer of the utf-16 size divided by two and realloc it if you run out of room.
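The grow-on-demand variant can be sketched as follows, using the POSIX `iconv_open`/`iconv`/`iconv_close` calls. The wrapper name `ConvertWithIconv` is hypothetical, a `std::string` stands in for the malloc/realloc pair, and the `char**` input parameter matches glibc; some platforms declare it as `const char**` instead.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert `in` from encoding `from` to encoding `to` with iconv(3).
// The output buffer starts at roughly half the input size and is doubled
// only when iconv reports E2BIG (output buffer full).
std::string ConvertWithIconv(const std::string& in, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string out(in.size() / 2 + 8, '\0');
    char* src = const_cast<char*>(in.data()); // iconv does not modify the input
    std::size_t srcLeft = in.size();
    std::size_t used = 0;

    for (;;)
    {
        char* dst = &out[used];
        std::size_t dstLeft = out.size() - used;
        const std::size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
        const int err = errno; // capture before anything else can change it
        used = out.size() - dstLeft;

        if (rc != (std::size_t)-1)
            break; // all input consumed
        if (err != E2BIG)
        {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
        out.resize(out.size() * 2); // out of room: grow and continue
    }

    // Stateful target encodings would need a final flush call with a null
    // input pointer; UTF-8 has no shift state, so it is omitted here.
    iconv_close(cd);
    out.resize(used);
    return out;
}
```

For example, `ConvertWithIconv(std::string("h\0\xE9\0", 4), "UTF-16LE", "UTF-8")` should produce the three bytes `h`, `0xC3`, `0xA9`, assuming the local iconv implementation knows the encoding name `UTF-16LE`.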

Tank Program
Forum & Project Admin, PhD
Posts: 6704
Joined: Thu Dec 18, 2003 7:03 pm

Post by Tank Program »

Luke-Jr wrote:handheld gaming devices
The lack of OpenGL on such devices is a bigger issue.

Luke-Jr
Dr Z Level
Posts: 2246
Joined: Sun Mar 20, 2005 4:03 pm
Location: IM: [email protected]

Post by Luke-Jr »

Tank Program wrote:
Luke-Jr wrote:handheld gaming devices
The lack of OpenGL on such devices is a bigger issue.
You assume they in fact do lack OpenGL. I know at least one of the modern portable gaming devices has a GL library... Also, I doubt our GL code makes up the majority of the binary; stripping it and replacing it with something else (2D or such) won't save much binary size, so the issue remains.

Z-Man
God & Project Admin
Posts: 11255
Joined: Sun Jan 23, 2005 6:01 pm
Location: Cologne, Jabber: [email protected]

Post by Z-Man »

Epic bump :)

I restarted the branch, merging it with trunk and committing the result into a new branch. It was hell. Anyway, now we can finish this. The restarted branch is a full one (with winlibs and stuff, it's zero cost in svn) so it can be mirrored to bzr and merged there.

I'm no longer convinced that the current approach, "convert strings on the network layer in the current network system", is the correct one. It's simply too fragile; one can't know for sure which format the sender uses for its strings. Instead, I'd say we just stick to sending strings in latin1 in the current network system all of the time. I'm actively thinking about the Google Protocol Buffers thing; that would send utf8 strings with new color codes. If we do things this way, all that's left to do before we can merge is to have the network code always convert to and from latin1 instead of doing it selectively.
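A minimal sketch of an "always convert to and from latin1" boundary, assuming strings are UTF-8 internally. The function names are hypothetical and not taken from the netcode, and replacing unrepresentable characters with '?' is an assumption; color codes are not handled here.

```cpp
#include <cstddef>
#include <string>

// Latin-1 -> UTF-8: every Latin-1 byte is the code point U+0000..U+00FF.
std::string Latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1)
    {
        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// UTF-8 -> Latin-1: code points above U+00FF and malformed bytes become '?'.
std::string Utf8ToLatin1(const std::string& utf8)
{
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        const unsigned char next =
            (i + 1 < utf8.size()) ? static_cast<unsigned char>(utf8[i + 1]) : 0;

        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));
            ++i;
        }
        else if ((c == 0xC2 || c == 0xC3) && (next & 0xC0) == 0x80)
        {
            // Two-byte sequences with lead 0xC2/0xC3 cover U+0080..U+00FF.
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (next & 0x3F)));
            i += 2;
        }
        else
        {
            // Not representable in Latin-1 (or malformed): emit one '?'
            // and skip the continuation bytes belonging to this sequence.
            out.push_back('?');
            ++i;
            while (i < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80)
            {
                ++i;
            }
        }
    }
    return out;
}
```

Converting unconditionally in both directions means the receiver never has to guess the sender's encoding: everything on the wire is Latin-1, and `Utf8ToLatin1(Latin1ToUtf8(s))` returns `s` for any byte string.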

Oh, and the font files have diverged and can't be merged. Luke added some characters on the Trunk. I don't know what wrtl did on the branch. What shall we do there?

Edit: because the branch version uses wchar strings to communicate with FTGL, it also isn't affected by the recent silent addition of utf8 "support" there (which apparently can't be disabled and crashes when fed with invalid strings).

wrtlprnft
Reverse Outside Corner Grinder
Posts: 1679
Joined: Wed Jan 04, 2006 4:42 am
Location: 0x08048000

Post by wrtlprnft »

Z-Man wrote:Oh, and the font files have diverged and can't be merged. Luke added some characters on the Trunk. I don't know what wrtl did on the branch. What shall we do there?
Heh, I also added a couple of new characters there… I'll try to have a look at it.
Z-Man wrote:Edit: because the branch version uses wchar strings to communicate with FTGL, it also isn't affected by the recent silent addition of utf8 "support" there (which apparently can't be disabled and crashes when fed with invalid strings).
That's quite inefficient though, because it has to convert all characters every time a string is rendered… I'd prefer depending on (or possibly including) the newer version and always using utf-8 internally.
There's no place like ::1

Z-Man
God & Project Admin
Posts: 11255
Joined: Sun Jan 23, 2005 6:01 pm
Location: Cologne, Jabber: [email protected]

Post by Z-Man »

The branch arrived on launchpad: https://code.launchpad.net/~armagetrona ... ronad-work
wrtlprnft wrote:I'd prefer depending on (or possibly including) the newer version and always using utf-8 internally.
I'm cool with that. FTGL is statically linked anyway in our autopackages, so there's no backward compatibility headache there. And apparently, I already changed the include files on the trunk so they require the new version :)

So here are our options for dealing with this branch, 0.3.1 and the FTGL crash:
a) stay calm, make it so that 0.3.1 can't link with the broken new FTGL, distribute versions of 0.3.1 statically linked with a good FTGL.
b) on the utf8 branch, make it so that the new FTGL is required, get rid of the conversion to wstrings, always convert to/from latin1 in the netcode, merge it to 0.3.1 (could be a tad difficult because there are changes in utf8 now that we don't want in 0.3.1) and release that.
c) same as b), but leave the conversion to wstrings on rendering intact, thus not requiring bleeding edge FTGL.

I'm leaning towards a) plus a new option d), which is the same as b) except that it merges into the trunk instead of 0.3.1.

Lucifer
Project Developer & Local Moonshiner
Posts: 8610
Joined: Sun Aug 15, 2004 3:32 pm
Location: Republic of Texas

Post by Lucifer »

As far as 0.3.1 is concerned, I say don't merge. If we need to, we can always pinch off 0.3.2 a few weeks after to release utf8 support.

Be the devil's own, Lucifer's my name.
- Iron Maiden

wrtlprnft
Reverse Outside Corner Grinder
Posts: 1679
Joined: Wed Jan 04, 2006 4:42 am
Location: 0x08048000

Post by wrtlprnft »

The font files are merged now… Please consider the trunk one outdated and don't change it anymore. Adding new characters is pointless, anyways, as the trunk can't display them unless ftgl is messing up.
There's no place like ::1

Z-Man
God & Project Admin
Posts: 11255
Joined: Sun Jan 23, 2005 6:01 pm
Location: Cologne, Jabber: [email protected]

Post by Z-Man »

Think we should handle FTGL flexibly? I smell people complaining if 0.3.1 doesn't build/work with FTGL >= 2.1.3 and the trunk doesn't build with FTGL < 2.1.3. Shall I check whether one can avoid the utf8->wstring conversion selectively, depending on the version of FTGL compiled against, without too much chaos?

wrtlprnft
Reverse Outside Corner Grinder
Posts: 1679
Joined: Wed Jan 04, 2006 4:42 am
Location: 0x08048000

Post by wrtlprnft »

if you think it's not a huge hack, knock yourself out :-)
There's no place like ::1

Z-Man
God & Project Admin
Posts: 11255
Joined: Sun Jan 23, 2005 6:01 pm
Location: Cologne, Jabber: [email protected]

Post by Z-Man »

Oh yeah, another thing: I think we should leave the trunk language files that are also on 0.2.8 in latin1, the reason being merge hell, of course. I'll make the loader flexible.
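One way a flexible loader could work, sketched under the assumption that it guesses the encoding per line: text that is structurally valid UTF-8 is kept as is, and anything else is treated as latin1 and converted. Both function names are hypothetical, and this is a heuristic rather than the loader's actual logic.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: valid lead bytes followed by the right number of
// continuation bytes. Simplified: it does not reject every overlong
// 3- or 4-byte form, nor encoded surrogates.
bool LooksLikeUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80)                    extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false; // continuation byte or invalid lead byte

        if (extra > s.size() - i - 1)
            return false; // sequence is cut off at the end of the string
        for (std::size_t k = 1; k <= extra; ++k)
        {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}

// Return the line as UTF-8, converting from latin1 when it is not
// already valid UTF-8. Pure ASCII passes through unchanged either way.
std::string LanguageLineToUtf8(const std::string& raw)
{
    if (LooksLikeUtf8(raw))
        return raw;

    std::string out;
    out.reserve(raw.size() * 2);
    for (unsigned char c : raw)
    {
        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The guess can be wrong in rare cases: a latin1 line whose accented bytes happen to form a valid UTF-8 sequence (for example "Ã" followed by "©") would be left unconverted, so an explicit per-file encoding marker would be more robust if one is available.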
