
RE: [cobalt-users] last msg on .med tld



DK> Date: Thu, 27 Feb 2003 13:52:27 -0500
DK> From: Dan Kriwitsky


DK> I thought those non-ASCII domains translated to ASCII? IIRC,
DK> someone figured out how it would work before they were
DK> released and registered some nonsensical ASCII domain name
DK> that translated into some popular Chinese word.

My recollection (might be good to read up on IDNA again) is that
NetSol captured "invalid" .com and .net queries at the gTLD
servers, then returned a special A RR with a plugin for download.
IDNA was debated before release...

IMHO, the biggest problem in encoding is case insensitivity; DNS
itself can carry eight-bit payloads.  Remember, a DNS label is:

* A length between 0 and 63 (0x00 and 0x3f), followed by that
  number of characters.  This is why each label (the part of a
  domain name between dots) has a 63-char limit.

* A special code between 64 and 255 (0x40 and 0xff) indicating
  other action.  For instance, DNS's primitive compression uses
  0xMM 0xNN, where MM is 0xc0-0xff and NN is any byte; the low
  14 bits of the pair give an offset to jump to in the message.
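
To make the two cases above concrete, here is a minimal sketch of a
wire-format name reader.  The function name read_name and the sample
message are made up for illustration; this is not taken from any
particular resolver, and it skips most error handling.

```python
def read_name(msg, offset):
    """Read a wire-format name from msg starting at offset.

    Returns (labels, next_offset), where next_offset is the position
    just past the name as it appears at the original offset.
    """
    labels = []
    resume = None   # where the caller continues after the first pointer
    hops = 0        # guard against pointer loops in hostile input
    while True:
        length = msg[offset]
        if length == 0:
            # Zero-length (root) label terminates the name.
            offset += 1
            break
        if length & 0xC0 == 0xC0:
            # Top two bits set: compression pointer, 14-bit offset.
            if resume is None:
                resume = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 127:
                raise ValueError("too many compression pointers")
            continue
        if length > 63:
            # 0x40-0xbf: not a plain label and not a pointer.
            raise ValueError("unsupported label type 0x%02x" % length)
        # Plain label: the next `length` bytes, taken as-is (any value).
        labels.append(msg[offset + 1:offset + 1 + length])
        offset += 1 + length
    return labels, (resume if resume is not None else offset)


# Hypothetical message fragment: "example.com" followed by "www" plus
# a pointer back to offset 0.
sample = b"\x07example\x03com\x00" + b"\x03www\xc0\x00"
print(read_name(sample, 0))
print(read_name(sample, 13))
```

The second call follows the 0xc0 0x00 pointer back to the start of
the fragment, so its labels are www, example, com.  Nothing in the
reader cares what bytes a label holds, which is the point: the wire
format itself is eight-bit clean.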

The actual contents of the label can be any valid character --
even a null byte.  UTF-8, Unicode, UCS-4, or any other encoding
_could have been_ used at the protocol level.  But how do we know
what chars are UC/lc variants of each other?  Remember, libc's
case routines assume ASCII-encoded text in the default locale.
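
A small sketch of the case problem, assuming a hypothetical server
that carries raw UTF-8 in labels and compares them with an ASCII-only
case fold (the helper name ascii_fold is invented here):

```python
def ascii_fold(label):
    # bytes.lower() only maps the ASCII letters A-Z to a-z; every
    # byte >= 0x80 passes through untouched.
    return label.lower()


upper = "B\u00dcCHER".encode("utf-8")   # U+00DC is capital U-umlaut
lower = "b\u00fccher".encode("utf-8")   # U+00FC is small u-umlaut

# ASCII-only folding does not see these as the same name...
print(ascii_fold(upper) == ascii_fold(lower))   # False

# ...while Unicode-aware lowercasing does.
print("B\u00dcCHER".lower() == "b\u00fccher")   # True
```

So a comparison routine that only knows ASCII would treat the
upper- and lower-case spellings as two different names.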

I think you're right that IDNA specifies some special escape
sequence and encoded characters, all in ASCII, simplifying case
comparison.  When NetSol (.com/.net gTLD operator) receives
non-ASCII characters, it then sends the client to a special
"download a patch for your browser" kludge page.
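
For illustration only: Python's built-in "idna" codec implements the
ASCII-compatible encoding as standardized in RFC 3490, where the
escape sequence is an "xn--" prefix on each encoded label.  The
helper name to_ace and the example domain are assumptions of this
sketch, not anything NetSol used.

```python
def to_ace(name):
    # Encode each label of a Unicode domain name into its ASCII form
    # using the standard library's RFC 3490 codec.
    return name.encode("idna")


ace = to_ace("b\u00fccher.example")      # U+00FC is small u-umlaut
print(ace)                               # b'xn--bcher-kva.example'

# The codec case-folds non-ASCII labels before encoding, so the
# upper-case spelling ends up as the same ASCII string.
print(to_ace("B\u00dcCHER.example") == ace)          # True

# Decoding restores the Unicode form.
print(ace.decode("idna") == "b\u00fccher.example")   # True
```

Once the name is plain ASCII on the wire, the existing
case-insensitive comparison in DNS software keeps working unchanged.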

This does nothing for other protocols, other gTLD operators, et
cetera, though.

Like I said, maybe I should review the details of the encoding.
That would make everything perfectly clear.  Even without reading
up, it's obvious that different protocols are now using different
encodings.  Ughh.

I contend the _right way_ would have been to adopt a standard
encoding across all protocols.  If daemons need libc replacement
code, so be it.  IDNA wasn't deployed overnight; software authors
would have had time to implement a standard encoding, and patches
could have been released for older OSes.

Compatibility is nice, but I view IDNA as added complexity just
to ensure the most backwards and broken of software will
interoperate.

End rant.


Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 21 May 2001 11:23:58 +0000 (GMT)
From: A Trap <blacklist@xxxxxxxxx>
To: blacklist@xxxxxxxxx
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots.
Do NOT send mail to <blacklist@xxxxxxxxx>, or you are likely to
be blocked.