krb5 commit: Modernize UTF-8/UCS-2 conversion code

Discussion:

Stefan Metzmacher

2017-04-17 19:24:40 UTC

Hi Robbie,

is there a reson why you still limit this to USC-2 and not use full UTF16
support?

We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

For now we worked arround it in Samba by limiting random machine passwords
to unicode codepoints up to 0xffff.

metze

https://github.com/krb5/krb5/commit/c4e8d444632140ecb47f31df133c0657f07f9be0
commit c4e8d444632140ecb47f31df133c0657f07f9be0
Date: Thu Apr 6 12:15:39 2017 -0400
Modernize UTF-8/UCS-2 conversion code
Remove unused entry points as we only need to convert between
little-endian UCS-2 byte buffers and UTF-8. Rename and simplify the
remaining two function contracts. Avoid pointer alignment and
endianness issues by operating on byte buffers and using store_16_le()
and load_16_le(). Avoid two-pass operation using k5buf.
names and contracts; rewrote commit message]

Robbie Harwood

2017-04-17 19:37:16 UTC

Permalink

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

I'm not sure what you're asking. The commit in question fixes several
code hygene problems; it does not add any new interfaces, and the only
fixes are related to memory safety.

Post by Stefan Metzmacher
We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262
For now we worked arround it in Samba by limiting random machine passwords
to unicode codepoints up to 0xffff.

I guess the answer is that I didn't know there was a bug? Have you
filed something in the krb5 bugtracker? I don't see any linked from
your bugzilla, just the workaround.

Greg Hudson

2017-04-17 19:53:52 UTC

Permalink

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

This commit was just trying to eliminate warnings, not change the
functionality.

This code originally came from OpenLDAP's utf-8-conv.c, and was
incorporated with the goal of Windows compatibility (for PACs and RC4
string-to-key). If Windows now supports UTF-16 then we could consider
extending it, but I wouldn't want to do so without confirmation of that.

Post by Stefan Metzmacher
We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

Thanks for the pointer.
_______________________________________________
krbdev mailing list ***@mit.edu
https://mailman.mit.edu/mailman/listinfo/krbdev

Andrew Bartlett

2017-04-17 21:25:16 UTC

Permalink

Post by Greg Hudson

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

This commit was just trying to eliminate warnings, not change the
functionality.
This code originally came from OpenLDAP's utf-8-conv.c, and was
incorporated with the goal of Windows compatibility (for PACs and RC4
string-to-key). If Windows now supports UTF-16 then we could
consider
extending it, but I wouldn't want to do so without confirmation of that.

Windows has been UTF-16 for quite a long time now.

Andrew Bartlett

--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
Samba Developer, Catalyst IT http://catalyst.net.nz/services/samba

_______________________________________________
krbdev mailing list ***@mit.edu
https://mailman.mit.

Stefan Metzmacher

2017-04-18 08:32:06 UTC

Permalink

Post by Greg Hudson

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

This commit was just trying to eliminate warnings, not change the
functionality.

Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
rewrite of the code to me. As it's not you may just ignore this topic
for now.

Post by Greg Hudson
This code originally came from OpenLDAP's utf-8-conv.c, and was
incorporated with the goal of Windows compatibility (for PACs and RC4
string-to-key). If Windows now supports UTF-16 then we could consider
extending it, but I wouldn't want to do so without confirmation of that.

I think Windows always supported UTF-16 instead of USC-2, at least it
did for the last 10 years (at least for passwords).

Post by Greg Hudson

Post by Stefan Metzmacher
We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

The core of the problem is that Windows uses an random buffer as machine
password,
and interprets it as UTF16 (which is not always valid UTF-16).
The MD4 of the raw buffer is the NTHASH/RC4-key.
In order to calculate the AES-keys the buffer is converted to (valid)
UTF-8 first.
The following rules are applied to the raw buffer first:
1) any 0x0000 characters are mapped to 0x0001
2) convert any instance of 0xD800 - 0xDBFF (high surrogate)
without an immediately following 0xDC00 - 0x0xDFFF (low surrogate) to
U+FFFD (OBJECT REPLACEMENT CHARACTER).
3) the same for any low surrogate that was not preceded by a high surrogate.

Windows stores the raw buffer as master copy of the password,
while Samba currently stores the UTF-8 password.

If we pass the UTF-8 to the krb5 libraries we may calculate a different
NTHASH,
because the transition from the random (UTF-16-like) buffer to valid UTF-8
and back to valid UTF-16 looses information.

So our current solution is limiting the machine password to valid utf-8,
with codepoints up to 0xffff.

In future we may try to do pass a keytab instead of password to the krb5
libraries
and calculate the keys ourself in order to avoid these problems.

So there's currently no immediate action required.

metze

Greg Hudson

2017-04-18 15:49:43 UTC

Permalink

Post by Stefan Metzmacher
Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
rewrite of the code to me. As it's not you may just ignore this topic
for now.

Yeah, in this case "Modernize" referred to coding standards, not
functionality. We're not always consistent about that (e.g. a recent
commit "Modernize UTF-8 conversions" was about the maximum Unicode code
point), but we can't always be perfectly clear in the summary line as we
need to keep it short.

I have written up UTF-16 support for these conversion functions, but
we'll need test cases in order to be confident that they work. Do you
or anyone else know of a body of test cases that I might crib from?

Post by Stefan Metzmacher
I think Windows always supported UTF-16 instead of USC-2, at least it
did for the last 10 years (at least for passwords).

Wikipedia thinks it added UTF-16 support in Windows 2000, so yeah, quite
some time.

Post by Stefan Metzmacher
The core of the problem [...]

Thanks for that explanation.
_______________________________________________
krbdev mailing list ***@mit.edu
https://mailman.mit.edu/mailman/listinfo/krbdev

Simo Sorce

2017-04-25 17:31:25 UTC

Permalink

Post by Stefan Metzmacher

Post by Greg Hudson

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

This commit was just trying to eliminate warnings, not change the
functionality.

Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
rewrite of the code to me. As it's not you may just ignore this topic
for now.

I think Windows always supported UTF-16 instead of USC-2, at least it
did for the last 10 years (at least for passwords).

Post by Greg Hudson

Post by Stefan Metzmacher
We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

The core of the problem is that Windows uses an random buffer as machine
password,
and interprets it as UTF16 (which is not always valid UTF-16).
The MD4 of the raw buffer is the NTHASH/RC4-key.

Would it make sense to provide an alternative interface to calculate the
RC4-key that takes a random buffer instead of utf-8, so we do not have
to deal with awkward conversions or restricting how we generate this
buffer ?

Simo.

Post by Stefan Metzmacher
In order to calculate the AES-keys the buffer is converted to (valid)
UTF-8 first.
1) any 0x0000 characters are mapped to 0x0001
2) convert any instance of 0xD800 - 0xDBFF (high surrogate)
without an immediately following 0xDC00 - 0x0xDFFF (low surrogate) to
U+FFFD (OBJECT REPLACEMENT CHARACTER).
3) the same for any low surrogate that was not preceded by a high surrogate.
Windows stores the raw buffer as master copy of the password,
while Samba currently stores the UTF-8 password.
If we pass the UTF-8 to the krb5 libraries we may calculate a different
NTHASH,
because the transition from the random (UTF-16-like) buffer to valid UTF-8
and back to valid UTF-16 looses information.
So our current solution is limiting the machine password to valid utf-8,
with codepoints up to 0xffff.
In future we may try to do pass a keytab instead of password to the krb5
libraries
and calculate the keys ourself in order to avoid these problems.
So there's currently no immediate action required.
metze
_______________________________________________
https://mailman.mit.edu/mailman/listinfo/krbdev

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

_______________________________________________
krbdev mailing list ***@mit.edu
https://mailman.mit.edu/mailman/listinfo/krbdev

Stefan Metzmacher

2017-04-26 06:27:02 UTC

Permalink

Post by Simo Sorce

Post by Stefan Metzmacher

Post by Greg Hudson

Post by Stefan Metzmacher
is there a reson why you still limit this to USC-2 and not use full UTF16
support?

This commit was just trying to eliminate warnings, not change the
functionality.

Ok, "Modernize UTF-8/UCS-2 conversion code" indicated a more or less
rewrite of the code to me. As it's not you may just ignore this topic
for now.

I think Windows always supported UTF-16 instead of USC-2, at least it
did for the last 10 years (at least for passwords).

Post by Greg Hudson

Post by Stefan Metzmacher
We had a lot of trouble with the limited unicode support of krb5 libraries
in Samba recently, see https://bugzilla.samba.org/show_bug.cgi?id=12262

The core of the problem is that Windows uses an random buffer as machine
password,
and interprets it as UTF16 (which is not always valid UTF-16).
The MD4 of the raw buffer is the NTHASH/RC4-key.

I think using a client side keytab is the way to avoid the conversation.

metze