Negative caching of unknown principals

Discussion:

David Woodhouse

2014-07-31 20:52:50 UTC

There are certain sites on our Intranet where unless you run 'kdestroy'
before pointing firefox at them, you end up waiting literally *minutes*
for pages to load.

Because it uses multiple connections (and because there are scripts
doing lots of little fetches), it's going back to the KDC over and over
and over again to attempt to obtain a ticket for the *same* server, and
quite predictably getting KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN every time.

This horrid proof-of-concept hack makes Firefox load it in about the
same time whether we have a Kerberos TGT or not.

I know this has been mentioned in the past, and kind of went nowhere
after a discussion about how we could maintain a negative cache in
persistent storage. But it's not entirely clear that's necessary; an
in-memory cache like this one is perfectly sufficient. A single failure
from any individual client is fine; it's the dozens of consecutive
attempts which are a real problem.

diff --git a/src/lib/gssapi/krb5/init_sec_context.c b/src/lib/gssapi/krb5/init_sec_context.c
index dc47053..d463314 100644
--- a/src/lib/gssapi/krb5/init_sec_context.c
+++ b/src/lib/gssapi/krb5/init_sec_context.c
@@ -113,6 +113,14 @@
    at some point */
 int krb5_gss_dbg_client_expcreds = 0;

+static struct bad_server {
+    krb5_gss_name_t server;
+    krb5_error_code code;
+    /* Expire? */
+    struct bad_server *next;
+} *bad_servers = NULL;
+static k5_mutex_t bad_servers_lock = K5_MUTEX_PARTIAL_INITIALIZER;
+
 /*
  * Common code which fetches the correct krb5 credentials from the
  * ccache.
@@ -126,9 +134,10 @@ static krb5_error_code get_credentials(context, cred, server, now,
     krb5_timestamp endtime;
     krb5_creds **out_creds;
 {
-    krb5_error_code code;
+    krb5_error_code code = 0;
     krb5_creds in_creds, evidence_creds, *result_creds = NULL;
     krb5_flags flags = 0;
+    struct bad_server *bad;

     *out_creds = NULL;

@@ -139,6 +148,17 @@ static krb5_error_code get_credentials(context, cred, server, now,

     assert(cred->name != NULL);

+    k5_mutex_lock(&bad_servers_lock);
+    for (bad = bad_servers; bad; bad = bad->next) {
+        if (kg_compare_name(context, server, bad->server)) {
+            printf("Already bad\n");
+            code = bad->code;
+            break;
+        }
+    }
+    k5_mutex_unlock(&bad_servers_lock);
+    if (code)
+        goto cleanup;
     /*
      * Do constrained delegation if we have proxy credentials and
      * we're not trying to get a ticket to ourselves (in which case
@@ -194,6 +214,21 @@ static krb5_error_code get_credentials(context, cred, server, now,

     code = krb5_get_credentials(context, flags, cred->ccache,
                                 &in_creds, &result_creds);
+    if (code == KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN) {
+        bad = malloc(sizeof(*bad));
+        if (!bad)
+            goto cleanup;
+        if (kg_duplicate_name(context, server, &bad->server)) {
+            free(bad);
+            goto cleanup;
+        }
+        bad->code = code;
+        k5_mutex_lock(&bad_servers_lock);
+        bad->next = bad_servers;
+        bad_servers = bad;
+        k5_mutex_unlock(&bad_servers_lock);
+        goto cleanup;
+    }
     if (code)
         goto cleanup;

--
dwmw2

Greg Hudson

2014-08-01 21:25:19 UTC

Post by David Woodhouse
There are certain sites on our Intranet where unless you run 'kdestroy'
before pointing firefox at them, you end up waiting literally *minutes*
for pages to load.

I agree that we ought to do something about this. (We also ought to do
something about the related HTTP case where krb5 authentication is
successful, but credential delegation is enabled and we fetch a separate
TGT for every connection.) I also agree that an in-memory negative
cache is probably the right answer. A few notes:

OS X through 10.6 had negative cache entries represented in the ccache.
This could be disruptive at times; you could fail to authenticate to a
principal after it was created, and not be able to do much about it
besides re-running kinit to throw out your ccache.

OS X 10.7+ has an in-memory cache implemented in the krb5
gss_init_sec_context. I don't see any evidence that the cache entries
expire, but it does use an OS X specific (or maybe Mach-specific)
notification mechanism to invalidate the negative cache whenever any
ccache changes within the kcm daemon or when the system clock changes.
That isn't portable, but expiring negative cache entries after five
seconds or so might be sufficient.
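
A minimal sketch of what time-based expiry could look like, assuming a plain string key and a fixed five-second lifetime. The names neg_lookup, neg_insert and NEG_TTL_SECONDS are made up for illustration and are not krb5 API; the caller passes in the current time so the behaviour is easy to exercise. Locking is omitted, so a real version would still need something like the mutex in the patch above.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NEG_TTL_SECONDS 5   /* assumed lifetime ("five seconds or so") */

/* One remembered failure. Illustrative only, not a krb5 structure. */
struct neg_entry {
    char *server;            /* service principal name as a string */
    int code;                /* error code the KDC returned */
    time_t expires;          /* entry is dead once now >= expires */
    struct neg_entry *next;
};

static struct neg_entry *neg_head = NULL;

/*
 * Return the cached error code for server, or 0 if there is no live
 * entry. Expired entries are unlinked and freed as they are met.
 */
int neg_lookup(const char *server, time_t now)
{
    struct neg_entry **pp = &neg_head;

    while (*pp != NULL) {
        struct neg_entry *e = *pp;

        if (now >= e->expires) {
            *pp = e->next;
            free(e->server);
            free(e);
            continue;
        }
        if (strcmp(e->server, server) == 0)
            return e->code;
        pp = &e->next;
    }
    return 0;
}

/*
 * Remember that server failed with code. Returns 0 on success, or -1
 * on allocation failure (in which case nothing is cached).
 */
int neg_insert(const char *server, int code, time_t now)
{
    struct neg_entry *e = malloc(sizeof(*e));
    size_t len = strlen(server) + 1;

    if (e == NULL)
        return -1;
    e->server = malloc(len);
    if (e->server == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->server, server, len);
    e->code = code;
    e->expires = now + NEG_TTL_SECONDS;
    e->next = neg_head;
    neg_head = e;
    return 0;
}
```

With this shape, a burst of dozens of requests inside the TTL costs one KDC round trip, and a newly created principal becomes reachable again at most NEG_TTL_SECONDS after the last failure.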

A negative cache could interfere with an administrative operation which
checks for the existence of a principal, creates it if it doesn't exist,
and then uses it. An in-memory negative cache wouldn't affect such a
procedure if it is implemented by script, but would if the existence
test and the use of the principal are all within the same process. I
think the real-life HTTP cases are more important than this
hypothetical, but we would want some programmatic way of disabling the
negative cache.

If we implement a negative cache in libkrb5, I would prefer to hang it
on the krb5_context than to use mutex-locked global state. But that
wouldn't help the HTTP use case unless we also implement per-thread krb5
contexts for GSSAPI--something I've wanted to do for a while.

If we implement a negative cache in the krb5 GSS mech, then it pretty
much needs to live in global state; there isn't anything else to hang it
off of.
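
To make the trade-off concrete, here is a rough sketch of a cache hung off a context object with a programmatic off switch. struct demo_context is a stand-in invented for this example, not the real krb5_context, and all of the demo_* functions are hypothetical. The point it illustrates is that two contexts do not share entries, which is why a per-context cache only helps when the same context is reused across requests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NEG_SLOTS 8
#define NEG_NAME_MAX 128

/* Small fixed-size negative cache with an enable flag. */
struct neg_cache {
    bool enabled;                        /* programmatic off switch */
    size_t used;                         /* number of filled slots */
    char names[NEG_SLOTS][NEG_NAME_MAX];
    int codes[NEG_SLOTS];
};

/* Stand-in for a library context; NOT the real krb5_context. */
struct demo_context {
    struct neg_cache neg;
};

/* Start with an empty cache that is switched on. */
void demo_init(struct demo_context *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->neg.enabled = true;
}

/* Turn the cache on or off; turning it off also forgets all entries. */
void demo_set_neg_cache(struct demo_context *ctx, bool on)
{
    ctx->neg.enabled = on;
    if (!on)
        ctx->neg.used = 0;
}

/* Return the cached error for name, or 0 if none (or cache disabled). */
int demo_neg_lookup(const struct demo_context *ctx, const char *name)
{
    size_t i;

    if (!ctx->neg.enabled)
        return 0;
    for (i = 0; i < ctx->neg.used; i++) {
        if (strcmp(ctx->neg.names[i], name) == 0)
            return ctx->neg.codes[i];
    }
    return 0;
}

/* Record a failure; silently does nothing if disabled, full, or too long. */
void demo_neg_record(struct demo_context *ctx, const char *name, int code)
{
    size_t len = strlen(name) + 1;

    if (!ctx->neg.enabled || ctx->neg.used == NEG_SLOTS || len > NEG_NAME_MAX)
        return;
    memcpy(ctx->neg.names[ctx->neg.used], name, len);
    ctx->neg.codes[ctx->neg.used] = code;
    ctx->neg.used++;
}
```

An administrative tool that creates a principal and then uses it within one process would call the (hypothetical) demo_set_neg_cache(ctx, false) before the existence check.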

Nico Williams

2014-08-01 21:46:27 UTC

IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

Simo Sorce

2014-08-02 12:01:26 UTC

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

I agree you want to avoid all involved processes in a script to see
negative caches.
And perhaps add a kdestroy switch that just removes negative entries?
This would make it possible for admins to deal with bad negative entries
during administrative tasks without having to throw away the ccache
entirely.

Simo.

--
Simo Sorce * Red Hat, Inc * New York

Benjamin Kaduk

2014-08-02 21:03:17 UTC

Post by Simo Sorce

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

I agree you want to avoid all involved processes in a script to see
negative caches.

I'm failing to parse this sentence.

Post by Simo Sorce
And perhaps add a kdestroy switch that just removes negative entries?
This would make it possible for admins to deal with bad negative entries
during administrative tasks without having to throw away the ccache
entirely.

This makes it sound like if I stopped after "I agree" in the above
sentence, I would be on the right track.

-Ben

Simo Sorce

2014-08-03 17:04:59 UTC

Post by Benjamin Kaduk

Post by Simo Sorce

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

I agree you want to avoid all involved processes in a script to see
negative caches.

I'm failing to parse this sentence.

Uhmm, I think there is a "to avoid" that doesn't belong there.
What I meant is that if you have a shell script, you want all processes
that may be invoked by it to see the same negative cache entries, and
the only way to do that is by storing them in a file, the ccache.

Post by Benjamin Kaduk

This makes it sound like if I stopped after "I agree" in the above
sentence, I would be on the right track.

Probably.

Simo.

--
Simo Sorce * Red Hat, Inc * New York

David Woodhouse

2015-10-24 11:52:00 UTC

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

I agree you want [...] all involved processes in a script to see
negative caches.
And perhaps add a kdestroy switch that just removes negative entries?
This would make it possible for admins to deal with bad negative entries
during administrative tasks without having to throw away the ccache
entirely.

Hm, I thought we had consensus that doing the negative caching in krb5
(either in memory or in the ccache, probably the latter) was the best
approach.

In https://bugzilla.redhat.com/show_bug.cgi?id=981477#c15 you now seem
to be suggesting a wildly different approach, where each application
doing SPNEGO must keep track of which underlying mechanism actually
*worked* for a given service, and then restrict future SPNEGO attempts
to use only that mechanism.

So, for example, in the example which led to that bug being filed,
Firefox would *notice* that SPNEGO ended up falling back to GSS-NTLMSSP
and would thus restrict the mechanisms used by SPNEGO for future
authentication (for how long?) to the same host.

Please correct me if I'm misunderstanding.

Can we continue that discussion here? I'm not sure I like this new
approach, but if there's a clear agreement from the krb5 side that this
is how it should be done by all applications, and a comprehensive
description of *how* applications should behave, we can potentially set
about fixing them all to do it as you envisage...

TBH I much prefer having a negative cache on the krb5 side, as
demonstrated by my hackish proof of concept. But I'll defer to the
collective expertise of this list...

--
David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation

Simo Sorce

2015-10-24 23:12:59 UTC

Post by David Woodhouse

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

I am not proposing a general approach in that bug, but one specific to
HTTP/SPNEGO in Firefox. The reason there is simple: NTLMSSP has no
ccache, hence no place to store negative cache entries. I think it is
appropriate in that case that Firefox keep the caching, mostly because
that allows yet another optimization that the krb5 mechanism cannot
provide: sending the first leg of a SPNEGO token right away, without
the initial roundtrip that returns the 401-Negotiate error.

Post by David Woodhouse
Can we continue that discussion here? I'm not sure I like this new
approach, but if there's a clear agreement from the krb5 side that this
is how it should be done by all applications, and a comprehensive
description of *how* applications should behave, we can potentially set
about fixing them all to do it as you envisage...
TBH I much prefer having a negative cache on the krb5 side, as
demonstrated by my hackish proof of concept. But I'll defer to the
collective expertise of this list...

I am fine with a negative cache for krb5 ccaches, but it is a separate
concern from Browser specific information caching (which is partly
positive and partly negative as explained above) IMO.

Simo.

--
Simo Sorce * Red Hat, Inc * New York
_______________________________________________
krbdev mailing list ***@mit.edu
https://mailman.mit.edu/mailman/listinfo/krbdev

David Woodhouse

2015-10-25 04:53:42 UTC

Post by Simo Sorce
I am not proposing a general approach in that bug, but one specific to
HTTP/SPNEGO in Firefox. The reason there is simple: NTLMSSP has no
ccache, hence no place to store negative cache entries. I think it is
appropriate in that case that Firefox keep the caching, mostly because
that allows yet another optimization that the krb5 mechanism cannot
provide: sending the first leg of a SPNEGO token right away, without
the initial roundtrip that returns the 401-Negotiate error.

Firefox is basically dead code. It's not receiving maintenance - even
the trivial patch to make it look for /usr/bin/ntlm_auth for automatic
NTLM authentication instead of looking only in the current directory
has been languishing in bugzilla for years.

That aside, I'm not sure we *care* about negative caching for NTLMSSP.
It isn't NTLMSSP that is causing the problems - it's repeated attempts
to obtain a krb5 ticket, over and over and over and over again.

As for eliminating the initial roundtrip... I'm not even sure that's a
valid optimisation, is it? You're basically suggesting that we cache
the WWW-Authenticate: headers that the server gave us for one request,
and *assume* that it'll give us the same options for a subsequent
request?

Even if that is an optimisation we want to make in Firefox, and even if
we can somehow get traction and make such changes in dead code, it
seems to be an orthogonal issue. We could do that *without* having to
keep track of specific mechanisms used *within* Negotiate/SPNEGO
authentication.

Post by Simo Sorce

I am fine with a negative cache for krb5 ccaches, but it is a separate
concern from Browser specific information caching (which is partly
positive and partly negative as explained above) IMO.

OK. There are perhaps other things we could explore on the browser side
too - for example, if a server offers 'WWW-Authenticate: NTLM' as well
as Negotiate then *disable* GSS-NTLMSSP in SPNEGO because if we *do*
fall back to using NTLM then we are better off with connection-based
authentication rather than per-request.

But seriously, I don't hold out much hope of doing *anything* sane in
Firefox, and the auth code in Chrome is utterly painful to change too
(it *still* doesn't do SSO for WWW-Authenticate: NTLM, and there's been
patches for *that* for years too).

Doing the negative cache for Kerberos service tickets seems like the
only thing that's actually tractable in a reasonable amount of time.

--
dwmw2

Isaac Boukris

2015-10-25 15:10:32 UTC

Hi,

Post by David Woodhouse

Firefox is basically dead code. It's not receiving maintenance — even
the trivial patch to make it look for /usr/bin/ntlm_auth for automatic
NTLM authentication instead of looking only in the current directory
has been languishing in bugzilla for years.
That aside, I'm not sure we *care* about negative caching for NTLMSSP.
It isn't NTLMSSP that is causing the problems — it's repeated attempts
to obtain a krb5 ticket, over and over and over and over again.
As for eliminating the initial roundtrip... I'm not even sure that's a
valid optimisation, is it? You're basically suggesting that we cache
the WWW-Authenticate: headers that the server gave us for one request,
and *assume* that it'll give us the same options for a subsequent
request?
Even if that is an optimisation we want to make in Firefox, and even if
we can somehow get traction and make such changes in dead code, it
seems to be an orthogonal issue. We could do that *without* having to
keep track of specific mechanisms used *within* Negotiate/SPNEGO
authentication.

Post by Simo Sorce

I am fine with a negative cache for krb5 ccaches, but it is a separate
concern from Browser specific information caching (which is partly
positive and partly negative as explained above) IMO.

OK. There are perhaps other things we could explore on the browser side
too — for example, if a server offers 'WWW-Authenticate: NTLM' as well
as Negotiate then *disable* GSS-NTLMSSP in SPNEGO because if we *do*
fall back to using NTLM then we are better off with connection-based
authentication rather than per-request.

Side note - I think the above assumption isn't correct as Negotiate
(SPNEGO) will actually behave as connection-based when the elected
underlying mechanism was NTLMSSP.
As a matter of fact, even when KRB5 is negotiated it may still be
connection-based according to server side settings (lookup
'Persistent-Auth' header).

Have a look at MS specification of the Negotiate protocol:
https://msdn.microsoft.com/en-us/library/ee393311.aspx
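
As a rough illustration of how a client might notice this, the sketch below checks a single response header line for "Persistent-Auth: true". The function names are made up for this example, and the parsing is deliberately simplistic (one header per line, no folding); it is not taken from any browser or from the specification.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive check that s begins with prefix. */
static bool has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
        s++;
        prefix++;
    }
    return true;
}

/*
 * Return true if line is a "Persistent-Auth" header whose value is
 * "true", ignoring case and surrounding whitespace.
 */
bool persistent_auth_enabled(const char *line)
{
    static const char name[] = "persistent-auth:";

    if (!has_prefix_ci(line, name))
        return false;
    line += sizeof(name) - 1;
    while (*line == ' ' || *line == '\t')
        line++;
    if (!has_prefix_ci(line, "true"))
        return false;
    line += 4;
    /* Only trailing whitespace or a line ending may follow the value. */
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        line++;
    return *line == '\0';
}
```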

Regards,
Isaac B.

Simo Sorce

2015-10-25 21:16:30 UTC

It's a pretty good assumption for the same URL, and it is harmless.

Even if that is an optimisation we want to make in Firefox, and even if
we can somehow get traction and make such changes in dead code, it
seems to be an orthogonal issue. We could do that *without* having to
keep track of specific mechanisms used *within* Negotiate/SPNEGO
authentication.

If you combine that with a libkrb5 negative cache perhaps.

Post by Simo Sorce

I am fine with a negative cache for krb5 ccaches, but it is a separate
concern from Browser specific information caching (which is partly
positive and partly negative as explained above) IMO.

I am not sure why you would disable GSS-NTLMSSP in this case, but this
is probably not the right place to discuss NTLM specific stuff.

But seriously, I don't hold out much hope of doing *anything* sane in
Firefox, and the auth code in Chrome is utterly painful to change too
(it *still* doesn't do SSO for WWW-Authenticate: NTLM, and there's been
patches for *that* for years too).
Doing the negative cache for Kerberos service tickets seems like the
only thing that's actually tractable in a reasonable amount of time.

Understood.

Simo.

--
Simo Sorce * Red Hat, Inc * New York

Nico Williams

2014-08-04 17:32:33 UTC

Post by Nico Williams
IMO a negative cache belongs in the ccache, with some TTL, and with
kvno(1) always (or optionally) ignoring NAKs.

It'd be nice if the KDC could advertise a TTL for this.

Also, ideally such ccache entries should be like cc config entries, and
they should have a fixed-sized timestamp that can be overwritten to
immediately expire or refresh it as desired without having to enlarge
the ccache.
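
A small sketch of the fixed-size timestamp idea, assuming a made-up record layout of an 8-byte big-endian expiry followed by the NUL-terminated principal name. This is not the actual ccache file format, and the neg_record_* names are hypothetical. Because the expiry field has a fixed width, expiring or refreshing an entry is an in-place overwrite that never changes the record length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEG_TS_LEN 8   /* fixed-width expiry field, big-endian seconds */

/* Store v at p as NEG_TS_LEN big-endian bytes. */
static void put_be64(unsigned char *p, uint64_t v)
{
    int i;

    for (i = NEG_TS_LEN - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Read NEG_TS_LEN big-endian bytes from p. */
static uint64_t get_be64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < NEG_TS_LEN; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Serialize a negative entry into buf. Returns the number of bytes
 * written, or 0 if buf is too small.
 */
size_t neg_record_write(unsigned char *buf, size_t cap,
                        const char *name, uint64_t expiry)
{
    size_t nlen = strlen(name) + 1;

    if (cap < NEG_TS_LEN || cap - NEG_TS_LEN < nlen)
        return 0;
    put_be64(buf, expiry);
    memcpy(buf + NEG_TS_LEN, name, nlen);
    return NEG_TS_LEN + nlen;
}

/* Overwrite only the expiry field; the record size is unchanged. */
void neg_record_set_expiry(unsigned char *rec, uint64_t expiry)
{
    put_be64(rec, expiry);
}

/* An entry is live while the current time is before its expiry. */
bool neg_record_is_live(const unsigned char *rec, uint64_t now)
{
    return now < get_be64(rec);
}
```

Setting the expiry to 0 expires the entry immediately; setting it to a later time refreshes it. A kdestroy-style switch for clearing negative entries could use the same overwrite.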