faces performance issues & database lookup

John Mackin (john@syd.dit.csiro.au)
Thu, 4 Apr 91 14:28:46 +1000

Indented material is by Rich, and he's quoting Steve.

} Is a bottleneck to faces' speed the search for the appropriate
} face file?

Yes. The Bell Labs V8/9 vismon program (correct me if I'm wrong
John), used just 48x48x1 format. It used to request faces from a
face server (like you suggest below).

You're not wrong (Narelle). Presotto's face server was happy to provide faces
in quite a few different formats, but the only format that vismon itself used
was 48x48x1. I don't have a copy handy, but I believe Pike and Presotto's
paper ``Face the Nation'' described the face server in some detail, for
those who may be interested. Perhaps Rich can post the reference; I have
a feeling it was a USENIX paper.

Anyway, this is _the absolute biggie_ performance problem in faces. I made
some fast measurements many versions ago that confirmed this, which is
intuitively clear anyway; all those stats, right, and each stat means
another namei, and we all know what a bad move namei is. Worse still,
and this is a true hasslebad for people who don't use -a and do let
mail build up in their mailbox (I don't do the former, and I don't do
the latter any more -- but I used to... and of course, there are those
times when you have a week off, right, and come back to 212 messages...),
is the fact that faces does the icon lookup for _every message_, not just
for the ones it is going to display icons for.

A huge performance win for people in the above situation can be had in one
of two ways: (1) read and scan the mail file backwards, or (2) scan all the
messages, but just cache the info for locating the face for the last -c
number of messages, and defer looking up icons until you hit EOF. (1) is
harder to program neatly and efficiently, but much cuter; it does, of course,
defeat the kernel's read-ahead, but what the hell.

(For those of you who have never tried starting faces on a substantial
mailbox, try it sometime. Have a pillow and blankets handy.)

These, however, are just kludges. They may be good kludges; they may be
kludges that we should implement -- but they remain kludges nonetheless.
They won't help people who use -a, since in that case every icon needs to
be looked up anyway. I _by no means_ intend to suggest that we shouldn't
be using a totally different access method. After all, ``Don't diddle
code to make it faster -- find a better algorithm.'' [K&Plauger.]

Another kludge (again, I'm _not_ saying we should be kludging) which
could cut down the namei overhead substantially is to chdir() to the
top of each facepath element and do the searching of that part of
the tree with relative pathnames from there, rather than using
absolute pathnames all the way from the root each time. chdir()'ing
is often very bad practice, but in an application that can't possibly
become interactive we don't have to be concerned.

John Mackin and I have been discussing a few of these ideas
off-line; perhaps it's time to put our ideas to the list.

Hey! Here it comes man!

} If so, have these alternatives been considered?
} - a dbm database of face locations that can be searched faster than
} the directory structures; or even loaded and kept in core by the
} faces program for very fast searches; something would have to update
} the database every once in a while

This is one of the ideas I really like. Here are some of my thoughts:
[...]
Dbm format would probably work. I seem to remember that DBM(3X) had some
stupid upper limit of 1000 records or something. I'm sure that can be
worked around; there are probably better database manipulation packages
anyway. I believe Henry Spencer and Jeff Collier put one together for C
news.

I don't mind this idea, but... dbm will NOT work, and ndbm will NOT work,
and we have to remember that not all UNIX variants have dbm at all. Both
dbm and ndbm share Rich's ``stupid upper limit'', and it's very bad. The
relevant manual entry quote is this:

The sum of the sizes of a key/content pair must not exceed
the internal block size (currently 1024 bytes). Moreover
all key/content pairs that hash together must fit on a sin-
gle block. store() will return an error in the event that a
disk block fills with inseparable data.

The second sentence kills n/dbm for all serious applications. I've looked
at this and it doesn't work. It can't be worked around.

What would be nice was if there was a public domain reimplementation of
n/dbm that didn't have this problem. I don't know if this exists or
not; I haven't looked through the archives. Does anyone on the list
know? I haven't looked at what C news uses; perhaps someone who knows
can tell us about it. Whatever we use, it would need to be adopted as
part of the faces distribution. There's nothing I hate worse than
grabbing something off the net and finding that it isn't self-contained,
and I am sure others share this feeling.

Converting a large directory structure to .dbm format once a month seems
acceptable to me. The database would use only one inode. That would be
nice.

Well, two :-).

} - a face location daemon that could be asked for the appropriate face
} for the given user/host

This is in the spirit of the Real True vismon(8). I like this idea as well.

(vismon(9), you trivia fans.)

Damn right! There's a hassle, though. The face server was implemented
using a mounted process and the file system switch, not things we can rely
on having on our common-or-garden UNIX boxes. Sure, we can make some
bizarre RPC format for serving faces (or some such -- personally I don't
like RPC much, since you never know if you are going to get a reply or
not -- if we do go this route, and I suspect it may be the route to go,
I want to see a connection-oriented, TCP-based service, since that way
you _know_ the other end of the connection is still there and it _will_
reply and you should wait for it -- if it goes away, you find out).

The thing about this, though, is that it isn't conceptually clean.
_Appearing_ to use the directory structure of the filesystem to
represent the database is beautiful. _Actually_ doing so is a
performance disaster, as we have found out.

Still, what the hell. Maybe I can get an RFC out of this! Let's
see... SIPP, Simple Icon Provision Protocol; sounds fine to me...

} We should be ready for good performance with databases of hundreds of
} hosts and thousands of users (much sooner than we probably realize)

Indeed, Steve. Since you announced your great effort of converting the
FaceSaver database not too long after sending this, we know what you
meant. Thanks a _lot_ for doing that, by the way; I've been meaning to
do it for ages, and you saved me the trouble.

} and eventually thousands of hosts and tens of thousands of
} users. Unless X-face:s become more important than the on-line
} databases ...

Which leads me to refer to Mark's comments about this. He is 100%
correct, in an ideal world. Unfortunately, that's not the world
we've got. At the moment, we need the databases, a lot.

With any luck, the Faces Project will forge on ahead now that these
great databases are available, and now that we have the mailing list.
And, of course, as more of the user community in general have bitmapped
graphics in front of them. This may mean that we'll see X-Face:
headers more, and eventually the databases may not be so important.
For the moment, though, they really do matter.

Agreed. If/when I/we get all the bugs out of the looking up of
aliases in the people.tab and machine.tab files, then we'll need to
improve their handling too. There are also bugs in the handling of
multiple face directory structures.

Absolutely. This needs to be fixed up.

Whatever solution is chosen, it'll be a fair bit of work to write and
implement. It's a project that could be done in isolation, then integrated
with the latest faces when it's found to be an acceptable solution. If
anybody wishes to volunteer to prototype some of these ideas, I'd love to
hear from you.

I agree with Rich. Let's all hear from you; post to the list! Hell,
talk first, implement later, when it's a biggie like this. Just don't
let the talking exceed the implementing...

John.