[xplc-general] module loader

Post by Pierre Phaneuf
I think I will be directing my efforts on the module loader in the
coming times, unless someone tells me that I'm an idiot and that
something else urgently needs help... :-)

Darn IFactory... :)

Post by Pierre Phaneuf
I'm thinking of having *two* version numbers in ModuleInfo, one being
the version number of the structure used in this module, and the second
being the minimum version number that the module loader should support.
For example, if we add interface definition information, we would bump
the version number, but not the minimum version number, because older
module loaders can safely ignore this information. But if we added
something like the unloadModule pointer at a later time, this would
require bumping the minimum version as well, because a module that would
try to veto the change and gets ignored by the loader (because it
doesn't even get called) might crash or otherwise be unstable. We could
just have a single version number, and make the module loader ignore
version numbers newer than the one it knows about, but this isn't too
hot for backward compatibility...
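To make the two-number proposal concrete, here is a minimal C++ sketch of the check a loader might perform. ModuleInfo, its field names and loaderAccepts() are hypothetical, not XPLC's actual definitions. The point is that the loader compares its own version against the module's minimum, not against the module's current version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ModuleInfo prefix carrying the two numbers proposed above.
struct ModuleInfo {
    std::uint32_t version;         // structure layout this module was built with
    std::uint32_t minimumVersion;  // oldest loader version allowed to load it
};

// Bumping only `version` (e.g. for optional interface metadata) keeps older
// loaders working; bumping `minimumVersion` too (e.g. for an unloadModule
// hook the module relies on) locks them out.
bool loaderAccepts(const ModuleInfo& info, std::uint32_t loaderVersion) {
    assert(info.minimumVersion <= info.version);  // sanity check on the module
    return loaderVersion >= info.minimumVersion;
}
```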

Instead of a "max supported version" number, you should just use
sizeof(struct). This is a time-honoured technique used in kernel syscalls
that seems to work pretty well. It's based on the assumption that structs
tend to grow, rather than shrink, fairly monotonically over time. Since I
don't know any structs that haven't, I guess it must be true :)
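A sketch of the sizeof(struct) technique, assuming fields are only ever appended and the first field holds the size the module was compiled with. The two layouts and readModuleInfo() are hypothetical; copying into a zero-initialized struct follows the general approach of extensible kernel syscall structs, but this is an illustration, not XPLC code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical module descriptors: fields are only ever appended, and the
// first field records the size the module was compiled with.
struct ModuleInfoV1 {
    std::size_t structSize;
    void* (*getObject)(const void* uuid);
};

struct ModuleInfoV2 {  // the layout this loader was compiled with
    std::size_t structSize;
    void* (*getObject)(const void* uuid);
    const char* description;  // appended in a later layout
};

// Copy what the module provides into a zero-initialized current-layout
// struct: fields an older module lacks stay null, and anything a newer
// module appended beyond sizeof(ModuleInfoV2) is ignored.
ModuleInfoV2 readModuleInfo(const void* raw) {
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    assert(size >= sizeof(ModuleInfoV1));  // smallest layout ever shipped
    ModuleInfoV2 info{};
    std::memcpy(&info, raw, size < sizeof info ? size : sizeof info);
    return info;
}
```

Note that a larger struct from a newer module is silently truncated here, so on its own this scheme gives a module no way to insist on a feature the loader lacks.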

Because people are essentially stupid and lazy, I also suggest just
including the "current XPLC module system version" or something as the
"minimum" version in the struct automatically. That means anyone using your
module has to have at least as high a version of the module loader as you
had when you compiled it... which is not that big a deal, since the module
loader is (I hope) a shared object that can be upgraded anytime. The module
may need some recent XPLC feature anyway, and it's just a pain if you have
to check to see if even XPLC supports all the things you're looking for at
runtime.

The combination of these two things eliminates the need for manually
supplying *both* version numbers.

Post by Pierre Phaneuf
Something I don't really like (but have a feeling it might be a thing
that Avery might like!) is that of not knowing what UUIDs the module can
provide (for caching reasons, mostly, so that we don't have to load the
modules all the time to figure out where to get a component). I was
thinking of having this list of UUIDs in ModuleInfo, like mentioned
earlier, but that would require module writers to keep the list in sync
with what the module really does. If a UUID is supported and isn't in
the list, it might work the first time (because the module is not in the
cache), then never again. Maybe I could cut down on the non-determinism
by checking the list first, so that it would never work, consistently.

There should only be *one* list of available UUIDs, or they're doomed to get
out of sync as you describe. A simple array mapping UUID to function
pointer would be pretty easy.
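A minimal sketch of such a table, assuming a 16-byte UUID type and an IObject base class as stand-ins for XPLC's own types (Uuid, ComponentEntry and the Foo/Bar components are hypothetical). The same array serves both purposes: the loader can cache its UUIDs, and getObject() dispatches from it.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for XPLC's UUID type and base interface.
using Uuid = std::array<unsigned char, 16>;
struct IObject { virtual ~IObject() = default; };

// One entry per component the module provides: the single source of truth.
struct ComponentEntry {
    Uuid uuid;
    IObject* (*create)();
};

struct Foo : IObject {};
struct Bar : IObject {};
IObject* makeFoo() { return new Foo; }
IObject* makeBar() { return new Bar; }

constexpr Uuid kFooUuid{1};
constexpr Uuid kBarUuid{2};

const ComponentEntry kComponents[] = {
    {kFooUuid, makeFoo},
    {kBarUuid, makeBar},
};

// Dispatch from the same table the loader would cache.
IObject* getObject(const Uuid& uuid) {
    for (const ComponentEntry& e : kComponents)
        if (e.uuid == uuid)
            return e.create();
    return nullptr;  // not provided by this module
}
```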

Post by Pierre Phaneuf
I think Avery wanted to be able to answer to whatever UUIDs he felt
like, for some reason; maybe you could explicitly state that you want to
be called all the time by putting the NULL UUID in your list.

I don't really need this - loadModule/unloadModule sound like they'd let me
register things whenever I want. I'm pretty sure that just answering to
random UUIDs at runtime sounds a bit dangerous.

I guess I would then need some control over when you *load* the module,
though, which might be similar. If I don't provide any UUIDs, then always
load me?
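One way the loader could use cached UUID lists to decide which modules to open, sketched under the rules floated above: a module listing the NULL UUID, or listing nothing at all, is always a candidate. CachedModule, shouldLoad(), modulesToLoad() and the file names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

using Uuid = std::array<unsigned char, 16>;
const Uuid kNullUuid{};  // all zeros: "ask me about anything"

// What the loader remembers about a module without loading it (hypothetical).
struct CachedModule {
    std::string path;
    std::vector<Uuid> uuids;  // copied from the module's component table
};

// Candidate if it lists the wanted UUID, lists the NULL UUID,
// or lists nothing at all ("always load me").
bool shouldLoad(const CachedModule& m, const Uuid& wanted) {
    if (m.uuids.empty())
        return true;
    for (const Uuid& u : m.uuids)
        if (u == wanted || u == kNullUuid)
            return true;
    return false;
}

std::vector<std::string> modulesToLoad(const std::vector<CachedModule>& cache,
                                       const Uuid& wanted) {
    std::vector<std::string> result;
    for (const CachedModule& m : cache)
        if (shouldLoad(m, wanted))
            result.push_back(m.path);
    return result;
}
```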

Post by Pierre Phaneuf
A variant that I thought of, since I don't like typing in the same thing
twice (and having to maintain both), was to not have the getObject
function pointer at all, but instead to have a list of a small structure
that would have a UUID and a function pointer. The NULL UUID trick would
still apply I suppose.

This sounds harmless, and degrades nicely into the one-element array
containing (NULL, getObject), so it's a superset of the previous suggestion.
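A sketch of how the (NULL, getObject) degenerate case might look, assuming exact matches are tried before any NULL-UUID catch-all and that each entry's function receives the requested UUID. All names here are hypothetical.

```cpp
#include <array>
#include <cassert>

using Uuid = std::array<unsigned char, 16>;
struct IObject { virtual ~IObject() = default; };
const Uuid kNullUuid{};

// Assumption: each entry's function is told which UUID was requested.
struct ComponentEntry {
    Uuid uuid;
    IObject* (*getObject)(const Uuid& requested);
};

// Exact matches win; a NULL-UUID entry is then asked about anything else.
IObject* lookup(const ComponentEntry* table, int count, const Uuid& requested) {
    for (int i = 0; i < count; ++i)
        if (table[i].uuid == requested)
            return table[i].getObject(requested);
    for (int i = 0; i < count; ++i)
        if (table[i].uuid == kNullUuid)
            if (IObject* obj = table[i].getObject(requested))
                return obj;
    return nullptr;
}

// The old single-entry-point style, expressed as a one-element table.
struct Thing : IObject {};
IObject* legacyGetObject(const Uuid& requested) {
    return requested == Uuid{7} ? new Thing : nullptr;
}
const ComponentEntry kLegacyTable[] = {{kNullUuid, legacyGetObject}};
```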

Post by Pierre Phaneuf
If that last option is the one, I'm not sure whether the function should
get the UUID or not. Why should it be passed, if the function will only
be good for one?

What do you have against passing parameters to functions, anyway? Passing
the UUID costs you only a tiny amount of effort, and gives you free
features, such as being able to use the same getObject-type function for two
UUIDs in your list that are very similar, but do one thing slightly
differently depending on the object. Why not allow it?
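A hypothetical illustration of the case described: one factory listed under two related UUIDs, building slightly different objects depending on which one was requested. Logger, makeLogger() and the UUID values are made up for the example.

```cpp
#include <array>
#include <cassert>

using Uuid = std::array<unsigned char, 16>;
struct IObject { virtual ~IObject() = default; };

constexpr Uuid kPlainLogUuid{10};
constexpr Uuid kTimestampedLogUuid{11};

struct Logger : IObject {
    explicit Logger(bool ts) : timestamps(ts) {}
    bool timestamps;
};

// One factory serves two closely related UUIDs, varying on the parameter.
IObject* makeLogger(const Uuid& requested) {
    if (requested == kPlainLogUuid) return new Logger(false);
    if (requested == kTimestampedLogUuid) return new Logger(true);
    return nullptr;  // listed under a UUID it doesn't know about
}

struct ComponentEntry {
    Uuid uuid;
    IObject* (*getObject)(const Uuid&);
};

const ComponentEntry kComponents[] = {
    {kPlainLogUuid, makeLogger},
    {kTimestampedLogUuid, makeLogger},
};
```

Returning nullptr for an unexpected UUID also guards against a misplaced table entry, at the cost of naming the UUIDs in two places.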

...

By the way, keep in mind that I will generally want to be obtaining objects
by moniker, not by UUID at all, so you need to be able to cache that
information somehow as well.

Have fun,

Avery

Pierre Phaneuf

2004-01-16 17:40:04 UTC

Avery Pennarun wrote:

I'll split my message in parts, to make it easier to track separate
issues...

Post by Pierre Phaneuf
I think I will be directing my efforts on the module loader in the
coming times, unless someone tells me that I'm an idiot and that
something else urgently needs help... :-)

Darn IFactory... :)

Hehehe!

I already told you: you don't need to use this interface. I'm trying to
make it easier to avoid it if you don't want it (with
IMPLEMENT_IOBJECT, for example, which makes it easier to have
constructors on your classes).

--
Pierre Phaneuf
http://advogato.org/person/pphaneuf/
"I am denial, guilt and fear -- and I control you"

Pierre Phaneuf

2004-01-16 18:18:02 UTC

Post by Avery Pennarun
Instead of a "max supported version" number, you should just use
sizeof(struct). This is a time-honoured technique used in kernel
syscalls that seems to work pretty well. It's based on the
assumption that structs tend to grow, rather than shrink, fairly
monotonically over time. Since I don't know any structs that
haven't, I guess it must be true :)

That's a good idea, but it doesn't seem to be much less trouble than
having a #define somewhere. I, for one, like cutting fat, so shrinking
isn't out of the question, and a soname-like number in a header might
actually be safer.

Not really a big deal, though. Apache uses the date (20040116, for
example), which isn't bad either. This number isn't supposed to change
every other day, and you're supposed to use it through a #define (or
some automated thing like sizeof) anyway...

Post by Avery Pennarun
Because people are essentially stupid and lazy, I also suggest just
including the "current XPLC module system version" or something as
the "minimum" version in the struct automatically. That means anyone
using your module has to have at least as high a version of the
module loader as you had when you compiled it... which is not that
big a deal, since the module loader is (I hope) a shared object that
can be upgraded anytime. The module may need some recent XPLC
feature anyway, and it's just a pain if you have to check to see if
even XPLC supports all the things you're looking for at runtime.

Even if it is really easy to upgrade the module loader, it's always even
easier *not* to!

Requiring at least the version that was on the system compiling the
module is something I specifically wanted to avoid. That kind of
behavior has been my bane when building packages for Quadra: I actually
had to keep a dual boot on my machine with an oldish Red Hat, just to
build Quadra binaries that would then run on a large variety of Linux
distros (a lot of libraries and distros have backward compatibility,
but just about nothing has *forward* compatibility). That's a feature
they have on Windows, and there's really no reason not to have it (if
only to get rid of my old Red Hat partition!).

After checking out Apache's "module" structure, I think this would be
more clearly expressed if I used the terms "major" and "minor" versions.
If you have a module that says version 2.4, it means that module loader
2.0 and up could load it, and a 1.x loader wouldn't do. We'd increase
the major number only on incompatible changes, and the minor number on
compatible changes.

An example of a compatible change that requires a different version
number is adding metainformation that allows scripting languages to call
interfaces. If that feature has been added in 2.3, and you have a 2.4
module, there's really no problem loading that module with the 2.0
loader; you just won't be able to use it from a scripting language,
but it will otherwise work just fine.
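A sketch of the major/minor rule as stated, taking "2.0 and up" literally. The Version struct and both functions are hypothetical, and the sketch assumes a loader with a newer major number still accepts older modules, which the thread does not settle.

```cpp
#include <cassert>

// Hypothetical major/minor pair as described above.
struct Version {
    unsigned majorVersion;  // bumped only on incompatible changes
    unsigned minorVersion;  // bumped on compatible additions
};

// "A 2.4 module loads on a 2.0 loader and up, but not on 1.x."
bool canLoad(Version module, Version loader) {
    return loader.majorVersion >= module.majorVersion;
}

// True when the module carries optional extras (e.g. scripting metadata
// added in a later minor version) that this loader will simply ignore.
bool ignoresExtras(Version module, Version loader) {
    return loader.majorVersion == module.majorVersion &&
           loader.minorVersion < module.minorVersion;
}
```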

Post by Avery Pennarun
The combination of these two things eliminates the need for manually
supplying *both* version numbers.

Manually supplying what what what?! I'll have none of that, thank you
very much! ;-)

Here, for example, is the STANDARD_MODULE_STUFF that Apache modules put at
the beginning of their "module" struct:

#define STANDARD_MODULE_STUFF MODULE_MAGIC_NUMBER_MAJOR, \
                              MODULE_MAGIC_NUMBER_MINOR, \
                              -1, \
                              __FILE__, \
                              NULL, \
                              NULL, \
                              MODULE_MAGIC_COOKIE

So you don't have to type in any of this, you just do this:

module MODULE_VAR_EXPORT example_module =
{
    STANDARD_MODULE_STUFF,
    example_init,
    ...
};

And all the magic is taken care of!

In XPLC, I think I'd put at least the entire first line of the
"example_module" definition in a macro: since it'd be the same for
every module, there's no sense in having the developer type it in again
and again.
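A sketch of what such an XPLC macro might look like, modelled on STANDARD_MODULE_STUFF. XplcModuleInfo, XPLC_MODULE, the version and magic constants and the exported symbol name xplc_module_info are all hypothetical. A variadic function-like macro (C++11 and later) hides the whole first line while keeping the braces balanced.

```cpp
#include <cassert>

// Hypothetical constants a header would provide.
#define XPLC_MODULE_MAJOR 2
#define XPLC_MODULE_MINOR 4
#define XPLC_MODULE_MAGIC 0x58504c43u  // "XPLC" in ASCII

struct IObject { virtual ~IObject() = default; };

struct XplcModuleInfo {
    unsigned magic;
    unsigned majorVersion;
    unsigned minorVersion;
    IObject* (*getObject)(const char* name);  // simplified for the sketch
};

// Hypothetical macro: the declaration line and the standard leading fields
// come from the headers; the author supplies only the module-specific ones.
#define XPLC_MODULE(...)                                      \
    extern "C" const XplcModuleInfo xplc_module_info = {      \
        XPLC_MODULE_MAGIC, XPLC_MODULE_MAJOR, XPLC_MODULE_MINOR, \
        __VA_ARGS__ }

// What a module author would write:
IObject* exampleGetObject(const char*) { return nullptr; }

XPLC_MODULE(exampleGetObject);
```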

Pierre Phaneuf

2004-01-16 19:32:04 UTC

Post by Avery Pennarun
There should only be *one* list of available UUIDs, or they're doomed
to get out of sync as you describe. A simple array mapping UUID to
function pointer would be pretty easy.

Yeah, agreed. That sounds like the ticket (the simple struct with a UUID
and a function pointer).

Post by Pierre Phaneuf
I think Avery wanted to be able to answer to whatever UUIDs he felt
like, for some reason; maybe you could explicitly state that you
want to be called all the time by putting the NULL UUID in your
list.

I don't really need this - loadModule/unloadModule sound like they'd
let me register things whenever I want. I'm pretty sure that just
answering to random UUIDs at runtime sounds a bit dangerous.

Yeah, it does sound dangerous, but so does a lot of stuff that you do! ;-)

Post by Avery Pennarun
I guess I would then need some control over when you *load* the
module, though, which might be similar. If I don't provide any
UUIDs, then always load me?

You're right on. loadModule isn't as useful as you'd think once you
consider that I'm trying very hard to avoid loading modules if at all
possible. I remember loading plugins as being one of the slowest things
in the startup of Netscape browsers, and even worse in Mozilla.

A module that doesn't list any UUIDs but provides some anyway? That's kinda weird...

I don't really understand why you want to register stuff by hand, when
it'd actually be easier to use the array. You can still return a NULL
pointer if you don't feel like giving that object right now, you know?

Doesn't really matter, you should be able to do it the way you want!

Post by Pierre Phaneuf
A variant that I thought of, since I don't like typing in the same
thing twice (and having to maintain both), was to not have the
getObject function pointer at all, but instead to have a list of a
small structure that would have a UUID and a function pointer. The
NULL UUID trick would still apply I suppose.

This sounds harmless, and degrades nicely into the one-element array
containing (NULL, getObject), so it's a superset of the previous suggestion.

Exactly.

Post by Pierre Phaneuf
If that last option is the one, I'm not sure whether the function
should get the UUID or not. Why should it be passed, if the
function will only be good for one?

Because I like the advice I got from these two Kernighan and Pike
fellows, that by making errors impossible to express, you avoid those
errors.

It's not really a big deal, I think I'll do it like you say, but having
a UUID parameter kind of implies that it might be different from one
call to the other, right? That's what parameters are for! So if you have
this function that just ignores the UUID parameter and gives back an
object and you happen to put it in *two* slots by accident, you'll have
funny behavior. You might put an if() in the function to check for that
mistake, but you'd be specifying the UUID in two places, which we agreed
is bad. If you start using a single function for multiple UUIDs, then it
looks all right if you have four UUIDs in a row in the array with the
same function pointer, but only the devil knows that the fourth one is
wrong!

But I think this is a case of "as simple as possible, but no simpler":
leaving out the UUID parameter starts causing headaches for the case
where you want to use the "always load and ask me" feature. I could have
a separate getObject function pointer that would have the extra
parameter, but I like the idea of that feature being somewhat hidden and
implicit rather than an explicit field you could set (people are stupid
and will just set this field, knowing no better and trying to use all
the features).

Post by Avery Pennarun
By the way, keep in mind that I will generally want to be obtaining
objects by moniker, not by UUID at all, so you need to be able to
cache that information somehow as well.

Yeah, I'm keeping this in mind, but I don't know how I'll be doing this
at the moment... But you're right, this is absolutely needed, and if you
have a good idea, come forth!

Avery Pennarun

2004-01-16 19:47:02 UTC

Post by Avery Pennarun
I don't really need this - loadModule/unloadModule sound like they'd
let me register things whenever I want. I'm pretty sure that just
answering to random UUIDs at runtime sounds a bit dangerous.

Yeah, it does sound dangerous, but so does a lot of stuff that you do! ;-)

Well, in this particular case, I'm willing to be subjected to IAGNI (I
aren't gonna need it :))

Post by Pierre Phaneuf
I don't really understand why you want to register stuff by hand, when
it'd actually be easier to use the array. You can still return a NULL
pointer if you don't feel like giving that object right now, you know?

Yeah, due to my inability to come up with good counterexamples, I'm slowly
coming around to your way of thinking, especially if making my information
table won't be very much typing.

Post by Pierre Phaneuf
So if you have this function that just ignores the UUID parameter and
gives back an object and you happen to put it in *two* slots by accident,
you'll have funny behavior.

I think if your bug is something like "factory returns wrong kind of object"
and you don't notice it right away, your life is just too complicated. Of
course, bonus points for helping the factory's return value be typesafe, as
WvMoniker does. Templates are kind of gross, but maybe you can do something
with auto-type-upcasting.

Post by Avery Pennarun
By the way, keep in mind that I will generally want to be obtaining
objects by moniker, not by UUID at all, so you need to be able to
cache that information somehow as well.

Yeah, I'm keeping this in mind, but I don't know how I'll be doing this
at the moment... But you're right, this is absolutely needed, and if you
have a good idea, come forth!

One obvious way would be to have another table mapping strings to function
pointers. (Notice how I've cleverly avoided mentioning another alternative,
which is a table mapping strings to UUIDs.)
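A minimal sketch of a table mapping moniker prefixes to factory functions. MonikerEntry, TcpStream, makeTcp() and resolveMoniker() are hypothetical, and prefix matching is just one plausible way to dispatch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct IObject { virtual ~IObject() = default; };

// A moniker prefix plus the function that builds an object from the rest.
struct MonikerEntry {
    const char* prefix;                    // e.g. "tcp:"
    IObject* (*create)(const char* rest);  // text after the prefix
};

struct TcpStream : IObject {
    std::string address;
};
IObject* makeTcp(const char* rest) {
    TcpStream* s = new TcpStream;
    s->address = rest;
    return s;
}

const MonikerEntry kMonikers[] = {
    {"tcp:", makeTcp},
};

IObject* resolveMoniker(const char* moniker) {
    for (const MonikerEntry& e : kMonikers) {
        std::size_t n = std::strlen(e.prefix);
        if (std::strncmp(moniker, e.prefix, n) == 0)
            return e.create(moniker + n);
    }
    return nullptr;  // no entry handles this moniker
}
```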

I still have a feeling that it may be worthwhile to make your caching more
"generic" - so that we can add more "moniker-like things" to cache later on
without trouble. That is, somebody might want to load a module by looking
for a particular kind of Fooizer, rather than a string or a UUID.

This becomes increasingly clear when you realize that different
monikerspaces have to be totally different: you can register one global UUID
space, but I want to request the UUID for WvStreams (perhaps) and then ask
it for a tcp: or ssl: moniker. This means WvStreams needs to know how to
ask the module loader for the module corresponding to *its* ssl: moniker
string. You can do this with a table of (uuid,string,func), but a generic
caching system could work more like this:

- load each module
- module registers itself with WvStreams, telling it about its monikers
- WvStreams asks to cache tuples of the form (uuid,string->module), so that
when it asks for (uuid,string) in the future, it gets (module).

"string" could instead be a "blob", allowing totally generic caching. I
think the uuid should probably be mandatory, so you're guaranteed to not
have cache namespace conflicts.
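A sketch of the generic cache described above, keyed on (UUID, opaque key) and yielding a module to load. ModuleCache and the module file name are hypothetical; std::optional requires C++17.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Uuid = std::array<unsigned char, 16>;

// The UUID names the space that owns the key (e.g. a WvStreams moniker
// service), so two services can't collide; the blob is opaque to the cache.
class ModuleCache {
public:
    void remember(const Uuid& space, const std::string& blob,
                  const std::string& module) {
        entries_[std::make_pair(space, blob)] = module;
    }

    std::optional<std::string> lookup(const Uuid& space,
                                      const std::string& blob) const {
        auto it = entries_.find(std::make_pair(space, blob));
        if (it == entries_.end())
            return std::nullopt;  // unknown: fall back to scanning modules
        return it->second;
    }

private:
    std::map<std::pair<Uuid, std::string>, std::string> entries_;
};
```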

Then again, maybe the only *real* use for this is monikers, and a straight
static table will save time/space/effort. YAGNI?

Have fun,

Avery

Pierre Phaneuf

2004-01-16 20:11:01 UTC

Post by Avery Pennarun
I don't really need this - loadModule/unloadModule sound like
they'd let me register things whenever I want. I'm pretty sure
that just answering to random UUIDs at runtime sounds a bit
dangerous.

Yeah, it does sound dangerous, but so does a lot of stuff that you do! ;-)

Well, in this particular case, I'm willing to be subjected to IAGNI
(I aren't gonna need it :))

LOL!

Post by Pierre Phaneuf
I don't really understand why you want to register stuff by hand,
when it'd actually be easier to use the array. You can still return
a NULL pointer if you don't feel like giving that object right now,
you know?

Yeah, due to my inability to come up with good counterexamples, I'm
slowly coming around to your way of thinking, especially if making my
information table won't be very much typing.

Hehe! I was just thinking about this the other day, how you have this
rare combination of both "architect" and "coder", but that it seems I
have a bit more of the "architect" than you do, and you more than make
up for it with the extra "coder" (maybe your designs are slightly less
elegant than mine, but you MAKE THEM, so they're infinitely more useful)!

It's cool working with people having different strengths and leveraging
them. For example, I'd never trust our good Andrew to architect anything
to save his life, but he'll make anything work!

Post by Pierre Phaneuf
So if you have this function that just ignores the UUID parameter
and gives back an object and you happen to put it in *two* slots by
accident, you'll have funny behavior.

I think if your bug is something like "factory returns wrong kind of
object" and you don't notice it right away, your life is just too
complicated. Of course, bonus points for helping the factory's
return value be typesafe, as WvMoniker does. Templates are kind of
gross, but maybe you can do something with auto-type-upcasting.

Yeah, you're right, but you do see my attempts at trying to make wrong
things impossible, making it easy on the stupid users. :-)

I'm not too sure what you mean by the templates and
auto-type-upcasting bit though... The factories always return IObject*,
but if you're smart, you'll assign this right into an
xplc_ptr<IMyOwnStuff> and everything will be type-safe. Is that what you
mean?

Post by Avery Pennarun
By the way, keep in mind that I will generally want to be
obtaining objects by moniker, not by UUID at all, so you need to
be able to cache that information somehow as well.

Yeah, I'm keeping this in mind, but I don't know how I'll be doing
this at the moment... But you're right, this is absolutely needed,
and if you have a good idea, come forth!

One obvious way would be to have another table mapping strings to
function pointers. (Notice how I've cleverly avoided mentioning
another alternative, which is a table mapping strings to UUIDs.)

Yeah, I knew you'd avoid these. ;-)

Putting a function pointer forces loading the module (maybe the moniker
points at something else). Maybe it's all right, but I have a bigger
problem.

While there's a top-level, standard XPLC moniker service, you probably
don't want everyone dumping their crap at that level. For example, you'd
probably set up your own moniker service for WvStreams and register it
in the standard moniker service as "wvstream:" or something. Then, when
you register "ssl:", it has to be registered with the proper moniker
service.

I think it'll be something like a triple:

- UUID of the moniker service
- moniker string to register
- UUID (or function pointer, depending how hard you beat me up) ;-)
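A sketch of that triple as a static table the loader could read and index. MonikerRegistration, buildIndex() and the UUID values are hypothetical. It uses the UUID variant for the third field, so a moniker can point at a component that lives in some other module.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Uuid = std::array<unsigned char, 16>;

// Which moniker service, which moniker, and which component handles it.
struct MonikerRegistration {
    Uuid service;         // e.g. a WvStreams service registered as "wvstream:"
    const char* moniker;  // e.g. "ssl:"
    Uuid component;       // UUID of the component to instantiate
};

const MonikerRegistration kRegistrations[] = {
    {Uuid{1}, "ssl:", Uuid{100}},
    {Uuid{1}, "tcp:", Uuid{101}},
};

// Loader-side index: (service, moniker) -> component UUID.
std::map<std::pair<Uuid, std::string>, Uuid> buildIndex() {
    std::map<std::pair<Uuid, std::string>, Uuid> index;
    for (const MonikerRegistration& r : kRegistrations)
        index[std::make_pair(r.service, std::string(r.moniker))] = r.component;
    return index;
}
```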

Post by Avery Pennarun
I still have a feeling that it may be worthwhile to make your caching more
"generic" - so that we can add more "moniker-like things" to cache later on
without trouble. That is, somebody might want to load a module by looking
for a particular kind of Fooizer, rather than a string or a UUID.

This becomes increasingly clear when you realize that different
monikerspaces have to be totally different: you can register one global UUID
space, but I want to request the UUID for WvStreams (perhaps) and then ask
it for a tcp: or ssl: moniker. This means WvStreams needs to know how to
ask the module loader for the module corresponding to *its* ssl: moniker
string. You can do this with a table of (uuid,string,func), but a generic

Well, it seems you saw the problem I just talked about (the
"monikerspaces").

Post by Avery Pennarun
Then again, maybe the only *real* use for this is monikers, and a
straight static table will save time/space/effort. YAGNI?

There's actually another thing with similar caching needs, categories,
so it's not as YAGNI as it seems. They might be the only two things,
though, and maybe having a static table for each is just the ticket, but
the fact that there are two is somewhat unsettling and makes it worth
thinking a bit more about this, IMHO...

Avery Pennarun

2004-01-16 20:33:05 UTC

Post by Pierre Phaneuf
Yeah, you're right, but you do see my attempts at trying to make wrong
things impossible, making it easy on the stupid users. :-)

I'm not too sure what you mean by the templates and
auto-type-upcasting bit though... The factories always return IObject*,
but if you're smart, you'll assign this right into an
xplc_ptr<IMyOwnStuff> and everything will be type-safe. Is that what you
mean?

Oops, I meant compile-time typesafe, which is almost as good as K&P's "prevent
you from doing something stupid by making it impossible to express" idea.
If you look at WvMoniker, it uses a template wrapper to force your factory
to return a particular kind of object that you're looking for (IWvStream,
for example), and yet actually deals in IObjects.
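A sketch of the compile-time typesafe idea: a typed front end that only accepts factories returning the interface tied to a monikerspace, layered on storage that deals purely in IObject. This is not WvMoniker's actual code; MonikerSpace, rawSpace(), IStream and TcpStream are hypothetical, and the link between space name and interface is enforced only by convention.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

struct IObject { virtual ~IObject() = default; };
struct IStream : IObject { virtual const char* kind() const = 0; };

// Untyped core: factories stored as IObject producers, one map per space.
using RawFactory = std::function<IObject*(const std::string&)>;
std::map<std::string, RawFactory>& rawSpace(const std::string& space) {
    static std::map<std::string, std::map<std::string, RawFactory>> spaces;
    return spaces[space];
}

// Typed front end: only factories returning I* can be added, so registering
// one for the wrong interface fails to compile.
template <typename I>
class MonikerSpace {
public:
    explicit MonikerSpace(std::string name) : name_(std::move(name)) {}

    void add(const std::string& prefix, I* (*factory)(const std::string&)) {
        rawSpace(name_)[prefix] = [factory](const std::string& rest) -> IObject* {
            return factory(rest);  // implicit upcast from I* to IObject*
        };
    }

    // Everything under this name came through add(), so the downcast is
    // sound as long as no MonikerSpace<J> reuses the same name.
    I* create(const std::string& prefix, const std::string& rest) const {
        auto& space = rawSpace(name_);
        auto it = space.find(prefix);
        if (it == space.end())
            return nullptr;
        return static_cast<I*>(it->second(rest));
    }

private:
    std::string name_;
};

struct TcpStream : IStream {
    const char* kind() const override { return "tcp"; }
};
IStream* makeTcp(const std::string&) { return new TcpStream; }
```

Passing a factory declared as returning IObject* to MonikerSpace<IStream>::add() is rejected by the compiler, which is the "impossible to express" property.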

Of course, there's no particular guarantee that a given monikerspace
will return a particular kind of object: I imposed the relationship
between namespace and interface myself by making the code artificially
restrictive. However, it's a convenient assumption to be able to make.

Post by Pierre Phaneuf
Putting a function pointer forces loading the module (maybe the moniker
points at something else).

That's fine with me. If monikers are first-class notation in XPLC, then we
can expect the provider of a particular service to generally be the one
registering its moniker.

Post by Pierre Phaneuf
- UUID of the moniker service
- moniker string to register
- UUID (or function pointer, depending how hard you beat me up) ;-)

<thump!> <wap!> <BIFF!>