Meliae dump for licornd tonight… A first win, and a good but difficult perspective
On my production server, I noticed that licornd consumes 218 MiB of RAM. Granted, this is the stable version (so the WMI still lives inside licornd) and it holds 214540 directory watches, but I still find this too much.
I thus gave Meliae a try, starting from this perfect example-based howto. Here is the result:
Total 961436 objects, 261 types, Total size = 197.5MiB (207061468 bytes)
Index   Count   %       Size   % Cum      Max Kind
    0  214540  22  117567920  56  56      548 Watch
    1  466050  48   65864052  31  88     6976 str
    2    2571   0   14768280   7  95  6291592 dict
    3  216166  22    2593992   1  96       12 int
    4    3291   0    1020956   0  97   178172 unicode
    5   23461   2     939256   0  97      908 tuple
    6     338   0     766784   0  98     6304 module
    7    5554   0     377672   0  98       68 code
    8     841   0     376768   0  98      448 type
    9    1864   0     305696   0  98      164 _RLock
   10    5321   0     297976   0  98       56 function
   11     166   0     282200   0  99     1700 Group
   12     508   0     278384   0  99      548 Machine
   13    4474   0     210876   0  99     9336 list
   14     811   0     154508   0  99      548 Enumeration
   15     737   0     120868   0  99      164 _Condition
   16      55   0      93500   0  99     1700 User
   17     557   0      91348   0  99      164 _Event
   18     165   0      90420   0  99      548 Distribution
   19    2765   0      88480   0  99       32 builtin_function_or_method
It seems the optimization I submitted to pyinotify is not installed on this machine. A quick approximation suggests we can go down to 130 MiB with the patched pyinotify. After installing it and restarting licornd, the estimate proved right: licornd now eats only 132 MiB. Yeah! That's a first 40% cut :-)
The next question is: how do we reduce all these str objects? They are most probably the pathnames of the watches (given the numbers). A first part of the optimization will be easy: they seem to be stored twice. I'll have to hunt this down, but the answer is somewhere in the INotifier. Perhaps we store the pathnames, and so does pyinotify.
The second optimization will probably not be that easy: I assume the best way to reduce the remaining str objects will be to reimplement pyinotify with a tree (mimicking the FS) to avoid storing the dirname() part of every entry, instead of a list/dict of full pathnames.
This will surely help on the performance side too, in the critical process of comparing new event pathnames against already-watched full pathnames used as dict keys during watch creation / deletion. Currently, 10 minutes after the daemon launched, only 30K watches are installed out of 214K. This is poor. OK, my HDDs are slow and my machine is 5 years old (and loaded besides), but we can surely do better.
*Morning note:* 7 hours after launch, there are still only 135K watches installed out of 214K. This is very poor. The tree is really needed. This helps explain why licornd misses inotify events when it's fully loaded with a bunch of watches, and why untarring a kernel on a test machine with nothing watched works like a charm.
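To make the tree idea concrete, here is a minimal sketch, not pyinotify's actual data structure. Each node stores a single path component, so the dirname() part shared by sibling directories is stored only once. The names PathTree, add, get and remove are hypothetical.

```python
class PathTree(object):
    """Map absolute directory paths to watch descriptors, one node per path component."""

    __slots__ = ('children', 'wd')

    def __init__(self):
        self.children = {}   # component name -> PathTree
        self.wd = None       # watch descriptor, or None if this dir is not watched

    @staticmethod
    def _parts(path):
        return [part for part in path.split('/') if part]

    def add(self, path, wd):
        node = self
        for part in self._parts(path):
            child = node.children.get(part)
            if child is None:
                child = node.children[part] = PathTree()
            node = child
        node.wd = wd

    def get(self, path):
        node = self
        for part in self._parts(path):
            node = node.children.get(part)
            if node is None:
                return None
        return node.wd

    def remove(self, path):
        """Drop `path` and everything below it; return the removed descriptors."""
        parts = self._parts(path)
        if not parts:
            return []
        node = self
        for part in parts[:-1]:
            node = node.children.get(part)
            if node is None:
                return []
        subtree = node.children.pop(parts[-1], None)
        return [] if subtree is None else list(subtree._descriptors())

    def _descriptors(self):
        if self.wd is not None:
            yield self.wd
        for child in self.children.values():
            for wd in child._descriptors():
                yield wd
```

Removing a directory then drops all the watches below it in one operation, instead of scanning every full pathname. Whether this really uses less memory than a flat dict depends on the per-node overhead, so it would have to be measured (with Meliae, for instance).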
History note: I started from Stack Overflow, and Meliae seemed to be the best tool to use. The nice thing is that in Licorn®, it's ultra simple:
get in
from meliae import scanner
scanner.dump_all_objects('/tmp/licornd.json')
del scanner
And then, after having patched Meliae against #876810:
ipython
from meliae import loader
om = loader.load('/tmp/licornd.json')
s = om.summarize(); s
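For a rough idea of what the summary computes, here is a minimal sketch that aggregates a dump by type without Meliae. It assumes the dump holds one JSON object per line, each with at least "type" and "size" keys; check this against the dump your Meliae version produces. The function name summarize_dump is hypothetical.

```python
import json
from collections import defaultdict

def summarize_dump(path):
    """Return (type name, object count, total bytes) tuples, largest total first."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    with open(path) as dump:
        for line in dump:
            line = line.strip()
            if not line:
                continue
            # Assumption: one JSON object per line, with "type" and "size" keys.
            obj = json.loads(line)
            counts[obj['type']] += 1
            sizes[obj['type']] += obj['size']
    return sorted(((kind, counts[kind], sizes[kind]) for kind in counts),
                  key=lambda row: row[2], reverse=True)
```

Calling summarize_dump('/tmp/licornd.json') would then give the same kind of ranking as the table at the top of this post, minus the percentages.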
Remote Interactive Console
Yes. Finally, this long-awaited feature is here: you can now attach to an already-forked daemon, and type commands inside it, remotely. Just enter get in (from "get inside") on your preferred CLI prompt.
I have been thinking about this remote debugger for a while. After crunching rlcompleter, readline and a derived InteractiveConsole in my head for days, it seemed really feasible and nearly easy, but I was still missing a key piece to write it. I wasn't far from the result, which I found nearly complete in the rfoo Python package.
Reading the rfoo code inspired #645, because rfoo seems to have everything we need, while being much shorter code-wise, and perhaps much faster than Pyro: the hard part of rfoo is implemented in C (via Cython). When performance problems arise in the cluster, I think we *will* have to switch to rfoo.
The easy part of the switch is that rfoo is "call compatible" with Pyro (both are completely transparent to Python programs). Only the PyroProxyWithAttr cannot be implemented in rfoo, which seems wise enough from the security standpoint (everything in Licorn® will be reimplemented via methods to fit the security decorators, anyway).
Everything I have implemented on top of Pyro seems to be re-usable "out-of-the-box" with rfoo, which makes me think the security decorators I have in mind will be, too.
Along the way, while reading and adapting the rconsole code for Licorn®, I found the fix for #582. I will surely treat myself to a beer, or perhaps my colleagues will buy me one, if they read this. But the people who really deserve a beer are the rfoo developers.
A big "thank you" to them. They saved me a great deal of time, and showed me how to fix one of the most obscure bugs.
NOTE: as a consequence, the local interactive console has vanished from the daemon. We can still interact with it locally, but this is becoming less and less useful, as many more things are available remotely (get status, get events, get inside), and usable more than once, in parallel, with different options.
Daemon monitor and live status
Recently I added 2 new functionalities to the Licorn® daemon and the CLI. The first is a simple extension of the existing get status. There is now a -m CLI switch, which makes the CLI process stay connected to the daemon and keep updating the status on the terminal. The refresh delay and various parameters (fixed refresh count / maximum refresh time / clear screen between outputs) are configurable via other CLI switches; see get status --help for details. Just remember this command:
get status -m
Default parameters are cool anyway (refresh time: 1 second, clear screen: yes, refresh count: infinite, max. refresh time: infinite). Just hit Control-C to stop the status updates when you're done.
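The refresh logic behind such a monitor switch can be sketched as follows. This is not the Licorn® implementation: monitor(), fetch_status and the parameter names are hypothetical, and only mirror the options listed above (refresh delay, refresh count, maximum refresh time, clear screen).

```python
import sys
import time

def monitor(fetch_status, delay=1.0, count=None, max_time=None,
            clear=True, out=sys.stdout):
    """Print fetch_status() every `delay` seconds, until a limit is hit or Control-C."""
    start = time.time()
    done = 0
    try:
        while count is None or done < count:
            if max_time is not None and time.time() - start >= max_time:
                break
            if clear:
                out.write('\x1b[2J\x1b[H')   # ANSI: clear screen, cursor to top-left
            out.write(fetch_status() + '\n')
            out.flush()
            done += 1
            time.sleep(delay)
    except KeyboardInterrupt:
        pass   # Control-C simply ends the session
    return done
```

With count and max_time both left to None, the loop only stops on Control-C, which matches the defaults described above.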
The second functionality could be seen as an internal sniffer in the daemon. It's called monitor mode. With a new CLI command (namely get events), you can access any internal event, decision or message generated in the daemon. This monitor mode will eventually supersede the LTRACE facility, but the timing is not very clear at this point, because the monitor could impact the daemon's performance much more than LTRACE (it's always compiled in, whereas LTRACE is usable only when the Python interpreter is in debug mode).
You use the monitor mode like you use LTRACE (levels are the same for both). E.g. to display std (standard) events:
get events
Or to dump only the inotifier related events:
get events -f inotifier
You can combine them to your liking, just the same you would do with LTRACE:
get events -f 'core^system|daemon^threads'
The bonus is the full sniffing mode: just add 'logging' to the facilities you want to follow, and every output message sent by the daemon to any other CLI command (run by you or by others) will be pushed to your local monitor session:
get events -f 'core|logging'
Another bonus: you can even monitor a remote daemon from your local machine. Just set the special environment variable before launching the command:
export LICORN_SERVER='IP_or_hostname'
get events
And you're done! Happy monitoring, and remember: these things can easily flood your terminal!
NOTE: currently, only inotifier and logging are fully converted to monitor mode. Other parts will soon follow. The rest of the code still uses the LTRACE facility.
New Event Manager
Every part of Licorn® can now emit events, and any other part of it can set up a callback to receive the event arguments and do what is needed. Event callbacks can be run synchronously (with method L_event_run()) or not (with method L_event_dispatch()). In the latter case, there is no guarantee on the order in which callbacks are called, and jobs can run in parallel because they are handled by the service facility. Example:
Emitter side:
from licorn.daemon import priorities, InternalEvent
# L_event_dispatch is a builtin, defined by the daemon at start
# no need to import anything, to use it.
L_event_dispatch(priorities.NORMAL,
                 InternalEvent('main_configuration_file_changed'))
Receiver side:
[...]
def main_configuration_file_changed_callback():
    """ this method will be auto-collected if the surrounding object
        is a controller, or a CoreUnitObject whose controller defines
        self.__look_deeper_for_callbacks (typically ModulesManager
        does it). """
    #
    # do whatever needed with LMC.configuration, whose
    # contents already reflect the change.
The EventManager is ready to operate even before the INotifier, just after the LMC is set up. It will collect all callbacks of the controllers, backends and extensions on start.
It is stopped last, at the end of the daemon's life.
Note: the InternalEvent instance can be given a callback argument (any callable will do), to resync the emitter once the event callbacks have finished executing.
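To illustrate the mechanism (collection of *_callback methods, synchronous run, asynchronous dispatch, final resync callback), here is a minimal sketch. It is not the Licorn® EventManager: Event, MiniEventManager, collect, run and dispatch are hypothetical names, priorities are left out, and plain threads stand in for the service facility.

```python
import threading

class Event(object):
    def __init__(self, name, callback=None, **kwargs):
        self.name = name
        self.callback = callback   # optional callable, run once all receivers are done
        self.kwargs = kwargs

class MiniEventManager(object):
    SUFFIX = '_callback'

    def __init__(self):
        self.receivers = {}   # event name -> list of callables

    def collect(self, obj):
        """Register every public method of `obj` named <event name>_callback."""
        for attr in dir(obj):
            if attr.endswith(self.SUFFIX) and not attr.startswith('_'):
                method = getattr(obj, attr)
                if callable(method):
                    name = attr[:-len(self.SUFFIX)]
                    self.receivers.setdefault(name, []).append(method)

    def run(self, event):
        """Synchronous: call receivers one after another, in registration order."""
        for receiver in self.receivers.get(event.name, []):
            receiver(**event.kwargs)
        if event.callback is not None:
            event.callback()

    def dispatch(self, event):
        """Asynchronous: one thread per receiver, no ordering guarantee."""
        threads = [threading.Thread(target=receiver, kwargs=event.kwargs)
                   for receiver in self.receivers.get(event.name, [])]

        def wait_then_resync():
            for thread in threads:
                thread.join()
            if event.callback is not None:
                event.callback()

        for thread in threads:
            thread.start()
        waiter = threading.Thread(target=wait_then_resync)
        waiter.start()
        return waiter
```

An object exposing main_configuration_file_changed_callback() would be picked up by collect(), and an emitter could pass callback=some_threading_event.set to be told when all receivers have finished.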
New inotifier (v4) finished, and a bunch of (small but cool) new features
I'm proud to announce the new inotifier rewrite (and its bunch of small enhancements), internally and lovingly named "hopefully-this-one-will-work-as-expected" (a private joke). It's shorter than the previous version in terms of lines of code, albeit more complex when dealing with special cases (large directories, multiple concurrent accesses to the same files, re-born just-deleted files or dirs, etc). The new version is many times faster than all previous ones (including the external, C-implemented gamin one). When you untar an archive, you can expect complete ACL application to take more or less the duration of the untar itself, counted from the moment it finishes. Previously, it could take minutes to do the same (specifically when untarring the Linux kernel in a shared dir). licornd is also very careful about resource consumption: it takes the CPU for ACL-intensive tasks (but only ONE CPU), and doesn't keep it long. For what it has to do, I find it well balanced from the functionality/resource point of view.
The new inotifier and related core.classes additions now allow users' homes to be watched, and offer dedicated functionalities to handle configuration files and report *real* changes to them (not 'all access', which generates a lot of false positives).
The dnsmasq backend and privileges directly benefit from these new functionalities. The shadow configuration file watch is more robust and verifies everything when the files are reloaded (one could create inconsistencies by editing the files manually; this is taken into account).
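This post does not say how *real* changes are told apart from plain accesses. One common way, shown here as a sketch under that assumption, is to compare a digest of the file contents each time an event arrives. ContentTracker and changed() are hypothetical names.

```python
import hashlib

class ContentTracker(object):
    """Remember a digest per file, to tell real content changes from mere accesses."""

    def __init__(self):
        self.digests = {}   # path -> hex digest of the last contents seen

    @staticmethod
    def _digest(path):
        with open(path, 'rb') as handle:
            return hashlib.sha1(handle.read()).hexdigest()

    def changed(self, path):
        """Return True if the contents of `path` differ from the last call.

        The first call for a given path only records a baseline and returns False.
        """
        new = self._digest(path)
        old = self.digests.get(path)
        self.digests[path] = new
        return old is not None and old != new
```

An event handler would call changed(path) and only trigger a reload when it returns True, which filters out opens, reads and rewrites with identical contents.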
There are still some rough edges and evil sub-sonic bugs (perhaps they are all the same one, I can't hunt it down for now), but only on very, very heavily loaded systems, where users and groups pop in and out very fast. I will fix them in the next coding cycle.
Hopefully, you won't need the chk group command anymore. If you do, please provide a full trace:
export LTRACE=std
licornd -rvD
<whatever command in your other terminal>
In the new-but-small-but-cool features category, you'll find the command fuzzy matching:
get u
get us
get usr
get users
(and so on, with identical counterparts for add/mod/del/chk)
will all bring you the list of users. In the same vein:
get g -> groups
get pro -> profiles
get pri -> privileges
get kw -> keywords
And so on. Everything is computed when you type it; there are no so-called "fixed values".
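The matching rule itself is not described here, so the following is only a sketch of one rule that reproduces the examples above: prefer a unique prefix match, otherwise accept a unique candidate that starts with the same letter and contains the typed letters in order. The function names and the OBJECTS tuple are hypothetical.

```python
def is_subsequence(abbrev, word):
    """True if the letters of `abbrev` appear in `word` in the same order."""
    letters = iter(word)
    return all(letter in letters for letter in abbrev)

def resolve(abbrev, candidates):
    """Return the single candidate matching `abbrev`, or None if none or several match."""
    abbrev = abbrev.lower()
    prefixed = [c for c in candidates if c.startswith(abbrev)]
    if len(prefixed) == 1:
        return prefixed[0]
    if prefixed:
        return None   # ambiguous prefix, e.g. 'p' for profiles / privileges
    loose = [c for c in candidates
             if c[:1] == abbrev[:1] and is_subsequence(abbrev, c)]
    return loose[0] if len(loose) == 1 else None

# Hypothetical list of object names, taken from the commands quoted in this blog.
OBJECTS = ('users', 'groups', 'profiles', 'privileges', 'keywords', 'machines')
```

The prefix rule goes first because 'pri' is also an in-order subset of the letters of 'profiles'; without it, get pri would be ambiguous.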
In the not-so-small-but-very-cool category, you will find that every part of Licorn® is now fully multi-lingual, on-the-fly: the daemon starts in the system lang, but every thread inside of it can switch to another language, and the client languages are pulled in from the web headers or the calling CLI environment. This makes everything dynamic, at will.
The documentation has been updated for the permissions parts.
The French translation is progressing well: the WMI part is finished, the CLI is 90% done, and the rest is more or less 70% done (it doesn't matter much anyway, as no user really sees it in real life).
I am deliberately skipping the core object rewrite. It's very technical and doesn't bring new end-user functionality, but it guarantees that everything is cleaner and easier to extend inside licornd, as far as users/groups/profiles/privileges/machines are concerned.
I probably forgot many things here, but if I had written a book, you wouldn't have read it anyway. Coding and *using* the code is better. Many bugs have been fixed, and the code is generally more pythonic and lighter than before: there are more generators, fewer hard-coded things, and abstractions (when necessary) landed in the right places. At least, this is how I wanted to implement them.
Enjoy,
User checks customization
Thanks to Robin, who did all the hard work, chk can now be customized for users. This makes it possible to avoid ACLs on a certain file hierarchy, or to force custom ones on another. For instance, I use these in my ~/.licorn/check.conf:
source NOACL
build NOACL
Projects NOACL
This keeps Debian tools from crashing when building packages or scanning source trees.
There is a small performance loss, but the gain is clear compared to before. We will optimize the code in the next performance run (there is a dedicated ticket for that).
System ACLs rules can be fully customized (with only a few sane exceptions) by administrators. For more details, there's a bunch of related documentation!
Licornd is now fully interactive
You can debug licornd in a live interactive session. Just start it in the foreground (licornd -D), and press 'i'. You can access every object and do whatever you want (trigger a method, dump an object, really whatever the Python language allows; so be careful, you're root...).
Bonus: everything is auto-completed with <TAB>, like any interactive shell can be, and your command history is saved to ~/.licorn/licornd_history (thanks to readline).
Press Control-D when you are done to leave interactive mode and return to standard command mode.
The interactive session is implemented in a separate thread. The licorn daemon stays fully functional during it. Stopping and restarting individual things manually will probably come in the near future, but as of now, if your display gets corrupted by other daemon output, just hit Control-L to clear your screen.
Example session:
olive@desktop-001 ~/licorn @ licornd -D
* [2010/04/12 01:01:44.8412] licornd/master@server(5124): starting all threads.
* [2010/04/12 01:01:44.8454] licornd/wmi(5129): started, waiting for master to become ready.
* [2010/04/12 01:01:44.9980] licornd/master@server(5124): all threads started, going to sleep waiting for signals.
* [2010/04/12 01:01:45.4224] licornd/wmi(5129): ready to answer requests at address http://localhost:3356/.
* [2010/04/12 01:01:47.6714] Entering interactive mode. Welcome into licornd's arcanes…
Licorn® @DEVEL@, Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) [GCC 4.4.5] on linux2
licornd> LMC.users.keys()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 33, 34, 38, 39, 41, 300, 113, 100, 101, 102, 103, 1000, 105, 106, 107, 108, 109, 110, 111, 112, 104, 114, 115, 116, 117, 65534]
licornd> dqueues
{'pyrosys': <Queue.Queue instance at 0x9c7a18c>, 'reverse_dns': <Queue.Queue instance at 0x9c7a04c>, 'arppings': <Queue.Queue instance at 0x9c71eec>, 'pings': <Queue.Queue instance at 0x9c71dac>}
licornd>
* [2010/04/12 01:02:24.0383] Leaving interactive mode. Welcome back to Real World™.
* [2010/04/12 01:02:25.3743] licornd/master@server: signal 2 received, shutting down…
* [2010/04/12 01:02:25.3748] licornd/wmi: signal 15 received, shutting down…
* [2010/04/12 01:02:25.3751] licornd/wmi: exiting.
* [2010/04/12 01:02:25.5468] licornd/master@server: exiting (up 40 secs).
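The approach described above (a console running in its own thread, with readline completion and history) can be sketched with the standard library alone. This is not the licornd code: make_console and interact_in_thread are hypothetical names, and the namespace passed in stands for whatever objects the daemon chooses to expose.

```python
import code
import threading

try:
    import readline       # not available on every platform
    import rlcompleter
except ImportError:
    readline = None

def make_console(namespace, history_file=None):
    """Build an InteractiveConsole bound to `namespace`, with TAB completion if possible."""
    if readline is not None:
        readline.set_completer(rlcompleter.Completer(namespace).complete)
        readline.parse_and_bind('tab: complete')
        if history_file is not None:
            try:
                readline.read_history_file(history_file)
            except (IOError, OSError):
                pass   # no history saved yet
    return code.InteractiveConsole(namespace)

def interact_in_thread(namespace, banner='Entering interactive mode.'):
    """Run the console in its own thread, so the caller keeps working meanwhile."""
    console = make_console(namespace)
    thread = threading.Thread(target=console.interact, args=(banner,))
    thread.daemon = True
    thread.start()
    return thread
```

interact() returns when the user hits Control-D; saving the history at that point would be a call to readline.write_history_file().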
PS: I really love Python. This kind of thing is just amazing.
PS2: the implementation may be incomplete; I didn't really test every object. Just report any bug you find and I will fix it ASAP.
Testing client/server communication
Defining the LICORN_SERVER environment variable (in the latest stable code) provides an easy way to test bidirectional communication between 2 or more Licorn® daemons installed on distinct machines.
WARNING: this is meant for testing and development purposes only. In production environments, there should be a DHCP server on the Licorn® machine with the SERVER role, and the CLIENTs will discover it automatically.
To use:
- be sure all machines are on the same IP network.
- set up one of the machines with licornd.role = SERVER in /etc/licorn/licorn.conf.
- find the IP address of this machine and remember it.
- set up all other machines with licornd.role = CLIENT.
- open a Terminal on each, and run export LICORN_SERVER=<IP_address> with the IP address of your server.
- start all licornd daemons, ending with the server one (otherwise it won't detect the clients, because they currently don't push their status to the server).
- alternatively, if you want to force detection of new clients started after the server, you can run add machines --discover <IP_subnet/mask> on the server; it will scan the subnet and register all new machines.
Besides this, be sure to set experimental.enabled = True on the SERVER machine, to get the Machines tab in the WMI.
You can now enjoy remotely shutting down client machines.
