dev.licorn.org migrated to a dedicated server, Licorn® project using Git and GitFlow
After some years of shared hosting, dev.licorn.org now lives on its own on a brand new Ubuntu 12.04 LTS server. It runs in a dedicated LXC container; the host machine is fully dedicated to the Licorn® project, running docs.licorn.org in another LXC container, plus many more things I wanted the project to have (a build machine, etc.).
Meanwhile, we have switched from darcs to git. Hitting the fatal #1520 bug pushed us to do so, because this bug simply prevented us from working. But it's really just the tip of the iceberg: I had wanted to switch to Git for a long time.
Even better, we now officially use GitFlow to follow the famous successful Git branching model, and I'm very happy with this clean approach.
Things can only get better now ;-)
Twisted web, thread pools and daemonization
The title of this blog post could have been a ticket summary like: WMI takes 100% of CPU when daemonized (with a lot of "WSGI application error" and EBADF crashes), but everything is OK when it stays attached to the current terminal. Luckily I found the solution before posting a new ticket and pushing changes to the branch ;-)
It gave me a real headache, because there is no documentation on this subject on the Twisted website. Twisted folks tend to assume everyone will use their twistd for daemonization, which is a broken assumption when Twisted must simply fit into an existing architecture.
I searched a lot on the daemonization side and cleaned up my foundations.process.daemonize() function a little, but this didn't help. Then Rob Golding gave me a hint, found with a lot of luck via a very generic « twisted django wsgi » Google search.
I just moved the Twisted imports after the daemonization call, and everything went fine again. Fortunately, importing twisted.web.version is still possible at the beginning of my wmi.py file without breaking this thread-pool thing, so the "please install twisted.web" message remains possible too. Nice!
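The fix boils down to an ordering rule: detach the process first, import Twisted afterwards, so that anything Twisted sets up at import time happens in the final daemon process. Here is a minimal sketch of that ordering. The start_daemon() helper and its arguments are hypothetical names used for illustration, not Licorn® code; in the real daemon the first argument would be something like foundations.process.daemonize.

```python
import importlib


def start_daemon(daemonize, module_names):
    """Daemonize first, then import the modules that must not be
    loaded before the fork (hypothetical helper, for illustration).

    `daemonize` is any callable that detaches the process;
    `module_names` lists the modules to import afterwards.
    """
    daemonize()
    # Deferred imports: nothing in these modules runs before daemonize().
    return [importlib.import_module(name) for name in module_names]
```

In wmi.py this would be called with the Twisted module names, e.g. start_daemon(daemonize, ['twisted.web.server']), while a harmless top-level import such as twisted.web.version can stay at the beginning of the file.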
Benchmarking the WMI2 HTTPS
I wanted to know whether the WMI2 can handle a decent amount of traffic. It's a management interface, so I don't expect it to come under real pressure, but we need a minimum.
Testing was performed on my development virtual machine, running on my MacBook Air (late 2010):
cat /proc/cpuinfo
processor : 1
model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
cpu MHz : 1860.000
cpu cores : 2
free
             total       used       free     shared    buffers     cached
Mem:       1536928    1453196      83732          0       8500     101312
-/+ buffers/cache:    1343384     193544
Swap:      1570812     648604     922208
I first tried ApacheBench (ab), but it reported a lot of "SSL read failed - closing connection" errors for no apparent reason: everything seems OK on the server side. Anyway, the results:
ab -n 1000 -c 50 -r https://localhost:3356/login/
[...]
SSL read failed - closing connection
SSL read failed - closing connection
[ more SSL errors ]
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 3356
SSL/TLS Protocol: TLSv1/SSLv3,AES256-SHA,2048,256
Document Path: /login/
Document Length: 0 bytes
Concurrency Level: 50
Time taken for tests: 35.158 seconds
Complete requests: 1000
Failed requests: 1000
(Connect: 0, Receive: 0, Length: 1000, Exceptions: 0)
Write errors: 0
Total transferred: 4601000 bytes
HTML transferred: 4075000 bytes
Requests per second: 28.44 [#/sec] (mean)
Time per request: 1757.888 [ms] (mean)
Time per request: 35.158 [ms] (mean, across all concurrent requests)
Transfer rate: 127.80 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 129 1332 275.7 1381 1896
Processing: 14 387 278.8 373 1171
Waiting: 14 387 278.8 372 1171
Total: 373 1719 302.8 1740 2464
Percentage of the requests served within a certain time (ms)
50% 1740
66% 1821
75% 1908
80% 1945
90% 2073
95% 2172
98% 2283
99% 2431
100% 2464 (longest request)
Seeing this, I decided to switch to httperf, which worked a lot better:
httperf --max-connections 50 --max-piped-calls 10 --num-conns 1000 --ssl \
--server localhost --port 3356
httperf --client=0/1 --server=localhost --port=3356 --uri=/ \
--max-connections=50 --max-piped-calls=10 --send-buffer=4096 \
--recv-buffer=16384 --ssl --num-conns=1000 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 1
Total: connections 1000 requests 1000 replies 1000 test-duration 62.415 s
Connection rate: 16.0 conn/s (62.4 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 39.5 avg 62.4 max 155.0 median 59.5 stddev 12.3
Connection time [ms]: connect 15.8
Connection length [replies/conn]: 1.000
Request rate: 16.0 req/s (62.4 ms/req)
Request size [B]: 62.0
Reply rate [replies/s]: min 13.8 avg 15.9 max 17.2 stddev 1.2 (12 samples)
Reply time [ms]: response 7.3 transfer 39.4
Reply size [B]: header 226.0 content 0.0 footer 2.0 (total 228.0)
Reply status: 1xx=0 2xx=0 3xx=1000 4xx=0 5xx=0
CPU time [s]: user 12.99 system 39.78 (user 20.8% system 63.7% total 84.5%)
Net I/O: 4.5 KB/s (0.0*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
So, these are just raw numbers from a test that is in no way realistic, but at least the server responded successfully without crashing, which is fine for a quick test in a development branch. I can continue coding!
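As a sanity check, the headline figures of both runs can be recomputed from the request count, the test duration and the concurrency level. The helper below is only an illustrative sketch (its name is made up); the input numbers are copied from the two outputs above.

```python
def throughput(requests, duration_s, concurrency=1):
    """Return (requests per second, mean ms per request as seen by
    one client, mean ms per request across all concurrent clients)."""
    rate = requests / duration_s
    ms_across_all = duration_s / requests * 1000.0
    ms_per_client = ms_across_all * concurrency
    return rate, ms_per_client, ms_across_all


# ab: 1000 requests in 35.158 s with 50 concurrent clients
ab_rate, ab_ms, ab_ms_all = throughput(1000, 35.158, concurrency=50)

# httperf: 1000 requests in 62.415 s, at most 1 concurrent connection
hp_rate, hp_ms, _ = throughput(1000, 62.415)

print("ab:      %.2f req/s, %.1f ms/req" % (ab_rate, ab_ms))
print("httperf: %.2f req/s, %.1f ms/req" % (hp_rate, hp_ms))
```

The ab run used 50 parallel clients while httperf opened its connections one after the other, so the two rates are not directly comparable.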
Meliae dump for licornd tonight… A first win, and a good but difficult perspective
On my production server, I noticed that licornd consumes 218 MiB of RAM. Granted, this is the stable version (thus the WMI still lives inside licornd) and it holds 214540 directory watches, but I still find this too much.
I thus gave Meliae a try, starting from this perfect example-based howto. Here is the result:
Total 961436 objects, 261 types, Total size = 197.5MiB (207061468 bytes)
Index   Count   %      Size   % Cum     Max Kind
    0  214540  22 117567920  56  56     548 Watch
    1  466050  48  65864052  31  88    6976 str
    2    2571   0  14768280   7  95 6291592 dict
    3  216166  22   2593992   1  96      12 int
    4    3291   0   1020956   0  97  178172 unicode
    5   23461   2    939256   0  97     908 tuple
    6     338   0    766784   0  98    6304 module
    7    5554   0    377672   0  98      68 code
    8     841   0    376768   0  98     448 type
    9    1864   0    305696   0  98     164 _RLock
   10    5321   0    297976   0  98      56 function
   11     166   0    282200   0  99    1700 Group
   12     508   0    278384   0  99     548 Machine
   13    4474   0    210876   0  99    9336 list
   14     811   0    154508   0  99     548 Enumeration
   15     737   0    120868   0  99     164 _Condition
   16      55   0     93500   0  99    1700 User
   17     557   0     91348   0  99     164 _Event
   18     165   0     90420   0  99     548 Distribution
   19    2765   0     88480   0  99      32 builtin_function_or_method
It seems the optimization I submitted to pyinotify is not installed on this machine. A quick approximation suggested we could go down to 130 MiB with the patched pyinotify. After installing it and restarting licornd, I was proved right: licornd now eats only 132 MiB. Yeah! That's a first cut of roughly 40% :-)
The next question is: how to reduce all these str? Given the numbers, they are most probably the pathnames of the watches. The first part of the optimization will be easy: they seem to be stored twice. I'll have to hunt this down, but the answer is somewhere in the INotifier. Perhaps we store the pathnames, and so does pyinotify.
The second optimization will probably not be that easy: I assume the best way to reduce the remaining str will be to reimplement pyinotify with a tree (mimicking the filesystem) that avoids storing the dirname() part of every entry, instead of a list/dict of full pathnames.
This will surely help on the performance side too, in the critical process of comparing the pathnames of new events against already-watched full pathnames used as dict keys during watch creation / deletion. Currently, 10 minutes after the daemon launched, only 30K watches are installed out of 214K. This is poor. OK, my hard disks are slow and my machine is 5 years old (and loaded on top of that), but we can surely do better.
*Morning note:* 7 hours after launch, there are still only 135K watches installed out of 214K. This is very poor. The tree is really needed. This helps explain why licornd misses inotify events when it's fully loaded with a bunch of watches, and why untarring a kernel on a test machine with nothing watched works like a charm.
History note: I started from Stack Overflow, and Meliae seemed to be the best tool to use. The nice thing is that in Licorn®, it's ultra simple:
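To make the tree idea more concrete, here is a minimal sketch of a structure that stores each directory name once, as a node, instead of repeating it in every full pathname. The PathTree class and its methods are hypothetical; this is neither pyinotify's nor Licorn®'s code, just one possible shape for it.

```python
class PathTree(object):
    """Store watched directories as a tree of path components, so each
    directory name is kept once instead of in every full pathname."""

    __slots__ = ('children', 'wd')

    def __init__(self):
        self.children = {}   # component name -> PathTree
        self.wd = None       # watch descriptor, None if not watched

    def add(self, path, wd):
        """Record that `path` is watched with descriptor `wd`."""
        node = self
        for part in path.strip('/').split('/'):
            node = node.children.setdefault(part, PathTree())
        node.wd = wd

    def lookup(self, path):
        """Return the watch descriptor of `path`, or None."""
        node = self
        for part in path.strip('/').split('/'):
            node = node.children.get(part)
            if node is None:
                return None
        return node.wd

    def count_nodes(self):
        """Number of nodes in the tree, root included."""
        return 1 + sum(c.count_nodes() for c in self.children.values())
```

Whether this really saves memory in CPython would need to be measured with Meliae again, since every node carries its own dict; the expected gain is on deep hierarchies where the dirname() part is long and shared by many entries.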
get in
from meliae import scanner
scanner.dump_all_objects('/tmp/licornd.json')
del scanner
And then, after having patched Meliae against #876810:
ipython
from meliae import loader
om = loader.load('/tmp/licornd.json')
s = om.summarize(); s
WMI2 branch works on Debian
After a weekend of work, I finally managed to get Licorn® running on Debian Squeeze. Not everything is functional yet. Most notably, the ServiceExtension class needs a little more love, because it assumes upstart is installed, which is obviously not the case on Debian ;-).
Basic user and group management works in the WMI. I still need to test the whole thing a bit more, but it goes well via the developer installation.
From another Licorn® machine running Ubuntu, the « Debian » thing is clearly visible (in the Machines tab of the screenshot) ;-)
Related screenshots:
Licorn® development report for period 2010-2011
I've been working hard lately to finish a report summing up my last two years of development on Licorn®. The report is in French because a great part of the development was sponsored by ADEME (see also Energy Efficiency in Europe), which is a French organization.
Focus of the document:
- brief presentation,
- Licorn® structural advantages / weaknesses,
- what we have accomplished in two years,
- the current architecture,
- still open developments.
If you can read French and want to know more about the internals and externals of Licorn®, this is a must-read. The document is far from exhaustive, but gives a good insight into how Licorn® works. If you prefer to learn what Licorn® does, head over to the documentation website.
Download the report (PDF, French, 55 pages, 22Mb).
WMI2 final architecture
I've finally managed to finish the WMI2 schemata. I created them for a scientific report sponsored by ADEME (Agence de l'Environnement et de la Maîtrise de l'Énergie; Agency for Environment and Energy Management). The two schemata represent the internal architecture of licornd before and after the work partially sponsored by ADEME and META IT (the company I work for).
Architecture of Licorn®, 1 year ago:
Architecture of Licorn®, as in the WMI2 development branch (undergoing review for merge in the stable tree). This schema is an enhanced version of the one published in the previous blogpost:
The schemata still need to be annotated to be fully understandable. They are not strictly formal in any way; they are just meant to illustrate how Licorn® works internally, so that a newcomer can understand it. At some point they will be explained on our documentation website (and already are, partially).
Upcoming WMI2 internals (and new repo in Trac)
The following schema tries to explain how the future WMI2 works. This architecture is already fully functional in our development repository and will enter the stable branch soon (official due date: March 5th, 2012). We are currently polishing it before releasing the patch.
The new WMI will be a huge step towards an interactive, web-2.0-like interface: everything can be handled asynchronously, and the WMI can update any part of its interface without refreshing the whole page. It is a fully-featured Django + Jinja2 + jQuery application, which runs on top of our new webserver. The webserver is fully WSGI compliant, and built on top of the great gevent coroutine-based library.
As a side note, I've now integrated the WMI2 repository into Trac to help follow the changes and global Licorn® activity. Without this, one could think that Licorn® development is halted, because it all happens in this separate branch until it's merged into the stable one.
Easily push development code
This makes team development easier by pushing patches very quickly between our repos. In your .bashrc, insert this code, and adapt $REPO to fit your needs:
function push () {
    REPO="dev.licorn.org:/home/groups/licorn.wmi2"
    make darcs_record_prehook
    # show pending changes (-l also lists files not yet added)
    darcs wh -l
    echo -n "OK to record? ([additional record message] + [Enter] or Control-C to quit): "
    read -e MESSAGE
    if [ -n "${MESSAGE}" ]; then
        darcs rec -a -m "Work in progress $(date '+%Y-%m-%d %H:%M:%S') (${MESSAGE})."
    else
        darcs rec -a -m "Work in progress $(date '+%Y-%m-%d %H:%M:%S')."
    fi
    echo -n "OK to push? ([Enter] or Control-C to abort): "
    read DUMMY
    darcs push -a "${REPO}"
}
D3 and consequent graphical updates in the WMI
I just found D3 while researching various things related to the evolution of Licorn®. I'm pretty impressed (and that's an understatement).
I just experimented a little with it, and the result is very cool. See before:
And after:
What you can't see is that the graph is completely dynamic, made of SVG, and refreshed every 5 seconds with smooth animations, scale morphing, smart label placement and other cool things. There are still some rough edges (in the way scale lines come in), but this is just an experiment, and a very, very positive one, BTW.
Hope you like it.
Documentation updates (modules states, small reorganization)
I've just updated the Licorn® documentation:
- the module states are now clearly defined; the code is not yet up-to-date, but it's the next thing to do on core, backends and extensions.
- the backends documentation is a little cleaner than before. The URL to access it is better spelled and doesn't display a 403 HTTP error anymore: http://docs.licorn.org/core/backends.
- the French translation has progressed a little, on the related subjects.
Remote Interactive Console
Yes. Finally, this long-awaited feature is here: you can now attach to an already-forked daemon, and type commands inside it, remotely. Just enter get in (from "get inside") on your preferred CLI prompt.
I have been thinking about this remote debugger for a while. After crunching rlcompleter, readline and a derived InteractiveConsole in my head for days, it seemed really feasible and nearly easy, but I was still missing a key piece to write it. I wasn't far from the result, which I found nearly complete in the rfoo Python package.
Reading the rfoo code inspired #645, because it seems to have everything we need while being much shorter code-wise, and perhaps much faster than Pyro: the hard part of rfoo is implemented in C (via Cython). When performance problems arise in the cluster, I think we *will* have to switch to rfoo.
The easy part of the switch is that rfoo is "call compatible" with Pyro (both are completely transparent to Python programs). Only the PyroProxyWithAttr cannot be implemented in rfoo, which seems wise enough from a security standpoint (everything in Licorn® will be reimplemented via methods to fit the security decorators anyway).
Everything I have implemented on top of Pyro seems to be reusable "out of the box" with rfoo, which makes me think the security decorators I have in mind will be, too.
Along the way, while reading and deriving the rconsole code for Licorn®, I found the fix for #582. I will surely treat myself to a beer, or perhaps my colleagues will buy me one if they read this. But the people who are really owed a beer are the rfoo developers.
A big "thank you" to them. They saved me a great deal of time, and showed me how to fix one of the most obscure bugs.
NOTE: as a consequence, the local interactive console has vanished from the daemon. We can still interact with it, but this is becoming more and more useless, as many more things are available remotely (get status, get events, get inside), and usable more than once, in parallel, with different options.
My Darcs boring file
This file contains the filename patterns that darcs should not care about. So here we go (only the relevant part), in _darcs/prefs/boring in each of your repos:
^tests/data/scenarii
^tests/data/\..*_status
^tests/data/\.owner
^tests/data/wmitest
docs/_build
^locale/fr
^interfaces/wmi/.*_donnut.png
^interfaces/wmi/js/json
\.pye$
^tailoritself\.
\.mo$
Note for later: picklable bound methods
I don't know if it's interesting, but it's still a worthwhile read.
Daemon monitor and live status
Recently I added 2 new features to the Licorn® daemon and the CLI. The first is a simple extension of the existing get status. There is now a -m CLI switch, which makes the CLI process stay connected to the daemon and keep updating the status on the terminal. The refresh delay and various parameters (fixed refresh count / maximum refresh time / clear screen between outputs) are configurable via other CLI switches; see get status --help for details. Just remember this command:
get status -m
Default parameters are cool anyway (refresh time: 1 second, clear screen: yes, refresh count: infinite, max. refresh time: infinite). Just hit Control-C to stop the status updates when you're done.
The second feature can be seen as an internal sniffer in the daemon. It's called monitor mode. With a new CLI command (get events, to name it), you can access any internal event, decision or message generated in the daemon. This monitor mode will eventually supersede the LTRACE facility, but that's not settled at this point, because the monitor could impact the daemon's performance much more than LTRACE (it's always compiled in, whereas LTRACE is usable only when the Python interpreter is in debug mode).
You use the monitor mode like you use LTRACE (levels are the same for both). E.g. to display std (standard) events:
get events
Or to dump only the inotifier related events:
get events -f inotifier
You can combine them to your liking, just as you would with LTRACE:
get events -f 'core^system|daemon^threads'
The bonus is the full sniffing mode: just add 'logging' to the facilities you want to follow, and every output message sent by the daemon to any other CLI command (run by you or by others) will be pushed to your local monitor session:
get events -f 'core|logging'
Another bonus: you can even monitor a remote daemon from your local machine. Just set the special environment variable before launching the command:
export LICORN_SERVER='IP_or_hostname'
get events
And you're done! Happy monitoring, and remember: these things can easily flood your terminal!
NOTE: currently, only inotifier and logging are fully converted to monitor mode. Other parts will soon follow. The rest of the code still uses the LTRACE facility.
rsyncd server basic configuration on Licorn®
In a few words, on the server side (Ubuntu Maverick):
/etc/rsyncd.conf:
[Sauvegardes]
# not being root seems safer to me.
uid = rsync
# the gid is the tricky part and must be forced, else transfers fail.
# rsp-* implies the daemon on the server side always has full permissions.
gid = rsp-Sauvegardes
path = /home/groups/Sauvegardes
comment = Sauvegardes
read only = no
/etc/default/rsync:
RSYNC_ENABLE=true
RSYNC_OPTS='--address=192.168.111.1'
RSYNC_NICE=''
RSYNC_IONICE='-c3'
In a root shell:
# not strictly needed, but seems clean to me
add group --system rsync
# this one is really needed
add user --system rsync --force
# not strictly needed either, but seems clean to me
add user rsync rsp-Sauvegardes
On the client side:
rsync -a <other options> dir1 dir2 ... leto::Sauvegardes/
Speed note: my server Leto being a "poor" and old single-core AMD Sempron, using rsyncd speeds transfers up to the client's disk read speed (roughly 25Mb/s at best) using only 30% CPU, instead of being CPU-bound at 100% with ~8Mb/s maximum speed when wrapped inside SSH (which is totally useless on a trusted local network, IMHO).
Smarter (fancier?) console output for CLI tools
You may or may not have noticed, but recent changes in the core led to enhanced console output for the get CLI tool. We use UTF-8 as much as possible, to include icons, special characters and colors that convey a lot of information without cluttering your terminal with things to read.
Thanks to the recent core rewrite, we now have high-level methods and properties for building this output, and get has an internal cache / invalidation mechanism to avoid wasting CPU cycles.
New things coming for the TestSuite
I'm currently reading a nice article about test-suite frameworks, and some others (search Google for "nose versus py.test"), and I'm seriously thinking about (sort of) getting rid of our home-made testsuite, or enhancing it a lot.
Sure, our testsuite has its advantages: very small scenarios to write, very high-level testing (command combinations, varying context, etc.), auto-included teardown commands and a much, much more beautiful output (very human friendly IMHO). But we now need low-level unit testing in Licorn®, and the TS can't handle this without a substantial rewrite.
There is currently no way to test something as simple as a method raising an exception when given bad input, and I miss this. BTW, now that our code base is growing fast, I'm looking for a way to split core.py into smaller files, and spread the test methods across the relevant parts of Licorn®. Needless to say, auto-collecting test methods would be smarter than having to chain them in core.py.
Perhaps we will achieve some sort of combination of the two worlds. Being able to test things in the background while hand-checking a current failure is something I'm not yet ready to lose.
The new ConfigurationFile class (currently being written and tested, hence the research and this blog entry), based on the pygments parser, is something that needs to be very, very carefully tested before use, and the current TS can't do this easily.
More to come in the very near future; both pieces of work are progressing side by side.
PS: additionally, have a look at Django-specific testing. Generally speaking, this might be a worthwhile read, too.
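For the record, this is the kind of low-level test I have in mind, written here with the standard unittest module. The check_login() function is a hypothetical stand-in invented for this example; it is not an actual Licorn® function.

```python
import unittest


def check_login(login):
    """Hypothetical validator: return the login, or raise ValueError."""
    if not login or not login.isalnum():
        raise ValueError('invalid login: %r' % (login,))
    return login


class CheckLoginTest(unittest.TestCase):
    def test_bad_input_raises(self):
        # the whole point: bad input must raise, not pass silently
        self.assertRaises(ValueError, check_login, 'bad login!')

    def test_good_input_passes(self):
        self.assertEqual(check_login('alice'), 'alice')
```

Both nose and py.test collect unittest.TestCase classes like this one automatically, so such tests could live next to the code they exercise, without being chained by hand in core.py.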
Mounting all volumes
With #500 implemented, Licorn® now supports mounting any kind of volumes on the local machine, provided you add extensions.volumes.mount_all_fs = True in your /etc/licorn/licorn.conf file.
This makes licornd provide the same features as udisks, allowing you to avoid it completely when you run licornd on your laptop or desktop machine.
End-user-formatted (vfat/ntfs/iso9660/udf) volumes will be mounted with the UID of the console user. Other types (ext*/btrfs/reiserfs/xfs/jfs) will be mounted the standard way, with acl and user_xattr options.
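The rule above can be sketched as a small function. This is only an illustration of the decision, with hypothetical names (mount_options(), END_USER_FS, STANDARD_FS_PREFIXES); the actual code of the volumes extension may be organized quite differently.

```python
END_USER_FS = ('vfat', 'ntfs', 'iso9660', 'udf')
STANDARD_FS_PREFIXES = ('ext', 'btrfs', 'reiserfs', 'xfs', 'jfs')


def mount_options(fstype, console_uid):
    """Return the mount options for a filesystem type, following the
    rule in the text (hypothetical helper, not the extension's code)."""
    if fstype in END_USER_FS:
        # end-user-formatted volume: mount it with the console user's UID
        return ['uid=%d' % console_uid]
    if fstype.startswith(STANDARD_FS_PREFIXES):
        # standard Unix filesystems: ACLs and user extended attributes
        return ['acl', 'user_xattr']
    raise ValueError('unsupported filesystem type: %s' % fstype)
```

For example, mount_options('vfat', 1000) gives the uid option for user 1000, while any ext* type gets acl and user_xattr.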
If for any reason you plug your device in before logging in on your machine, just launch the following commands:
del vol -a
add vol -a
And licornd will remount the volumes accordingly for you.
Note: I didn't test the behavior when logged in remotely. I don't think an ssh remote login will be treated like a normal local login. This will probably be implemented in a future enhancement if needed.
New Event Manager
Every part of Licorn® can now emit events, and any other part can set up a callback to receive the event arguments and do what is needed. Event callbacks can be run synchronously (with L_event_run()) or asynchronously (with L_event_dispatch()). In the asynchronous case, there is no guarantee on the order in which callbacks are called, and jobs can run in parallel because they are handled by the service facility. Example:
Emitter side:
from licorn.daemon import priorities, InternalEvent
# L_event_dispatch is a builtin, defined by the daemon at start
# no need to import anything, to use it.
L_event_dispatch(priorities.NORMAL,
                 InternalEvent('main_configuration_file_changed'))
Receiver side:
[...]
def main_configuration_file_changed_callback():
    """ this method will be auto-collected if the surrounding object
        is a controller, or a CoreUnitObject whose controller defines
        self.__look_deeper_for_callbacks (typically ModulesManager
        does it). """
    #
    # do whatever needed with LMC.configuration, whose
    # contents already reflect the change.
The EventManager is ready to operate even before the INotifier, just after the LMC is set up. It collects all callbacks of the controllers, backends and extensions on start.
It is stopped last, at the end of the daemon life.
Note: the InternalEvent instance can be given a callback argument (any callable() can apply for the job), to resync the Emitter at the end of event callback execution.
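To illustrate the general idea (name-based callback collection, a synchronous run and an asynchronous dispatch with an optional final callback), here is a toy version written from scratch. MiniEventManager and all its method names are hypothetical; this is not the Licorn® EventManager, which relies on the daemon's service facility and priorities rather than one thread per event.

```python
import threading


class MiniEventManager(object):
    """Toy event manager: collects `<event>_callback` methods and runs
    them synchronously or in a background thread."""

    SUFFIX = '_callback'

    def __init__(self):
        self.callbacks = {}   # event name -> list of callables

    def collect(self, obj):
        """Register every public method of `obj` named `<event>_callback`."""
        for attr in dir(obj):
            if attr.endswith(self.SUFFIX) and not attr.startswith('_'):
                name = attr[:-len(self.SUFFIX)]
                self.callbacks.setdefault(name, []).append(getattr(obj, attr))

    def run(self, name):
        """Synchronous: returns once every callback has finished."""
        for callback in self.callbacks.get(name, []):
            callback()

    def dispatch(self, name, callback=None):
        """Asynchronous: the callbacks run in a worker thread, so jobs
        from different dispatch() calls may overlap. The optional
        `callback` is called once they are all done."""
        def job():
            self.run(name)
            if callback is not None:
                callback()
        thread = threading.Thread(target=job)
        thread.start()
        return thread
```

The optional callback argument of dispatch() plays the role described in the note above: it lets the emitter know when the event has been fully processed.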
