December 07, 2021

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

Rblpapi 0.3.12: Fixes and Updates

The Rblp team is happy to announce a new version 0.3.12 of Rblpapi which just arrived at CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg (but note that a valid Bloomberg license and installation is required).

This is the twelveth release since the package first appeared on CRAN in 2016. Changes are detailed below and include both extensions to functionality, actual bug fixes and changes to the package setup. Special thanks goes to Michael Kerber, Yihui Xie and Kai Lin for contributing pull requests!

Changes in Rblpapi version 0.3.12 (2021-12-07)

  • bdh() supports new option returnAs (Michael Kerber and Dirk in #335 fixing #206)

  • Remove extra backtick in vignette (Yihui Xie in #343)

  • Fix a segfault from bulk access with bds (Kai Lin in #347 fixing #253)

  • Support REQUEST_STATUS in bdh (Kai Lin and John in #349 fixing #348)

  • Vignette now uses simplermarkdown (Dirk in #350)

Courtesy of my CRANberries, there is also a diffstat report for the this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc should go to the issue tickets system at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

07 December, 2021 01:39PM

Russell Coker


The IBM i operating system on the AS/400 is a system that runs on PPC for “midrange” systems. I did a bit of reading about it after seeing an AS/400 on ebay for $300, if I had a lot more spare time and energy I might have put in a bid for that if it didn’t look like it had been left out in the rain. It seems that AS/400 is not dead, there are cloud services available, here’s one that provides a VM with 2GM of RAM for “only EUR 251 monthly” [1], wow. I’m not qualified to comment on whether that’s good value, but I think it’s worth noting that a Linux VM running an AMD64 CPU with similar storage and the same RAM can be expected to cost about $10 per month.

There is also a free AS/400 cloud named pub400 [2], this is the type of thing I’d do if I had my own AS/400.

07 December, 2021 03:08AM by etbe

December 06, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

Sixth Annual UK System Research Challenges Workshop lightning talk

me looking awkward, thanks [Mark Little](

me looking awkward, thanks Mark Little

Last week I attended the UK Systems Research 2021 conference in County Durham, my first conference in nearly two years (since FOSDEM 2020, right on the cusp of the Pandemic). The Systems conference community is very pleasant and welcoming and so when I heard it was going to take place "physically" again this year I was so keen to attend I decided to hedge my bets and submit two talk proposals. I wasn't expecting them both to be accepted…

As well as the regular talks (more on those in another post) there is a tradition for people to give short, impromptu lightning talks after dinner on the second night. I've given two of these before, and I'd been considering whether to offer to one this time or not, but with two talks to deliver (and finish writing) I wasn't sure. Usually people talk about something interesting that they have been doing besides their research or day-jobs, but the last two years have been somewhat difficult and I didn't really think I had a topic to talk about. Then I wondered if that was a topic in itself…

During the first day of the conference (and especially one I'd got past one of my talks) I started to outline a lightning talk idea and it seemed to come out well enough that I thought I'd give it a go. Unusually I therefore had something written down and I was surprised how well it was received, so I thought I'd share it. Here it is:

I was anticipating the lightning talks and being cajoled into talking about something. I've done it twice before. So I've been racking my brains to figure out if I've done anything interesting enough to talk about.

in 2018 I talked about some hack I'd made to the classic computer game Doom from 1993. I've done several hacks to Doom that I could probably talk about except I've become a bit uncomfortable about increasingly being thought of as "that doom guy". I'd been reflecting on why it was that I continued to mess about with that game in the first place and I realised it was a form of expression: I was treating Doom like a canvas.

I've spent most of my career thinking about what I do in the frame of either science or engineering. I suffer from the creative urge and I've often expressed (and sated) that through my work. And that's possible because there's a craft in what we do.

In 2019 I talked about a project I'd embarked on to resurrect my childhood computer, a Commodore Amiga 500, in order to rescue my childhood drawings and digital paintings. (There's the artistic thing again). I'd achieved that and I have ambitions to do some more Amiga stuff but again that's a work in progress and there's nothing much to talk about.

In recent years I've been thinking more and more about art and became interested in the works and writings of people like Grayson Perry, Laurie Anderson and Brian Eno. I first learned about Eno through his music but he's also a visual artist. and a music producer. As a producer in the 70s he co-invented a system to try and break out of writer's block called "oblique strategies": A deck of cards with oblique suggestions written on them. When you're stuck, you pull a card and it might help you to reframe what you are working on and think about it in a completely different way.

I love this idea and I think we should use more things like that in software engineering at least.

So back to casting about for something to talk about. What have I been doing in the last couple of years? Frankly, surviving - I've just about managed to keep doing my day job, and keep working on the PhD, at home with two young kids and home schooling and the rest of it. Which is an achievement but makes for a boring lightning talk. But I'd like to say that for anyone here who might have been worrying similarly: I think surviving is more than enough.

I'll close on the subject of thinking like an artist and not an engineer. I brought some of the Oblique Strategies deck with me and I thought I'd draw a card to perhaps help you out of a creative dilemma if you're in one. And I kid you not, the first card I drew was this one:

Card reading 'You are an Engineer'

06 December, 2021 10:04PM

Matthias Klumpp

New things in AppStream 0.15

On the road to AppStream 1.0, a lot of items from the long todo list have been done so far – only one major feature is remaining, external release descriptions, which is a tricky one to implement and specify. For AppStream 1.0 it needs to be present or be rejected though, as it would be a major change in how release data is handled in AppStream.

Besides 1.0 preparation work, the recent 0.15 release and the releases before it come with their very own large set of changes, that are worth a look and may be interesting for your application to support. But first, for a change that affects the implementation and not the XML format:

1. Completely rewritten caching code

Keeping all AppStream data in memory is expensive, especially if the data is huge (as on Debian and Ubuntu with their large repositories generated from desktop-entry files as well) and if processes using AppStream are long-running. The latter is more and more the case, not only does GNOME Software run in the background, KDE uses AppStream in KRunner and Phosh will use it too for reading form factor information. Therefore, AppStream via libappstream provides an on-disk cache that is memory-mapped, so data is only consuming RAM if we are actually doing anything with it.

Previously, AppStream used an LMDB-based cache in the background, with indices for fulltext search and other common search operations. This was a very fast solution, but also came with limitations, LMDB’s maximum key size of 511 bytes became a problem quite often, adjusting the maximum database size (since it has to be set at opening time) was annoyingly tricky, and building dedicated indices for each search operation was very inflexible. In addition to that, the caching code was changed multiple times in the past to allow system-wide metadata to be cached per-user, as some distributions didn’t (want to) build a system-wide cache and therefore ran into performance issues when XML was parsed repeatedly for generation of a temporary cache. In addition to all that, the cache was designed around the concept of “one cache for data from all sources”, which meant that we had to rebuild it entirely if just a small aspect changed, like a MetaInfo file being added to /usr/share/metainfo, which was very inefficient.

To shorten a long story, the old caching code was rewritten with the new concepts of caches not necessarily being system-wide and caches existing for more fine-grained groups of files in mind. The new caching code uses Richard Hughes’ excellent libxmlb internally for memory-mapped data storage. Unlike LMDB, libxmlb knows about the XML document model, so queries can be much more powerful and we do not need to build indices manually. The library is also already used by GNOME Software and fwupd for parsing of (refined) AppStream metadata, so it works quite well for that usecase. As a result, search queries via libappstream are now a bit slower (very much depends on the query, roughly 20% on average), but can be mmuch more powerful. The caching code is a lot more robust, which should speed up startup time of applications. And in addition to all of that, the AsPool class has gained a flag to allow it to monitor AppStream source data for changes and refresh the cache fully automatically and transparently in the background.

All software written against the previous version of the libappstream library should continue to work with the new caching code, but to make use of some of the new features, software using it may need adjustments. A lot of methods have been deprecated too now.

2. Experimental compose support

Compiling MetaInfo and other metadata into AppStream collection metadata, extracting icons, language information, refining data and caching media is an involved process. The appstream-generator tool does this very well for data from Linux distribution sources, but the tool is also pretty “heavyweight” with lots of knobs to adjust, an underlying database and a complex algorithm for icon extraction. Embedding it into other tools via anything else but its command-line API is also not easy (due to D’s GC initialization, and because it was never written with that feature in mind). Sometimes a simpler tool is all you need, so the libappstream-compose library as well as appstreamcli compose are being developed at the moment. The library contains building blocks for developing a tool like appstream-generator while the cli tool allows to simply extract metadata from any directory tree, which can be used by e.g. Flatpak. For this to work well, a lot of appstream-generator‘s D code is translated into plain C, so the implementation stays identical but the language changes.

Ultimately, the generator tool will use libappstream-compose for any general data refinement, and only implement things necessary to extract data from the archive of distributions. New applications (e.g. for new bundling systems and other purposes) can then use the same building blocks to implement new data generators similar to appstream-generator with ease, sharing much of the code that would be identical between implementations anyway.

2. Supporting user input controls

Want to advertise that your application supports touch input? Keyboard input? Has support for graphics tablets? Gamepads? Sure, nothing is easier than that with the new control relation item and supports relation kind (since 0.12.11 / 0.15.0, details):


3. Defining minimum display size requirements

Some applications are unusable below a certain window size, so you do not want to display them in a software center that is running on a device with a small screen, like a phone. In order to encode this information in a flexible way, AppStream now contains a display_length relation item to require or recommend a minimum (or maximum) display size that the described GUI application can work with. For example:

  <display_length compare="ge">360</display_length>

This will make the application require a display length greater or equal to 300 logical pixels. A logical pixel (also device independent pixel) is the amount of pixels that the application can draw in one direction. Since screens, especially phone screens but also screens on a desktop, can be rotated, the display_length value will be checked against the longest edge of a display by default (by explicitly specifying the shorter edge, this can be changed).

This feature is available since 0.13.0, details. See also Tobias Bernard’s blog entry on this topic.

4. Tags

This is a feature that was originally requested for the LVFS/fwupd, but one of the great things about AppStream is that we can take very project-specific ideas and generalize them so something comes out of them that is useful for many. The new tags tag allows people to tag components with an arbitrary namespaced string. This can be useful for project-internal organization of applications, as well as to convey certain additional properties to a software center, e.g. an application could mark itself as “featured” in a specific software center only. Metadata generators may also add their own tags to components to improve organization. AppStream gives no recommendations as to how these tags are to be interpreted except for them being a strictly optional feature. So any meaning is something clients and metadata authors need to negotiate. It therefore is a more specialized usecase of the already existing custom tag, and I expect it to be primarily useful within larger organizations that produce a lot of software components that need sorting. For example:

  <tag namespace="lvfs">vendor-2021q1</tag>
  <tag namespace="plasma">featured</tag>

This feature is available since 0.15.0, details.

5. MetaInfo Creator changes

The MetaInfo Creator (source) tool is a very simple web application that provides you with a form to fill out and will then generate MetaInfo XML to add to your project after you have answered all of its questions. It is an easy way for developers to add the required metadata without having to read the specification or any guides at all.

Recently, I added support for the new control and display_length tags, resolved a few minor issues and also added a button to instantly copy the generated output to clipboard so people can paste it into their project. If you want to create a new MetaInfo file, this tool is the best way to do it!

The creator tool will also not transfer any data out of your webbrowser, it is strictly a client-side application.

And that is about it for the most notable changes in AppStream land! Of course there is a lot more, additional tags for the LVFS and content rating have been added, lots of bugs have been squashed, the documentation has been refined a lot and the library has gained a lot of new API to make building software centers easier. Still, there is a lot to do and quite a few open feature requests too. Onwards to 1.0!

06 December, 2021 05:40PM by Matthias

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

tidyCpp 0.0.6 on CRAN: Package Maintenance

Another small release of the tidyCpp package arrived on CRAN this morning. The packages offers a clean C++ layer (as well as one small C++ helper class) on top of the C API for R which aims to make use of this robust (if awkward) C API a little easier and more consistent. See the vignette for motivating examples.

This release makes a tiny code change, remove a YAML file for the disgraced former continuous integration service we shall not name (yet that we all used to use). And just like digest five days ago, drat four days ago, littler three days ago, RcppAPT two days ago, and RcppSpdlog yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

The NEWS entry follows.

Changes in tidyCpp version 0.0.6 (2021-12-06)

  • Assign nullptr in dtor for Protect class

  • Switch vignette engine to simplermarkdown

Thanks to my CRANberries, there is also a diffstat report for this release.

For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

06 December, 2021 02:16PM

December 05, 2021

Antoine Beaupré

mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options but I figured I would start with a popular one (mbsync).

But I also evaluated OfflineIMAP which was resurrected from the Python 2 apocalypse, and because I had used it before, for a long time.

Read on for the details.

Benchmark setup

All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye.

The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive.

The server is a custom build with a AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C).

The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:

$ notmuch count --exclude=false
$ du -sh --exclude xapian Maildir
13G Maildir

The baseline we are comparing against is SMD (syncmaildir) which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely.

Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline.

A baseline for a full sync might be also set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!

anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps

That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:

anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps

As an extra curiosity, here's the performance with tar, pretty similar with rsync, minus incremental which I cannot be bothered to figure out right now:

anarcat@angela:tmp(main)$ time ssh tar --exclude xapian -cf - Maildir/ | pv -s 13G | tar xf - 
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%

Interesting that rsync manages to almost beat a plain tar on file transfer, I'm actually surprised by how well it performs here, considering there are many little files to transfer.

(But then again, this maybe is exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?)

Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves much faster transfer rate on disks:

anarcat@marcos:~$ tar fc - Maildir | pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%

That's 131Mibyyte per second, vastly faster than the gigabit link. The client has similar performance:

anarcat@angela:~(main)$ tar fc - Maildir | pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%

So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways.

Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It show SMD to be faster than OfflineIMAP, but not as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.


The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirorrs, message flags, full folder support, and "trash" functionality.

Complex configuration file

I started with this .mbsyncrc configuration file:

SyncState *
Sync New ReNew Flags

IMAPAccount anarcat
User anarcat
PassCmd "pass"
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPStore anarcat-remote
Account anarcat

MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/

Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both

Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of a "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian packages ships one in /usr/share/doc/isync/examples/mbsyncrc.sample but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax.

Also, that syntax is a little overly complicated. For example, Far needs colons, like:

Far :anarcat-remote:

Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global.

Using a more standard format like .INI or TOML could improve that situation.

Stellar performance

A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive.

It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync.

The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    

Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m.

Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:

120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps

That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365

Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:

===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       

That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface

Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:

anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0

(Note that nice switch away from slavery-related terms too.)

The display is minimal, and yet informative. It's not obvious what does mean at first glance, but the manpage is useful at least for clarifying that:

This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).

In other words:

  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted

You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary mail traffic in the logs.

Interoperability issue

In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:

notmuch search --output=files --format=text0 "$search_spam" \
    | xargs -r -0 mv -t "$HOME/Maildir/${PREFIX}junk/cur/"

This method, which worked fine in SMD (and also OfflineIMAP) created this error on sync:

Maildir error: duplicate UID 37578.

And indeed, there are now two messages with that UID in the mailbox:

anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'

This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":

When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir fold ers. Mutt always does that, while mu4e needs to be configured to do it:

(setq mu4e-change-filenames-when-moving t)

So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above.

(A manual fix is actually to rename the file to remove the U= field: mbsync will generate a new one and then sync correctly.)

Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like.

Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug) which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and it's something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.

Stability issues

The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which lead to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect.

The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more.

Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile but that is not trivial because we need to allow SSH and only some parts of it...

Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues.

Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (ie. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So one the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd

The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages.

I have used the following .service file:

Description=Mailbox synchronization service

ExecStart=/usr/bin/mbsync -a


And the following .timer:

Description=Mailbox synchronization timer



Note that we trigger notmuch through systemd, with the Before and also by adding mbsync.service to the notmuch-new.service file:

Description=notmuch new

ExecStart=/usr/bin/notmuch new


An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover for the "sent folder" use case, where we need to wake up on local changes.

Password-less setup

The sample file suggests this should work:

IMAPStore remote
Tunnel "ssh -q /usr/sbin/imapd"

Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:

IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o -C /usr/lib/dovecot/imap"

And it actually seems to work:

$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087

It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well?

Interestingly, the SSH setup is not faster than IMAP.

With IMAP:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476

With SSH:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393

Basically: 200ms slower. Tolerable.

Migrating from SMD

The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:

  1. install isync, with the patch:

    dpkg -i isync_1.4.3-1.1~_amd64.deb
  2. copy all files over from previous workstation to avoid a full resync (optional):

    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
  3. rename all files to match new hostname (optional):

    find Maildir-mbsync/ -type f -name '*.angela,*' -print0 |  rename -0 's/\.angela,/\.curie,/'
  4. trash the notmuch database (optional):

    rm -rf Maildir-mbsync/.notmuch/xapian/
  5. disable all smd and notmuch services:

    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
  6. do one last sync with smd:

    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
  7. backup notmuch on the client and server:

    notmuch dump | pv > notmuch.dump
  8. backup the maildir on the client and server:

    cp -al Maildir Maildir-bak
  9. create the SSH key:

    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/
  10. add to .ssh/authorized_keys on the server, like this:

    command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...

  11. move old files aside, if present:

    mv Maildir Maildir-smd
  12. move new files in place (CRITICAL SECTION BEGINS!):

    mv Maildir-mbsync Maildir
  13. run a test sync, only pulling changes:

    mbsync --create-near --remove-none --expunge-none --noop anarcat-register

  14. if that works well, try with all mailboxes:

    mbsync --create-near --remove-none --expunge-none --noop -a

  15. if that works well, try again with a full sync:

    mbsync register mbsync -a

  16. reindex and restore the notmuch database, this should take ~25 minutes:

    notmuch new
    pv notmuch.dump | notmuch restore
  17. enable the systemd services and retire the smd-* services:

    systemctl --user enable mbsync.timer notmuch-new.service systemctl --user start mbsync.timer rm ~/.config/systemd/user/smd* systemctl daemon-reload

During the migration, notmuch helpfully told me the full list of those lost messages:

Warning: cannot apply tags to missing message:
Warning: cannot apply tags to missing message:
Warning: cannot apply tags to missing message:
Warning: cannot apply tags to missing message:
Warning: cannot apply tags to missing message:
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13

As detailed in the crash report, all of those were actually innocuous and could be ignored.

Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:

nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.

While a reindexing on top of an existing database was going twice as slow, at about 120 files/sec.

Current config file

Putting it all together, I ended up with the following configuration file:

SyncState *
Sync All

# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
User anarcat
PassCmd "pass"
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o -C /usr/lib/dovecot/imap"

IMAPStore anarcat-remote
Account anarcat-tunnel

# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/

# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both

IMAPAccount anarcat-register-imap
User register
PassCmd "pass"
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o -C /usr/lib/dovecot/imap"

IMAPStore anarcat-register-remote
Account anarcat-register-tunnel

MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/

Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both

Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.


I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that.

It also kind of died in a fire when Python 2 stop being maintained. The main author moved on to a different project, imapfw which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync

The first thing that happened on a full sync is this crash:

Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/", line 146, in run
  File "/usr/lib/python3.9/", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/", line 56, in parse
  File "/usr/lib/python3.9/email/", line 176, in feed
  File "/usr/lib/python3.9/email/", line 180, in _call_parse
  File "/usr/lib/python3.9/email/", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/", line 2675, in parse_content_type_header
  File "/usr/lib/python3.9/email/", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

  File "/usr/share/offlineimap3/offlineimap/folder/", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/", line 56, in parse
  File "/usr/lib/python3.9/email/", line 176, in feed
  File "/usr/lib/python3.9/email/", line 180, in _call_parse
  File "/usr/lib/python3.9/email/", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/", line 2675, in parse_content_type_header
  File "/usr/lib/python3.9/email/", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/", line 181, in decode
    string = bstring.decode(charset)

Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps

That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance

The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:

 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

  File "/usr/share/offlineimap3/offlineimap/folder/", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 910, in _fetch_from_imap
    raise OfflineImapError(

ERROR: Exception parsing message with ID (<>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

  File "/usr/share/offlineimap3/offlineimap/folder/", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 910, in _fetch_from_imap
    raise OfflineImapError(

ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'

  File "/usr/share/offlineimap3/offlineimap/folder/", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)

Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs

This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude!

Now that we have a full sync, we can test incremental synchronization. That is also much slower:

===> multitime results
1: sh -c "offlineimap -o || true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002

That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check

That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I Haven't tested that theory.

The OfflineIMAP mail spool is missing quite a few messages as well:

anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 

... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper in those discrepancies.

Other projects to evaluate

Those are all the options I have considered, in alphabetical order

  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox,NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc. discarded because no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail. not evaluated because it seems awfully complex to setup, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python

Most projects were not evaluated due to lack of time.


I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's on par for the course if we use IMAP. We are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response.

Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have ran out of time on this project.

05 December, 2021 09:20PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppSpdlog 0.0.7 on CRAN: Package Maintenance

A new version 0.0.7 of RcppSpdlog is now on CRAN. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich.

This release brings upstream bugfix releases 1.9.1 and 1.9.2 of spdlog. We also removed the YAML file (and badge) for the disgraced former continuous integration service we shall not name (yet that we all used to use). And just like digest four days ago, drat three days ago, littler two days ago, and RcppAPT yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

The (minimal) NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.7 (2021-12-05)

  • Upgraded to upstream bug fix releases spdlog 1.9.1 and 1.9.2

  • Travis artifacts and badges have been pruned

  • Vignette now uses simplermarkdown

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documention site.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

05 December, 2021 07:34PM

Reproducible Builds

Reproducible Builds in November 2021

Welcome to the November 2021 report from the Reproducible Builds project.

As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to our project, please visit our Contribute page on our website.

On November 6th, Vagrant Cascadian presented at this year’s edition of the SeaGL conference, giving a talk titled Debugging Reproducible Builds One Day at a Time:

I’ll explore how I go about identifying issues to work on, learn more about the specific issues, recreate the problem locally, isolate the potential causes, dissect the problem into identifiable parts, and adapt the packaging and/or source code to fix the issues.

A video recording of the talk is available on

Fedora Magazine published a post written by Zbigniew Jędrzejewski-Szmek about how to Use Diffoscope in packager workflows, specifically around ensuring that new versions of a package do not introduce breaking changes:

In the role of a packager, updating packages is a recurring task. For some projects, a packager is involved in upstream maintenance, or well written release notes make it easy to figure out what changed between the releases. This isn’t always the case, for instance with some small project maintained by one or two people somewhere on GitHub, and it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases. []

kpcyrd announced the release of rebuilderd version 0.16.3 on our mailing list this month, adding support for builds to generate multiple artifacts at once.

Lastly, we held another IRC meeting on November 30th. As mentioned in previous reports, due to the global events throughout 2020 etc. there will be no in-person summit event this year.


diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 190, 191, 192, 193 and 194 to Debian:

  • New features:

    • Continue loading a .changes file even if the referenced files do not exist, but include a comment in the returned diff. []
    • Log the reason if we cannot load a Debian .changes file. []
  • Bug fixes:

    • Detect XML files as XML files if file(1) claims if they are XML files or if they are named .xml. (#999438)
    • Don’t duplicate file lists at each directory level. (#989192)
    • Don’t raise a traceback when comparing nested directories with non-directories. []
    • Re-enable test_android_manifest. []
    • Don’t reject Debian .changes files if they contain non-printable characters. []
  • Codebase improvements:

    • Avoid aliasing variables if we aren’t going to use them. []
    • Use isinstance over type. []
    • Drop a number of unused imports. []
    • Update a bunch of %-style string interpolations into f-strings or str.format. []
    • When pretty-printing JSON, mark the difference as being reformatted, additionally avoiding including the full path. []
    • Import itertools top-level module directly. []

Chris Lamb also made an update to the command-line client to trydiffoscope, a web-based version of the diffoscope in-depth and content-aware diff utility, specifically only waiting for 2 minutes for to respond in tests. (#998360)

In addition Brandon Maier corrected an issue where parts of large diffs were missing from the output [], Zbigniew Jędrzejewski-Szmek fixed some logic in the assert_diff_startswith method [] and Mattia Rizzolo updated the packaging metadata to denote that we support both Python 3.9 and 3.10 [] as well as a number of warning-related changes[][]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [][].

Distribution work

In Debian, Roland Clobus updated the wiki page documenting Debian reproducible ‘Live’ images to mention some new bug reports and also posted an in-depth status update to our mailing list.

In addition, 90 reviews of Debian packages were added, 18 were updated and 23 were removed this month adding to our knowledge about identified issues. Chris Lamb identified a new toolchain issue, `absolute_path_in_cmake_file_generated_by_meson.

Work has begun on classifying reproducibility issues in packages within the Arch Linux distribution. Similar to the analogous effort within Debian (outlined above), package information is listed in a human-readable packages.yml YAML file and a sibling file shows how to classify packages too.

Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE and Vagrant Cascadian updated a link on our website to link to the GNU Guix reproducibility testing overview [].

Software development

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Elsewhere, in software development, Jonas Witschel updated strip-nondeterminism, our tool to remove specific non-deterministic results from a completed build so that it did not fail on JAR archives containing invalid members with a .jar extension []. This change was later uploaded to Debian by Chris Lamb.

reprotest is the Reproducible Build’s project end-user tool to build the same source code twice in widely different environments and checking whether the binaries produced by the builds have any differences. This month, Mattia Rizzolo overhauled the Debian packaging [][][] and fixed a bug surrounding suffixes in the Debian package version [], whilst Stefano Rivera fixed an issue where the package tests were broken after the removal of diffoscope from the package’s strict dependencies [].

Testing framework

The Reproducible Builds project runs a testing framework at, to check packages and other artifacts for reproducibility. This month, the following changes were made:

  • Holger Levsen:

    • Document the progress in setting up []
    • Add the packages required for debian-snapshot. []
    • Make the dstat package available on all Debian based systems. []
    • Mark virt32b-armhf and virt64b-armhf as down. []
  • Jochen Sprickerhof:

    • Add SSH authentication key and enable access to the osuosl168-amd64 node. [][]
  • Mattia Rizzolo:

    • Revert “reproducible Debian: mark virt(32 64)b-armhf as down” - restored. []
  • Roland Clobus (Debian “live” image generation):

    • Rename sid internally to unstable until an issue in the snapshot system is resolved. []
    • Extend testing to include Debian bookworm too.. []
    • Automatically create the Jenkins ‘view’ to display jobs related to building the Live images. []
  • Vagrant Cascadian:

    • Add a Debian ‘package set’ group for the packages and tools maintained by the Reproducible Builds maintainers themselves. []

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

05 December, 2021 06:33PM

December 04, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

Haskell mortgage calculator

A few months ago I was trying to compare two mortgage offers, and ended up writing a small mortgage calculator to help me. Both mortgages were fixed-term for the same time period (5 years). One of the mortgages had a lower rate than the other, but much higher arrangement fees.

A broker recommended the mortgage with the higher rate but lower fee, on an affordability basis for the fixed term: over all, we would spend less money within the fixed term on that deal than the other. (I thought) this left one bit of information missing: what remaining balance would there be at the end of the term?

The mortgages I want to model are defined in terms of a monthly repayment figure and an annual interest rate for the fixed period. I think interest is usually recalculated on a daily basis, so I convert the annual rate down to a daily rate.

Repayments only happen once a month. Months are not all the same size. Using mod 30 on the 'day' approximates a monthly payment. Over 5 years, there would be 60 months, meaning 60 repayments. (I'm ignoring leap years)

λ> length . filter id .take (5*365) $ [ x`mod`30==0 | x <- [1..]]

Here's what I came up with. I was a little concerned the repayment approximation was too far out so I compared the output with a more precise (but boring) spreadsheet and they agreed to within an acceptable tolerance.

The numbers that follow are all made up to illustrate the function and don't reflect my actual mortgage. :)

borrowed = 1000000 -- day 0 amount outstanding

aer   = 0.89
repay = 1000
der   = aer / 36

owed n | n == 0          = borrowed
       | n `mod` 30 == 0 = last + interest - repay
       | otherwise       = last + interest
        last     = owed (n - 1)
        interest = last * der

04 December, 2021 10:01PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppAPT 0.0.8: Package Maintenance

A new version of the RcppAPT package interfacing from R to the C++ library behind the awesome apt, apt-get, apt-cache, … commands and their cache powering Debian, Ubuntu and the like arrived on CRAN earlier today.

RcppAPT allows you to query the (Debian or Ubuntu) package dependency graph at will, with build-dependencies (if you have deb-src entries), reverse dependencies, and all other goodies. See the vignette and examples for illustrations.

This release updates some package metadata, adds a new package testing helper, and, just like digest three days ago, drat two days ago, and littler yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

Changes in version 0.0.8 (2021-12-04)

  • New test file version.R ensures NEWS file documents current package version

  • Travis artifacts and badges have been pruned

  • Vignettes now use simplermarkdown

Courtesy of my CRANberries, there is also a diffstat report for this release. A bit more information about the package is available here as well as as the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

04 December, 2021 03:33PM

December 03, 2021

littler 0.3.15 on CRAN: Package Updates

max-heap image

The sixteenth release of littler as a CRAN package just landed, following in the now fifteen year history (!!) as a package started by Jeff in 2006, and joined by me a few weeks later.

littler is the first command-line interface for R as it predates Rscript. It allows for piping as well for shebang scripting via #!, uses command-line arguments more consistently and still starts faster. It also always loaded the methods package which Rscript only started to do in recent years.

littler lives on Linux and Unix, has its difficulties on macOS due to yet-another-braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet – the build system could be extended – see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the Github repo, as well as in the examples vignette.

This release brings a more robust and featureful install2.r script (thanks to Gergely Daróczi), corrects some documentation typos (thanks to John Kerl), and now compacts pdf vignette better when using the build.r helper. It also one more updates the URLs for the two RStudio downloaders, and adds a simplermarkdown wrapper. Next, we removed the YAML file (and badge) for the disgraced former continuous integration service we shall not name (yet that we all used to use). And, following digest two days ago and drat yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

The full change description follows.

Changes in littler version 0.3.15 (2021-12-03)

  • Changes in examples

    • The install2 script can select download methods, and cope with errors from parallel download (thanks to Gergely Daroczi)

    • The build.r now uses both as argument to --compact-vignettes

    • The RStudio download helper were once again updated for changed URLs

    • New caller for simplermarkdown::mdweave_to_html

  • Changes in package

    • Several typos were correct (thanks to John Kerl)

    • Travis artifacts and badges have been pruned

    • Vignettes now use simplermarkdown

My CRANberries service provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and also on the package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as soon via Ubuntu binaries at CRAN thanks to the tireless Michael Rutter.

Comments and suggestions are welcome at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

03 December, 2021 12:37PM

hackergotchi for Evgeni Golov

Evgeni Golov

Dependency confusion in the Ansible Galaxy CLI

I hope you enjoyed my last post about Ansible Galaxy Namespaces. In there I noted that I originally looked for something completely different and the namespace takeover was rather accidental.

Well, originally I was looking at how the different Ansible content hosting services and their client (ansible-galaxy) behave in regard to clashes in naming of the hosted content.

"Ansible content hosting services"?! There are currently three main ways for users to obtain Ansible content:

  • Ansible Galaxy - the original, community oriented, free hosting platform
  • Automation Hub - the place for Red Hat certified and supported content, available only with a Red Hat subscription, hosted by Red Hat
  • Ansible Automation Platform - the on-premise version of Automation Hub, syncs content from there and allows customers to upload own content

Now the question I was curious about was: how would the tooling behave if different sources would offer identically named content?

This was inspired by Alex Birsan: Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies and zofrex: Bundler is Still Vulnerable to Dependency Confusion Attacks (CVE⁠-⁠2020⁠-⁠36327), who showed that the tooling for Python, Node.js and Ruby can be tricked into fetching content from "the wrong source", thus allowing an attacker to inject malicious code into a deployment.

For the rest of this article, it's not important that there are different implementations of the hosting services, only that users can configure and use multiple sources at the same time.

The problem is that, if the user configures their server_list to contain multiple Galaxy-compatible servers, like Ansible Galaxy and Automation Hub, and then asks to install a collection, the Ansible Galaxy CLI will ask every server in the list, until one returns a successful result. The exact order seems to differ between versions, but this doesn't really matter for the issue at hand.

Imagine someone wants to install the redhat.satellite collection from Automation Hub (using ansible-galaxy collection install redhat.satellite). Now if their configuration defines Galaxy as the first, and Automation Hub as the second server, Galaxy is always asked whether it has redhat.satellite and only if the answer is negative, Automation Hub is asked. Today there is no redhat namespace on Galaxy, but there is a redhat user on GitHub, so…

The canonical answer to this issue is to use a requirements.yml file and setting the source parameter. This parameter allows you to express "regardless which sources are configured, please fetch this collection from here". That's is nice, but I think this not being the default syntax (contrary to what e.g. Bundler does) is a bad approach. Users might overlook the security implications, as the shorter syntax without the source just "magically" works.

However, I think this is not even the main problem here. The documentation says: Once a collection is found, any of its requirements are only searched within the same Galaxy instance as the parent collection. The install process will not search for a collection requirement in a different Galaxy instance. But as it turns out, the source behavior was changed and now only applies to the exact collection it is set for, not for any dependencies this collection might have.

For the sake of the example, imagine two collections: evgeni.test1 and evgeni.test2, where test2 declares a dependency on test1 in its galaxy.yml. Actually, no need to imagine, both collections are available in version 1.0.0 from and test1 version 2.0.0 is available from

Now, given our recent reading of the docs, we craft the following requirements.yml:

- name: evgeni.test2
  version: '*'

In a perfect world, following the documentation, this would mean that both collections are fetched from, right? However, this is not what ansible-galaxy does. It will fetch evgeni.test2 from the specified source, determine it has a dependency on evgeni.test1 and fetch that from the "first" available source from the configuration.

Take for example the following ansible.cfg:

server_list = test_galaxy, release_galaxy, test_galaxy



And try to install collections, using the above requirements.yml:

% ansible-galaxy collection install -r requirements.yml -vvv                 
ansible-galaxy 2.9.27
  config file = /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg
  configured module search path = ['/home/evgeni/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.10/site-packages/ansible
  executable location = /usr/bin/ansible-galaxy
  python version = 3.10.0 (default, Oct  4 2021, 00:00:00) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]
Using /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg as config file
Reading requirement file at '/home/evgeni/Devel/ansible-wtf/collections/requirements.yml'
Found installed collection theforeman.foreman:3.0.0 at '/home/evgeni/.ansible/collections/ansible_collections/theforeman/foreman'
Process install dependency map
Processing requirement collection 'evgeni.test2'
Collection 'evgeni.test2' obtained from server explicit_requirement_evgeni.test2
Opened /home/evgeni/.ansible/galaxy_token
Processing requirement collection 'evgeni.test1' - as dependency of evgeni.test2
Collection 'evgeni.test1' obtained from server test_galaxy
Starting collection install process
Installing 'evgeni.test2:1.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test2'
Downloading to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki
Installing 'evgeni.test1:2.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test1'
Downloading to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki

As you can see, evgeni.test1 is fetched from, instead of Now, if those servers instead would be Galaxy and Automation Hub, and somebody managed to snag the redhat namespace on Galaxy, I would be now getting the wrong stuff… Another problematic setup would be with Galaxy and on-prem Ansible Automation Platform, as you can have any namespace on the later and these most certainly can clash with namespaces on public Galaxy.

I have reported this behavior to Ansible Security on 2021-08-26, giving a 90 days disclosure deadline, which expired on 2021-11-24.

So far, the response was that this is working as designed, to allow cross-source dependencies (e.g. a private collection referring to one on Galaxy) and there is an issue to update the docs to match the code. If users want to explicitly pin sources, they are supposed to name all dependencies and their sources in requirements.yml. Alternatively they obviously can configure only one source in the configuration and always mirror all dependencies.

I am not happy with this and I think this is terrible UX, explicitly inviting people to make mistakes.

03 December, 2021 08:00AM by evgeni

December 02, 2021

hackergotchi for Jonathan McDowell

Jonathan McDowell

Building a desktop to improve my work/life balance

ASRock DeskMini X300

It’s been over 20 months since the first COVID lockdown kicked in here in Northern Ireland and I started working from home. Even when the strict lockdown was lifted the advice here has continued to be “If you can work from home you should work from home”. I’ve been into the office here and there (for new starts given you need to hand over a laptop and sort out some login details it’s generally easier to do so in person, and I’ve had a couple of whiteboard sessions that needed the high bandwidth face to face communication), but day to day is all from home.

Early on I commented that work had taken over my study. This has largely continued to be true. I set my work laptop on the stand on a Monday morning and it sits there until Friday evening, when it gets switched for the personal laptop. I have a lovely LG 34UM88 21:9 Ultrawide monitor, and my laptops are small and light so I much prefer to use them docked. Also my general working pattern is to have a lot of external connections up and running (build machine, test devices, log host) which means a suspend/resume cycle disrupts things. So I like to minimise moving things about.

I spent a little bit of time trying to find a dual laptop stand so I could have both machines setup and switch between them easily, but I didn’t find anything that didn’t seem to be geared up for DJs with a mixer + laptop combo taking up quite a bit of desk space rather than stacking laptops vertically. Eventually I realised that the right move was probably a desktop machine.

Now, I haven’t had a desktop machine since before I moved to the US, realising at the time that having everything on my laptop was much more convenient. I decided I didn’t want something too big and noisy. Cheap GPUs seem hard to get hold of these days - I’m not a gamer so all I need is something that can drive a ~ 4K monitor reliably enough. Looking around the AMD Ryzen 7 5700G seemed to be a decent CPU with one of the better integrated GPUs. I spent some time looking for a reasonable Mini-ITX case + motherboard and then I happened upon the ASRock DeskMini X300. This turns out to be perfect; I’ve no need for a PCIe slot or anything more than an m.2 SSD. I paired it with a Noctua NH-L9a-AM4 heatsink + fan (same as I use in the house server), 32GB DDR4 and a 1TB WD SN550 NVMe SSD. Total cost just under £650 inc VAT + delivery (and that’s a story for another post).

A desktop solves the problem of fitting both machines on the desk at once, but there’s still the question of smoothly switching between them. I read Evgeni Golov’s article on a simple KVM switch for €30. My monitor has multiple inputs, so that’s sorted. I did have a cheap USB2 switch (all I need for the keyboard/trackball) but it turned out to be pretty unreliable at the host detecting the USB change. I bought a UGREEN USB 3.0 Sharing Switch Box instead and it’s turned out to be pretty reliable. The problem is that the LG 32UM88 turns out to have a poor DDC implementation, so while I can flip the keyboard easily with the UGREEN box I also have to manually select the monitor input. Which is a bit annoying, but not terrible.

The important question is whether this has helped. I built all this at the end of October, so I’ve had a month to play with it. Turns out I should have done it at some point last year. At the end of the day instead of either sitting “at work” for a bit longer, or completely avoiding the study, I’m able to lock the work machine and flick to my personal setup. Even sitting in the same seat that “disconnect”, and the knowledge I won’t see work Slack messages or emails come in and feeling I should respond, really helps. It also means I have access to my personal setup during the week without incurring a hit at the start of the working day when I have to set things up again. So it’s much easier to just dip in to some personal tech stuff in the evening than it was previously. Also from the point of view I don’t need to setup the personal config, I can pick up where I left off. All of which is really nice.

It’s also got me thinking about other minor improvements I should make to my home working environment to try and improve things. One obvious thing now the winter is here again is to improve my lighting; I have a good overhead LED panel but it’s terribly positioned for video calls, being just behind me. So I think I’m looking some sort of strip light I can have behind the large monitor to give a decent degree of backlight (possibly bouncing off the white wall). Lots of cheap options I’m not convinced about, and I’ve had a few ridiculously priced options from photographer friends; suggestions welcome.

02 December, 2021 08:00PM

hackergotchi for Steve Kemp

Steve Kemp

It has been some time..

I realize it has been quite some time since I last made a blog-post, so I guess the short version is "I'm still alive", or as Granny Weatherwax would have said:


Of course if I die now this would be an awkward post!

I can't think of anything terribly interesting I've been doing recently, mostly being settled in my new flat and tinkering away with things. The latest "new" code was something for controlling mpd via a web-browser:

This is a simple HTTP server which allows you to minimally control mpd running on localhost:6600. (By minimally I mean literally "stop", "play", "next track", and "previous track").

I have all my music stored on my desktop, I use mpd to play it locally through a pair of speakers plugged into that computer. Sometimes I want music in the sauna, or in the bedroom. So I have a couple of bluetooth speakers which are used to send the output to another room. When I want to skip tracks I just open the mpd-web site on my phone and tap the button. (I did look at android mpd-clients, but at the same time it seemed like installing an application for this was a bit overkill).

I guess I've not been doing so much "computer stuff" outside work for a year or so. I guess lack of time, lack of enthusiasm/motivation.

So looking forward to things? I'll be in the UK for a while over Christmas, barring surprises. That should be nice as I'll get to see family, take our child to visit his grandparents (on his birthday no less) and enjoy playing the "How many Finnish people can I spot in the UK?" game

02 December, 2021 02:59PM

December 01, 2021

hackergotchi for Junichi Uekawa

Junichi Uekawa


December. The world is turbulent and I am still worried where we are going.

01 December, 2021 11:56PM by Junichi Uekawa

Thorsten Alteholz

My Debian Activities in November 2021

FTP master

This month I accepted 564 and rejected 93 packages. The overall number of packages that got accepted was 591.

Debian LTS

This was my eighty-ninth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my all in all workload has been 40h. During that time I did LTS and normal security uploads of:

  • [DLA 2820-1] atftp security update for two CVEs
  • [DLA 2821-1] axis security update for one CVE
  • [DLA 2822-1] netkit-rsh security update for two CVEs
  • [DLA 2825-1] libmodbus security update for two CVEs
  • [#1000408] for libmodbus in Buster
  • [#1000485] for btrbk in Bullseye
  • [#1000486] for btrbk in Buster

I also started to work on pgbouncer to get an update for each release and had to process packages from NEW on security-master.

Further I worked on a script to automatically publish DLAs on the Debian website, that are posted to debian-lts-announce. The script can be found on salsa. It only publishes stuff from people on a whitelist. At the moment it is running on a computer at home. You might run your own copy, or just send me an email to be put on the whitelist as well.

Last but not least I did some days of frontdesk duties.

Debian ELTS

This month was the forty-first ELTS month.

During my allocated time I uploaded:

  • ELA-517-1 for atftp
  • ELA-519-1 for qtbase-opensource-src
  • ELA-520-1 for libsdl1.2
  • ELA-521-1 for libmodbus

Last but not least I did some days of frontdesk duties.

Debian Printing

Unfortunately I did not do as much as I wanted this month. At least I looked at some old bugs and uploaded new upstream versions of …

I hope this will improve in December again. New versions of cups and hplip are on my TODO-list.

Debian Astro

This month I uploaded new versions of …

Other stuff

I improved packaging or fixed bugs of:

01 December, 2021 03:33PM by alteholz

Russ Allbery

Review: A World Without Email

Review: A World Without Email, by Cal Newport

Publisher: Portfolio/Penguin
Copyright: 2021
ISBN: 0-525-53657-4
Format: Kindle
Pages: 264

A World Without Email is the latest book by computer science professor and productivity writer Cal Newport. After a detour to comment on the drawbacks of social media in Digital Minimalism, Newport is back to writing about focus and concentration in the vein of Deep Work. This time, though, the topic is workplace structure and collaborative process rather than personal decisions.

This book is a bit hard for me to review because I spoiled myself for the contents by listening to a lot of Newport's podcast, where he covers the same material. I therefore didn't enjoy it as much as I otherwise would have because the ideas were familiar. I recommend the book over the podcast, though; it's tighter, more coherent, and more comprehensive.

The core contention of this book is that knowledge work (roughly, jobs where one spends significant time working on a computer processing information) has stumbled into a superficially tempting but inefficient and psychologically harmful structure that Newport calls the hyperactive hive mind. This way of organizing work is a local maxima: it feels productive, it's flexible and very easy to deploy, and most minor changes away from it make overall productivity worse. However, the incentive structure is all wrong. It prioritizes quick responses and coordination overhead over deep thinking and difficult accomplishments.

The characteristic property of the hyperactive hive mind is free-flowing, unstructured communication between co-workers. If you need something from someone else, you ask them for it and they send it to you. The "email" in the title is not intended literally; Slack and related instant messaging apps are even more deeply entrenched in the hyperactive hive mind than email is. The key property of this workflow is that most collaborative work is done by contacting other people directly via ad hoc, unstructured messages.

Newport's argument is that this workflow has multiple serious problems, not the least of which is that it makes us miserable. If you have read his previous work, you will correctly expect this to tie into his concept of deep work. Ad hoc, unstructured communication creates a constant barrage of unimportant small tasks and interrupts, most of which require several asynchronous exchanges before your brain can stop tracking the task. This creates constant context-shifting, loss of focus and competence, and background stress from ever-growing email inboxes, unread message notifications, and the semi-frantic feeling that you're forgetting something you need to do.

This is not an original observation, of course. Many authors have suggested individual ways to improve this workflow: rules about how often to check one's email, filtering approaches, task managers, and other personal systems. Newport's argument is that none of these individual approaches can address the problem due to social effects. It's all well and good to say that you should unplug from distractions and ignore requests while you concentrate, but everyone else's workflow assumes that their co-workers are responsive to ad hoc requests. Ignoring this social contract makes the job of everyone still stuck in the hyperactive hive mind harder. They won't appreciate that, and your brain will not be able to relax knowing that you're not meeting your colleagues' expectations.

In Newport's analysis, the necessary solution is a comprehensive redesign of how we do knowledge work, akin to the redesign of factory work that came with the assembly line. It's a collective problem that requires a collective solution. In other industries, organizing work for efficiency and quality is central to the job of management, but in knowledge work (for good historical reasons) employees are mostly left to organize their work on their own. That self-organization has produced a system that doesn't require centralized coordination or decisions and provides a lot of superficial flexibility, but which may be significantly inferior to a system designed for how people think and work.

Even if you find this convincing (and I think Newport makes a good case), there are reasons to be suspicious of corporations trying to make people more productive. The assembly line made manufacturing much more efficient, but it also increased the misery of workers so much that Henry Ford had to offer substantial raises to retain workers. As one of Newport's knowledge workers, I'm not enthused about that happening to my job.

Newport recognizes this and tries to address it by drawing a distinction between the workflow (how information moves between workers) and the work itself (how individual workers solve problems in their area of expertise). He argues that companies need to redesign the former, but should leave the latter to each worker. It's a nice idea, and it will probably work in industries like tech with substantial labor bargaining power. I'm more cynical about other industries.

The second half of the book is Newport's specific principles and recommendations for designing better workflows that don't rely on unstructured email. Some of this will be familiar (and underwhelming) to anyone who works in tech; Newport recommends ticket systems and thinks agile, scrum, and kanban are pointed in the right direction. But there are some other good ideas in here, such as embracing specialization.

Newport argues (with some evidence) that the drastic reduction in secretarial jobs, on the grounds that workers with computers can do the same work themselves, was a mistake. Even with new automation, this approach increased the range of tasks required in every other job. Not only was this a drain on the time of other workers, it caused more context switching, which made everyone less efficient and undermined work quality. He argues for reversing that trend: where the work cannot be automated, hire more support workers and more specialized workers in general, stop expecting everyone to be their own generalist admin, and empower support workers to create better systems rather than using the hyperactive hive mind model to answer requests.

There's more here, ranging from specifics of how to develop a structured process for a type of work to the importance of enabling sustained concentration on a task. It's a less immediately actionable book than Newport's previous writing, but I welcome the partial shift in focus to more systemic issues. Newport continues to be relentlessly apolitical, but here it feels less like he's eliding important analysis and more like he thinks the interests of workers and good employers are both served by the approach he's advocating.

I will warn that Newport leans heavily on evolutionary psychology in his argument that the hyperactive hive mind is bad for us. I think he has some good arguments about the anxiety that comes with not responding to requests from others, but I'm not sure intrusive experiments on spectacularly-unusual remnant hunter-gatherer groups, who are treated like experimental animals, are the best way of making that case. I realize this isn't Newport's research, but I think he could have made his point with more directly relevant experiments.

He also continues his obsession with the superiority of in-person conversation over written communication, and while he has a few good arguments, he has a tendency to turn them into sweeping generalizations that are directly contradicted by, well, my entire life. It would be nice if he were more willing to acknowledge that it's possible to express deep emotional nuance and complex social signaling in writing; it simply requires a level of practice and familiarity (and shared vocabulary) that's often missing from the workplace.

I was muttering a lot near the start of this book, but thankfully those sections are short, and I think the rest of his argument sits on a stronger foundation.

I hope Newport continues moving in the direction of more systemic analysis. If you enjoyed Deep Work, you will probably find A World Without Email interesting. If you're new to Newport, this is not a bad place to start, particularly if you have influence on how communication is organized in your workplace. Those who work in tech will find some bits of this less interesting, but Newport approaches the topic from a different angle than most agile books and covers a broader range if ideas.

Recommended if you like reading this sort of thing.

Rating: 7 out of 10

01 December, 2021 05:07AM

Paul Wise

FLOSS Activities November 2021


This month I didn't have any particular focus. I just worked on issues in my info bubble.





  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages
  • Debian wiki: unblock IP addresses, approve accounts


  • Respond to queries from Debian users and contributors on the mailing lists and IRC


The SPTAG, visdom, gensim, purple-discord, plac, fail2ban, uvloop work was sponsored by my employer. All other work was done on a volunteer basis.

01 December, 2021 02:53AM

November 30, 2021

Russell Coker

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson


How do you get a git commit with an interesting commit ID (or “SHA”)? Of course, interesting is in the eye of the beholder, but let's define it as having many repeated hex nibbles, e.g. “000” in the commit would be somewhat interesting and “8888888888888888888888888” would be very interesting. This is pretty similar to the dreaded cryptocoin mining; we have no simple way of forcing a given SHA-1 hash unless someone manages a complete second-preimage break, so we must brute-force. (And hopefully without boiling the planet in the process; we'd have to settle for a bit shorter runs than in the example above.)

Git commit IDs are SHA-1 checksums of what they contain; the tree object (“what does the commit contain”), the parents, the commit message and some dates. Of those, let's use the author date as the nonce (I chose to keep the committer date truthful, so as to not be accused of forging history too much). We can set up a shell script to commit with --amend, sweeping GIT_AUTHOR_DATE over the course of a day or so and having EDITOR=true in order not to have to close the editor all the time.

It turns out this is pretty slow (unsurprisingly!). So we discover that actually launching the “editor” takes a long time, and --no-edit is much faster. We can also move to a tmpfs in order not to be block on fsync and block allocation (eatmydata would also work, but doesn't fix the filesystem overhead). At this point, we're at roughly 50 commits/sec or so. So we can sweep through the entire day of author dates, and if nothing interesting comes up, we can just try again (as we also get a new committer date, we've essentially reset our random generator).

But we can do much better than this. A commit in git is many different things; load the index, see if we need to add something, then actually make the commit object and finally update HEAD and whatever branch we might be on. Of those, we only really need to make the commit object and see what it hash ended up with! So we change our script to use git commit-tree instead, and whoa, we're up to 300 commits/sec.

Now we're bottlenecked at the time it takes to fork and launch the git binary—so we can hack the git sources and move the date sweep into builtin/commit-tree.c. This is radically faster; about 100 times as fast! Now what takes time is compressing and creating the commit object.

But OK, my 5950X has 16 cores, right, so we can just split the range in 16 and have different cores test different ranges? Wrong! Because now, the entire sweep takes less than a second, so we no longer get the different committer date and the cores are testing the same SHA over and over. (In effect, our nonce space is too small.) We cheat a bit and add extra whitespace to the end of the commit message to get a larger parameter space; the core ID determines how many spaces.

At this point, you can make commits so fast that the problem essentially becomes that you run out of space, and need to run git prune every few seconds. So the obvious next step would be to not compress and write out the commits at all… and then, I suppose, optimize the routines to not call any git stuff anymore, and then have GPUs do the testing, and of course, finally we'll have Gitcoin ASICs, and every hope of reaching the 1.5 degree goal is lost…

Did I say Gitcoin? No, unfortunately that name was already taken. So I'll call it Commitcoin. And I'm satisifed with a commit containing dddddddd, even though it's of course possible to do much better—hardness is only approximately 2^26 commits to get a commit as interesting as that.

(Cryptobros, please stay out of my inbox. I'm not interested.)

30 November, 2021 11:00AM

Russell Coker

Your Device Has Been Improved

I’ve just started a Samsung tablet downloading a 770MB update, the description says:

  • Overall stability of your device has been improved
  • The security of your device has been improved

Technically I have no doubt that both those claims are true and accurate. But according to common understanding of the English language I think they are both misleading.

By “stability improved” they mean “fixed some bugs that made it unstable” and no technical person would imagine that after a certain number of such updates the number of bugs will ever reach zero and the tablet will be perfectly reliable. In fact if you should consider yourself lucky if they fix more bugs than they add. It’s not THAT uncommon for phones and tablets to be bricked (rendered unusable by software) by an update. In the past I got a Huawei Mate9 as a warranty replacement for a Nexus 6P because an update caused so many Nexus 6P phones to fail that they couldn’t be replaced with an identical phone [1].

By “security improved” they usually mean “fixed some security flaws that were recently discovered to make it almost as secure as it was designed to be”. Note that I deliberately say “almost as secure” because it’s sometimes impossible to fix a security flaw without making significant changes to interfaces which requires more work than desired for an old product and also gives a higher probability of things going wrong. So it’s sometimes better to aim for almost as secure or alternatively just as secure but with some features disabled.

Device manufacturers (and most companies in the Android space make the same claims while having the exact same bugs to deal with, Samsung is no different from the others in this regards) are not making devices more secure or more reliable than when they were initially released. They are aiming to make them almost as secure and reliable as when they were released. They don’t have much incentive to try too hard in this regard, Samsung won’t suffer if I decide my old tablet isn’t reliable enough and buy a new one, which will almost certainly be from Samsung because they make nice tablets.

As a thought experiment, consider if car repairers did the same thing. “Getting us to service your car will improve fuel efficiency”, great how much more efficient will it be than when I purchased it?

As another thought experiment, consider if car companies stopped providing parts for car repair a few years after releasing a new model. This is effectively what phone and tablet manufacturers have been doing all along, software updates for “stability and security” are to devices what changing oil etc is for cars.

30 November, 2021 09:41AM by etbe

November 29, 2021

hackergotchi for Evgeni Golov

Evgeni Golov

Getting access to somebody else's Ansible Galaxy namespace

TL;DR: adding features after the fact is hard, normalizing names is hard, it's patched, carry on.

I promise, the longer version is more interesting and fun to read!

Recently, I was poking around Ansible Galaxy and almost accidentally got access to someone else's namespace. I was actually looking for something completely different, but accidental finds are the best ones!

If you're asking yourself: "what the heck is he talking about?!", let's slow down for a moment:

  • Ansible is a great automation engine built around the concept of modules that do things (mostly written in Python) and playbooks (mostly written in YAML) that tell which things to do
  • Ansible Galaxy is a place where people can share their playbooks and modules for others to reuse
  • Galaxy Namespaces are a way to allow users to distinguish who published what and reduce name clashes to a minimum

That means that if I ever want to share how to automate installing vim, I can publish evgeni.vim on Galaxy and other people can download that and use it. And if my evil twin wants their vim recipe published, it will end up being called evilme.vim. Thus while both recipes are called vim they can coexist, can be downloaded to the same machine, and used independently.

How do you get a namespace? It's automatically created for you when you login for the first time. After that you can manage it, you can upload content, allow others to upload content and other things. You can also request additional namespaces, this is useful if you want one for an Organization or similar entities, which don't have a login for Galaxy.

Apropos login, Galaxy uses GitHub for authentication, so you don't have to store yet another password, just smash that octocat!

Did anyone actually click on those links above? If you did (you didn't, right?), you might have noticed another section in that document: Namespace Limitations. That says:

Namespace names in Galaxy are limited to lowercase word characters (i.e., a-z, 0-9) and ‘_’, must have a minimum length of 2 characters, and cannot start with an ‘_’. No other characters are allowed, including ‘.’, ‘-‘, and space. The first time you log into Galaxy, the server will create a Namespace for you, if one does not already exist, by converting your username to lowercase, and replacing any ‘-‘ characters with ‘_’.

For my login evgeni this is pretty boring, as the generated namespace is also evgeni. But for the GitHub user Evil-Pwnwil-666 it will become evil_pwnwil_666. This can be a bit confusing.

Another confusing thing is that Galaxy supports two types of content: roles and collections, but namespaces are only for collections! So it is Evil-Pwnwil-666.vim if it's a role, but evil_pwnwil_666.vim if it's a collection.

I think part of this split is because collections were added much later and have a much more well thought design of both the artifact itself and its delivery mechanisms.

This is by the way very important for us! Due to the fact that collections (and namespaces!) were added later, there must be code that ensures that users who were created before also get a namespace.

Galaxy does this (and I would have done it the same way) by hooking into the login process, and after the user is logged in it checks if a Namespace exists and if not it creates one and sets proper permissions.

And this is also exactly where the issue was!

The old code looked like this:

    # Create lowercase namespace if case insensitive search does not find match
    qs = models.Namespace.objects.filter(
    if qs.exists():
        namespace = qs[0]
        namespace = models.Namespace.objects.create(**ns_defaults)


See how namespace.owners.add is always called? Even if the namespace already existed? Yepp!

But how can we exploit that? Any user either already has a namespace (and owns it) or doesn't have one that could be owned. And given users are tied to GitHub accounts, there is no way to confuse Galaxy here. Now, remember how I said one could request additional namespaces, for organizations and stuff? Those will have owners, but the namespace name might not correspond to an existing user!

So all we need is to find an existing Galaxy namespace that is not a "default" namespace (aka a specially requested one) and get a GitHub account that (after the funny name conversion) matches the namespace name.

Thankfully Galaxy has an API, so I could dump all existing namespaces and their owners. Next I filtered that list to have only namespaces where the owner list doesn't contain a username that would (after conversion) match the namespace name. I found a few. And for one of them (let's call it the_target), the corresponding GitHub username (the-target) was available! Jackpot!

I've registered a new GitHub account with that name, logged in to Galaxy and had access to the previously found namespace.

This felt like sufficient proof that my attack worked and I mailed my findings to the Ansible Security team. The issue was fixed in d4f84d3400f887a26a9032687a06dd263029bde3 by moving the namespace.owners.add call to the "new namespace" branch.

And this concludes the story of how I accidentally got access to someone else's Galaxy namespace (which was revoked after the report, no worries).

29 November, 2021 08:00AM by evgeni

Dima Kogan


I just spent an unspeakable number of days typing to produce something that sounds very un-impressive: an FLTK widget that can display an image. The docs and code live here. The big difference from the usual image-drawing widget is that this one uses OpenGL internally, so after the initial image load, the common operations (drawing, redrawing, panning and zooming) are very fast. I have high-resolution images in my projects, and this will make my tools much nicer.

Three separate interfaces are available:

  • C: core library and GLUT application
  • C++: FLTK widget
  • Python: FLTK widget for pyfltk

The FLTK widgets have built-in interactive panning/zooming, and the library can draw line overlays. So nice applications can be built quickly.

I already added some early disabled-by-default support into the mrcal-stereo tool to visualize the rectification and report sensitivities:



29 November, 2021 07:53AM by Dima Kogan

Russ Allbery

Fall haul

It's been a while since I've posted one of these, and I also may have had a few moments of deciding to support authors by buying their books even if I'm not going to get a chance to read them soon. There's also a bit of work reading in here.

Ryka Aoki — Light from Uncommon Stars (sff)
Frederick R. Chromey — To Measure the Sky (non-fiction)
Neil Gaiman, et al. — Sandman: Overture (graphic novel)
Alix E. Harrow — A Spindle Splintered (sff)
Jordan Ifueko — Raybearer (sff)
Jordan Ifueko — Redemptor (sff)
T. Kingfisher — Paladin's Hope (sff)
TJ Klune — Under the Whispering Door (sff)
Kiese Laymon — How to Slowly Kill Yourself and Others in America (non-fiction)
Yuna Lee — Fox You (romance)
Tim Mak — Misfire (non-fiction)
Naomi Novik — The Last Graduate (sff)
Shelley Parker-Chan — She Who Became the Sun (sff)
Gareth L. Powell — Embers of War (sff)
Justin Richer & Antonio Sanso — OAuth 2 in Action (non-fiction)
Dean Spade — Mutual Aid (non-fiction)
Lana Swartz — New Money (non-fiction)
Adam Tooze — Shutdown (non-fiction)
Bill Watterson — The Essential Calvin and Hobbes (strip collection)
Bill Willingham, et al. — Fables: Storybook Love (graphic novel)
David Wong — Real-World Cryptography (non-fiction)
Neon Yang — The Black Tides of Heaven (sff)
Neon Yang — The Red Threads of Fortune (sff)
Neon Yang — The Descent of Monsters (sff)
Neon Yang — The Ascent to Godhood (sff)
Xiran Jay Zhao — Iron Widow (sff)

29 November, 2021 03:45AM

November 28, 2021

hackergotchi for Wouter Verhelst

Wouter Verhelst

GR procedures and timelines

A vote has been proposed in Debian to change the formal procedure in Debian by which General Resolutions (our name for "votes") are proposed. The original proposal is based on a text by Russ Allberry, which changes a number of rules to be less ambiguous and, frankly, less weird.

One thing Russ' proposal does, however, which I am absolutely not in agreement with, is to add a absolutly hard time limit after three weeks. That is, in the proposed procedure, the discussion time will be two weeks initially (unless the Debian Project Leader chooses to reduce it, which they can do by up to one week), and it will be extended if more options are added to the ballot; but after three weeks, no matter where the discussion stands, the discussion period ends and Russ' proposed procedure forces us to go to a vote, unless all proposers of ballot options agree to withdraw their option.

I believe this is a big mistake. I think any procedure we come up with should allow for the possibility that we may end up with a situation where everyone agrees that extending the discussion time a short time is a good idea, without necessarily resetting the whole discussion time to another two weeks (modulo a decision by the DPL).

At the same time, any procedure we come up with should try to avoid the possibility of process abuse by people who would rather delay a vote ad infinitum than to see it voted upon. A hard time limit certainly does that; but I believe it causes more problems than it solves.

I think insted that it is necessary for any procedure to allow for the discussion time to be extended as long as a strong enough consensus exists that this would be beneficial.

As such, I have proposed an amendment to Russ' proposal (a full version of my proposed constitution can be seen on salsa) that hopefully solves these issues in a novel way: it allows anyone to request an extension to the discussion time, which then needs to be sponsored according to the same rules as a new ballot option. If the time extension is successfully created, those who supported the extension can then also no longer propose any new ones. Additionally, after 4 weeks, the proposed procedure allows anyone to object, so that 4 weeks is probably the practical limit -- although the possibility exists if enough support exists to extend the discussion time (or not enough to end it). The full rules involve slightly more than that (I don't like to put too much formal language in a blog post), but they're not too complicated, I think.

That proposal has received a number of seconds, but after a week it hasn't yet reached the constitutional requirement for the option to be on the ballot.

So, I guess this is a public request for more support to my proposal. If you're a Debian Developer and you agree with me that my proposed procedure is better than the alternative, please step forward and let yourself be heard.


28 November, 2021 07:04PM

hackergotchi for Joachim Breitner

Joachim Breitner

Zero-downtime upgrades of Internet Computer canisters

TL;DR: Zero-downtime upgrades are possible if you stick to the basic actor model.


DFINITY’s Internet Computer provides a kind of serverless compute platform, where the services are WebAssemmbly programs called “canisters”. These services run without stopping (or at least that’s what it feels like from the service’s perspective; this is called “orthogonal persistence”), and process one message after another. Messages not only come from the outside (“ingress” calls), but are also exchanged between canisters.

On top of these uni-directional messages, the system provides the concept of “inter-canister calls”, which associates a respondse message with the outgoing message, and guarantees that a response will come. This RPC-like interface allows canister developers to program in the popular async/await model, where these inter-canister calls look almost like normal function calls, and the subsequent code is suspended until the response comes back.

The problem

This is all very well, until you try to upgrade your canister, i.e. install new code to fix a bug or add a feature. Because if you used the await pattern, there may still be suspended computations waiting for the response. If you swap out the program now, the code of that suspended computation will no longer be present, and the response cannot be handled! Worse, because of an infelicity with the current system’s API, when the response comes back, it may actually corrupt your service’s state.

That is why upgrading a canister requires stopping it first, which means waiting for all outstanding calls to come back. During this time, your canister is not available for new calls (so there is downtime), and worse, the length of the downtime is at the whims of the canisters you called – they could withhold the response ad infinitum, rendering your canister unupgradeable.

Clearly, this is not acceptable for any serious application. In this post, I’ll explore some of the ways to mitigate this problem, and how to create canisters that are safely instantanously (no downtime) upgradeable.

It’s a spectrum

Some canisters are trivially upgradeable, for others all hope is lost; it depends on what the canister does and how. As an overview, here is the spectrum:

  1. A canister that never performs inter-canister calls can always be upgraded without stopping.
  2. A canister that only does one-way calls, and does them in a particular way (see below), can always be upgraded without stopping.
  3. A canister that performs calls, and where it is acceptable to simply drop outstanding repsonses, can always be upgraded without stopping, once the System API has been improved and your Canister Development Kit (CDK; Motoko or Rust) has adapted.
  4. A canister that performs calls, but uses explicit continuations to handle, responses instead of the await-convenience, based on an eventually fixed System API, can be upgradeded without stopping, and will even handle responses afterwards.
  5. A canister that uses await to do inter-canister call cannot be upgraded without stopping.

In this post I will explain 2, which is possible now, in more detail. Variant 3 and 4 only become reality if and when the System API has improved.

One-way calls

A one-way call is a call where you don’t care about the response; neither the replied data, nor possible failure conditions.

Since you don’t care about the response, you can pass an invalid continuation to the system (technical detail: a Wasm table index of -1). Because it is invalid for any (realistic) Wasm module, it will stay invalid even after an upgrade, and the problem of silent corruption mentioned above is avoided. And otherwise it’s fine for this to be invalid: it means the canister “traps” once the response comes back, which is harmeless (and possibly even cheaper than a do-nothing computation).

This requires your CDK to support this kind of call. Mostly incidential, Motoko (and Candid) actually have the concept of one-way call in their type system, namely shared functions with return type () instead of async ... (Motoko is actually older than the system, and not every prediction about what the system will provide has proven successful). So, pending this PR to be released, Motoko will implement one-way calls in this way. On Rust, you have to use the System API directly or wait for cdk-rs to provide this ability (patches welcome, happy to advise).

You might wonder: How are calls useful if I don’t get to look at the response? Of course, this is a set-back – calls with responses are useful, and await is convenient. And if you have to integrate with an existing service that only provides normal calls, you are out of luck.

But if you get to design the canister and all called canisters together, it may be possible to use only one-way messages. You’d be programming in the plain actor model now, with all its advantages (simple concurrency, easy to upgrade, general robustness).

Consider for example a token ledger canister, not unlike the ICP ledger canister. For the most part, it doesn’t have to do any outgoing calls (and thus be trivially upgradeble). But say we need to add notify functionality, where the ledger canister tells other canisters about a transaction. This is a good example for a one-way call: Maybe the ledger canister doesn’t care if that notification was received? The ICP leder does care (once it comes back successful, this particular notification cannot be sent again), but maybe your ledger can do it differently: let the other canister confirm the receip via another one-way call, instead of via the reply; or simply charge for each notification and do not worry about repeated notifications.

Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly.

Yes, it changes the programming model a bit, and all involved parties have to play together, but the gain (zero-downtime upgrades) is quite valuable, and removes a fair number of other sources of issues.

And in the future?

The above is possible with today’s Internet Computer. If the System API gets improves the way I hope it will be, you have a possible middle ground: You still don’t get to use await and instead have to write your response handler as separate functions, but this way you can call any canister again, and you get the system’s assistance in mapping responses to calls. With this in place, any canister can be rewritten to a form that supports zero-downtime upgrades, without affecting its interface or what the canister can do.

28 November, 2021 05:11PM by Joachim Breitner (

Russ Allbery

Review: Soul Music

Review: Soul Music, by Terry Pratchett

Series: Discworld #16
Publisher: Harper
Copyright: January 1995
Printing: November 2013
ISBN: 0-06-223741-1
Format: Mass market
Pages: 420

Soul Music is the sixteenth Discworld novel and something of a plot sequel to Reaper Man (although more of a sequel to the earlier Mort). I would not start reading the Discworld books here.

Susan is a student in the Quirm College for Young Ladies with an uncanny habit of turning invisible. Well, not invisible exactly; rather, people tend to forget that she's there, even when they're in the middle of talking to her. It's disconcerting for the teachers, but convenient when one is uninterested in Literature and would rather read a book.

She listened with half an ear to what the rest of the class was doing.

It was a poem about daffodils.

Apparently the poet had liked them very much.

Susan was quite stoic about this. It was a free country. People could like daffodils if they wanted to. They just should not, in Susan's very definite opinion, be allowed to take up more than a page to say so.

She got on with her education. In her opinion, school kept on trying to interfere with it.

Around her, the poet's vision was being taken apart with inexpert tools.

Susan's determinedly practical education is interrupted by the Death of Rats, with the help of a talking raven and Binky the horse, and without a lot of help from Susan, who is decidedly uninterested in being the sort of girl who goes on adventures. Adventures have a different opinion, since Susan's grandfather is Death. And Death has wandered off again.

Meanwhile, the bard Imp y Celyn, after an enormous row with his father, has gone to Ankh-Morpork. This is not going well; among other things, the Guild of Musicians and their monopoly and membership dues came as a surprise. But he does meet a dwarf and a troll in the waiting room of the Guild, and then buys an unusual music instrument in the sort of mysterious shop that everyone knows has been in that location forever, but which no one has seen before.

I'm not sure there is such a thing as a bad Discworld novel, but there is such a thing as an average Discworld novel. At least for me, Soul Music is one of those. There are some humorous bits, a few good jokes, one great character, and some nice bits of philosophy, but I found the plot forgettable and occasionally annoying. Susan is great. Imp is... not, which is made worse by the fact the reader is eventually expected to believe Susan cares enough about Imp to drive the plot.

Discworld has always been a mix of parody and Pratchett's own original creation, and I have always liked the original creation substantially more than the parody. Soul Music is a parody of rock music, complete with Cut-Me-Own-Throat Dibbler as an unethical music promoter. The troll Imp meets makes music by beating rocks together, so they decide to call their genre "music with rocks in it." The magical instrument Imp buys has twelve strings and a solid body. Imp y Celyn means "bud of the holly." You know, like Buddy Holly. Get it?

Pratchett's reference density is often on the edge of overwhelming the book, but for some reason the parody references in this one felt unusually forced and obvious to me. I did laugh occasionally, but by the end of the story the rock music plot had worn out its welcome. This is not helped by the ending being a mostly incoherent muddle of another parody (admittedly featuring an excellent motorcycle scene). Unlike Moving Pictures, which is a similar parody of Hollywood, Pratchett didn't seem to have much insightful to say about music. Maybe this will be more your thing if you like constant Blues Brothers references.

Susan, on the other hand, is wonderful, and for me is the reason to read this book. She is a delightfully atypical protagonist, and her interactions with the teachers and other students at the girl's school are thoroughly enjoyable. I would have happily read a whole book about her, and more broadly about Death and his family and new-found curiosity about the world. The Death of Rats was also fun, although more so in combination with the raven to translate. I wish this part of her story had a more coherent ending, but I'm looking forward to seeing her in future books.

Despite my complaints, the parody part of this book wasn't bad. It just wasn't as good as the rest of the book. I wanted a better platform for Susan's introduction than a lot of music and band references. If you really like Pratchett's parodies, your mileage may vary. For me, this book was fun but forgettable.

Followed, in publication order, by Interesting Times. The next Death book is Hogfather.

Rating: 7 out of 10

28 November, 2021 05:35AM

November 27, 2021

Review: A Psalm for the Wild-Built

Review: A Psalm for the Wild-Built, by Becky Chambers

Series: Monk & Robot #1
Publisher: Tordotcom
Copyright: July 2021
ISBN: 1-250-23622-3
Format: Kindle
Pages: 160

At the start of the story, Sibling Dex is a monk in a monastery in Panga's only City. They have spent their entire life there, love the buildings, know the hidden corners of the parks, and find the architecture beautiful. They're also heartily sick of it and desperate for the sound of crickets.

Sometimes, a person reaches a point in their life when it becomes absolutely essential to get the fuck out of the city.

Sibling Dex therefore decides to upend their life and travel the outlying villages doing tea service. And they do. They commission an ox-bike wagon, throw themselves into learning cultivation and herbs, experiment with different teas, and practice. It's a lot to learn, and they don't get it right from the start, but Sibling Dex is the sort of person who puts in the work to do something well. Before long, they have a new life as a traveling tea monk.

It's better than living in the City. But it still isn't enough.

We don't find out much about the moon of Panga in this story. Humans live there and it has a human-friendly biosphere with recognizable species, but it is clearly not Earth. The story does not reveal how humans came to live there. Dex's civilization is quite advanced and appears to be at least partly post-scarcity: people work and have professions, but money is rarely mentioned, poverty doesn't appear to be a problem, and Dex, despite being a monk with no obvious source of income, is able to commission the construction of a wagon home without any difficulty. They follow a religion that has no obvious Earth analogue.

The most fascinating thing about Panga is an event in its history. It previously had an economy based on robot factories, but the robots became sentient. Since this is a Becky Chambers story, the humans reaction was to ask the robots what they wanted to do and respect their decision. The robots, not very happy about having their whole existence limited to human design, decided to leave, walking off into the wild. Humans respected their agreement, rebuilt their infrastructure without using robots or artificial intelligence, and left the robots alone. Nothing has been heard from them in centuries.

As you might expect, Sibling Dex meets a robot. Its name is Mosscap, and it was selected to check in with humans. Their attempts to understand each other is much of the story. The rest is Dex's attempt to find what still seems to be missing from life, starting with an attempt to reach a ruined monastery out in the wild.

As with Chambers's other books, A Psalm for the Wild-Built contains a lot of earnest and well-meaning people having thoughtful conversations. Unlike her other books, there is almost no plot apart from those conversations of self-discovery and a profile of Sibling Dex as a character. That plus the earnestness of two naturally introspective characters who want to put their thoughts into words gave this story an oddly didactic tone for me. There are moments that felt like the moral of a Saturday morning cartoon show (I am probably dating myself), although the morals are more sophisticated and conditional. Saying I disliked the tone would be going too far, but it didn't flow as well for me as Chambers's other novels.

I liked the handling of religion, and I loved Sibling Dex's efforts to describe or act on an almost impossible to describe sense that their life isn't quite what they want. There are some lovely bits of description, including the abandoned monastery. The role of a tea monk in this imagined society is a neat, if small, bit of world-building: a bit like a counselor and a bit like a priest, but not truly like either because of the different focus on acceptance, listening, and a hot cup of tea. And Dex's interaction with Mosscap over offering and accepting food is a beautiful bit of characterization.

That said, the story as a whole didn't entirely gel for me, partly because of the didactic tone and partly because I didn't find Mosscap or the described culture of the robots as interesting as I was hoping that I would. But I'm still invested enough that I would read the sequel.

A Psalm for the Wild-Built feels like a prelude or character introduction more than a complete story. When we leave the characters, they're just getting started. You know more about the robots (and Sibling Dex) at the end than you did at the beginning, but don't expect much in the way of resolution.

Followed by A Prayer for the Crown-Shy, scheduled for 2022.

Rating: 7 out of 10

27 November, 2021 05:27AM

November 25, 2021

hackergotchi for Mike Gabriel

Mike Gabriel

Touching Firefox on Linux

More as a reminder to myself, but possibly also helpful to other people who want to use Firefox on a tablet running Debian...

Without the below adjustment, finger gestures in Firefox running on a tablet result in image moving, text highlighting, etc. (operations related to copy+paste). Not the intuitively expected behaviour...

If you use e.g. GNOME on Wayland for your tablet and want to enable touch functionalities in Firefox, then switch the whole browser to native Wayland rendering. This line in ~/.profile seems to help:


If you use a desktop environment running on top of X.Org, then make sure you have added the following line to ~/.profile:

export MOZ_USE_XINPUT2=1

Logout/login again and Firefox should be scrollable with 2-finger movements up and down, zooming in and out also works then.

Mike (aka sunweaver at

25 November, 2021 10:01AM by sunweaver

November 24, 2021

Antoine Beaupré

Automating major Debian upgrades

It's major upgrade time again! The Debian project just published the Debian 11 "bullseye" release, and it's pretty awesome! This makes me realize that I have never written here about my peculiar upgrade process, and figured it was worth bringing that up to a wider audience.

My upgrade process also has a notable changes section which includes major version changes (e.g. Inkscape 1.0!), new packages (e.g. podman!) and important behavior changes (e.g. driverless scanning and printing!).

I'm particularly interested to hear about any significant change I might have missed. If you know of a cool new package that shipped with bullseye and that I forgot, do let me know!

But that's for the cool new stuff. We need to talk about the problems with Debian major upgrades.


I have been maintaining detailed upgrade guides, on my wiki, starting with the jessie release, but I have actually written such guides for as far back as Debian squeeze in 2011 (another worker wrote the older Debian lenny upgrade guide in 2009). Koumbit, since then, has kept maintaining those guides all the way to the latest bullseye upgrade, through 7 major releases!

Over the years, those guides evolved from a quick "cheat-sheet" format copied from the release notes into a more or less "scripted" form that I currently use.

Each guide has a procedure made of a few steps that can be basically copy-pasted to batch-upgrade a host (or multiple hosts in parallel) as quickly as possible. There is also the predict-os script which allows you to keep track of progress of the upgrades in a Puppet cluster.

Limitations of the official procedure

In comparison with my procedure, the official upgrade guide is mostly designed to upgrade a single machine, typically a workstation, with a rather slow and exhaustive process. The PDF version of the upgrade guide is 14 pages long! This, obviously, does not work when you have tens or hundreds of machines to upgrade.

Debian upgrades are notorious for being extremely reliable, but we have a lot of packages, and there are always corner cases where the upgrade will just fail because of a bug specific to your environment. Those will only be fixed after some back and forth in the community (and that's assuming users report those bugs, which is not always the case). There's no obvious way to deploy "hot fixes" in this context, at least not without fixing the package and publishing it on an unofficial Debian archive while the official ones catch up. This is slow and difficult.

Or some packages require manual labor. Examples of this are the PostgreSQL or Ganeti packages which require you to upgrade your clusters by hand, while the old and new packages live side by side. Debian packages bring you far in the upgrade process, but sometimes not all the way.

Which means every Debian install needs to be manually upgraded and inspected when a new release comes out. That's slow and error prone and we can do better.

How to automate major upgrades

I have a proposal to automate this. It's been mostly dormant in the Debian wiki, for 5 years now. Fundamentally, this is a hard problem: Debian gets installed in so many different environments, from workstations to physical servers to virtual machines, embedded systems and so on, that it's extremely hard to come up with a "one size fits all" system.

The (manual) procedure I'm using is mostly targeting servers, but I'm also using it on workstations. And I'll note that it's specific to my home setup: I have a different procedure at work, although it has a lot of common code.

To automate this, I would factor out that common code with hooks where you could easily inject special code like "you need to upgrade ferm first", "you need an extra reboot here", or "this is how you finish the PostgreSQL upgrade".

With Debian getting closer to a 2 year release cycle, with the previous release being supported basically only one year after the new stable comes out, I feel more and more strongly that this needs better automation.

So I'm thinking that I should write a prototype for this. Ubuntu has do-release-upgrade that is too Ubuntu-specific to be reused. An attempt at collaborating on this has been mostly met with silence from Ubuntu's side as well.

I'm thinking that using something like Fabric, Mitogen, or Transilience: anything that will allow me to write simple, portable Python code that can run transparently on a local machine (for single systems upgrades, possibly with a GUI frontend) to remote servers (for large clusters of servers, maybe with canaries and grouping using Cumin). I'll note that Koumbit started experimenting with Puppet Bolt in the bullseye upgrade process, but that feels too site-specific to be useful more broadly.


I am not sure where this stands in the XKCD time trade-off evaluation, because the table doesn't actually cover the time frequency of Debian release (which is basically "biennial") and the amount of time the upgrade would take across a cluster (which varies a lot, but that I estimate to be between one to 6 hours per machine).

Assuming I have 80 machines to upgrade, that is 80 to 480 hours (between ~3 to 20 days) of work! It's unclear how much work such an automated system would shave off, however. Assuming things are an order of magnitude faster (say I upgrade 10 machines at a time), I would shave off between 3 and 18 days of work, which implies I might allow myself to spend a minimum of 5 days working on such a project.

The other option: never upgrade

Before people mention those: I am aware of containers, Kubernetes, and other deployment mechanisms. Indeed, those may be a long-term solution, we currently can't afford to migrate everything over to containers right now: that is a huge migration and a total paradigm shift. At that point, whatever is left might not even be Debian in the first place. And besides, if you run Kubernetes, you still need to run some OS underneath and upgrade that, so that problem never completely disappears.

Still, maybe that's the final answer: never upgrade.

For some stateless machines like DNS replicas or load balancers, that might make a lot of sense as there's no or little data to carry to the new host. But this implies a seamless and fast provisioning process, and we don't have that either: at my work, installing a machine takes about as long as upgrading it, and that's after a significant amount of work automating that process, partly writing my own Debian installer with Fabric (!).

What is your process?

I'm curious to hear what people think of those ideas. It strikes me as really odd that no one has really tackled that problem yet, considering how many clusters of Debian machines are out there. Surely people are upgrading those, and not following that slow step by step guide, right?

I suspect everyone is doing the same thing: we all have our little copy-paste script we batch onto multiple machines, sometimes in parallel. That is what the sysadmins are doing as well.

There must be a better way. What is yours?

My upgrades so far

So far, I have upgraded 2 out of my 3 home machines running buster -- others have been installed directly in bullseye -- with only my main, old, messy server left. Upgrades have been pretty painless so far (see another report, for example), much better than the previous buster upgrade. Obviously, for me personal use, automating this is pointless.

Work-side, however, is another story: we have over 80 boxes to upgrade there and that will take a while. The last stretch to buster cycle took about two years to complete, so we might be done by the time the next release (12, "bookworm") is released, but that's actually a full year after "buster" becomes EOL, so it's actually too late...

At least I fixed the installers so that new the machines we create all ship with bullseye, so we stopped accumulating new buster hosts...

Thanks to lelutin and pabs for reviewing a draft of this post.

24 November, 2021 02:14PM

November 23, 2021

Enrico Zini

Really lossy compression of JPEG

Suppose you have a tool that archives images, or scientific data, and it has a test suite. It would be good to collect sample files for the test suite, but they are often so big one can't really bloat the repository with them.

But does the test suite need everything that is in those files? Not necesarily. For example, if one's testing code that reads EXIF metadata, one doesn't care about what is in the image.

That technique works extemely well. I can take GRIB files that are several megabytes in size, zero out their data payload, and get nice 1Kb samples for the test suite.

I've started to collect and organise the little hacks I use for this into a tool I called mktestsample:

$ mktestsample -v samples1/*
2021-11-23 20:16:32 INFO common samples1/cosmo_2d+0.grib: size went from 335168b to 120b
2021-11-23 20:16:32 INFO common samples1/grib2_ifs.arkimet: size went from 4993448b to 39393b
2021-11-23 20:16:32 INFO common samples1/polenta.jpg: size went from 3191475b to 94517b
2021-11-23 20:16:32 INFO common samples1/test-ifs.grib: size went from 1986469b to 4860b

Those are massive savings, but I'm not satisfied about those almost 94Kb of JPEG:

$ ls -la samples1/polenta.jpg
-rw-r--r-- 1 enrico enrico 94517 Nov 23 20:16 samples1/polenta.jpg
$ gzip samples1/polenta.jpg
$ ls -la samples1/polenta.jpg.gz
-rw-r--r-- 1 enrico enrico 745 Nov 23 20:16 samples1/polenta.jpg.gz

I believe I did all I could: completely blank out image data, set quality to zero, maximize subsampling, and tweak quantization to throw everything away.

Still, the result is a 94Kb file that can be gzipped down to 745 bytes. Is there something I'm missing?

I suppose JPEG is better at storing an image than at storing the lack of an image. I cannot really complain :)

I can still commit compressed samples of large images to a git repository, taking very little data indeed. That's really nice!

23 November, 2021 06:58PM

November 22, 2021

hackergotchi for Ricardo Mones

Ricardo Mones

Claws Mail 4 in experimental

A full month has passed since Claws Mail 4.0.0 was uploaded to Debian experimental, and, somewhat surprisingly, I've received no bug report about it.

This of course can be either because nobody has been brave enough to install it or because well, it works really nice.

For those who don't know what I'm talking about, just note that this version is the first Debian upload for the GTK+3 version of Claws Mail. There was an initial upstream release, namely 3.99, but it was less polished and also I was very busy, so I decided not to upload it. Since then I've been using git's 'gtk3' branch daily without problems, so, for me, it's as stable as its GTK+2 counterpart. There's still some rough edges, of course.

Note also that, if everything goes well, Claws Mail 4.x will be the version to be shipped with Debian 12 (bookworm).

22 November, 2021 09:49AM by mones

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Be careful when using vxlan!

I’ve spent a bit of time playing with vxlan - which is very neat, but also incredibly insecure by default.

When using vxlan, be very careful to understand how the host is connected to the internet. The kernel will listen on all interfaces for packets, which means hosts accessable to VMs it’s hosting (e.g., by bridged interface or a private LAN will accept packets from VMs and inject them into arbitrary VLANs, even ones it’s not on.

I reported this to the kernel mailing list to no reply with more technical details.

The tl;dr is:

  $ ip link add vevx0a type veth peer name vevx0z
  $ ip addr add dev vevx0a
  $ ip addr add dev vevx0z
  $ ip link add vxlan0 type vxlan id 42 \
    local dev vevx0a dstport 4789
  $ # Note the above 'dev' and 'local' ip are set here
  $ ip addr add dev vxlan0

results in vxlan0 listening on all interfaces, not just vevx0z or vevx0a. To prove it to myself, I spun up a docker container (using a completely different network bridge – with no connection to any of the interfaces above), and ran a Go program to send VXLAN UDP packets to my bridge host:

$ docker run -it --rm -v $(pwd):/mnt debian:unstable /mnt/spam

which results in packets getting injected into my vxlan interface

$ sudo tcpdump -e -i vxlan0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vxlan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:30:15.746754 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746773 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746787 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746801 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746815 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746827 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746870 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746885 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746899 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
21:30:15.746913 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! > localhost: ip-proto-114
10 packets captured
10 packets received by filter
0 packets dropped by kernel

(the program in question is the following:)

  package main

  import (
  func main() {
      conn, err := net.Dial("udp", os.Args[1])
      if err != nil { panic(err) }
      for i := 0; i < 10; i++ {
          vxf := &vxlan.Frame{
              VNI: vxlan.VNI(42),
              Ethernet: &ethernet.Frame{
                  Source:      net.HardwareAddr{0xDE, 0xAD, 0xBE,
0xEF, 0x00, 0x01},
                  Destination: net.HardwareAddr{0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF},
                  EtherType:   ethernet.EtherTypeIPv4,
                  Payload:     []byte("Hello, World!"),
          frb, err := vxf.MarshalBinary()
          if err != nil { panic(err) }
          _, err = conn.Write(frb)
          if err != nil { panic(err) }

When using vxlan, be absolutely sure all hosts that can address any interface on the host are authorized to send arbitrary packets into any VLAN that box can send to, or there’s very careful and specific controls and firewalling. Note this includes public interfaces (e.g., dual-homed private network / internet boxes), or any type of dual-homing (VPNs, etc).

22 November, 2021 02:39AM

Antoine Beaupré

The last syncmaildir crash

My syncmaildir (SMD) setup failed me one too many times (previously, previously). In an attempt to migrate to an alternative mail synchronization tool, I looked into using my IMAP server again, and found out my mail spool was in a pretty bad shape. I'm comparing mbsync and offlineimap in the next post but this post talks about how I recovered the mail spool so that tools like those could correctly synchronise the mail spool again.

The latest crash

On Monday, SMD just started failing with this error:

nov 15 16:12:19 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:12:22 angela systemd[2305]: smd-pull.service: Succeeded.
nov 15 16:12:22 angela systemd[2305]: Finished pull emails with syncmaildir.
nov 15 16:14:08 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:14:11 angela systemd[2305]: Failed to start pull emails with syncmaildir.
nov 15 16:16:14 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Unable to get any data from the other endpoint.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: This problem may be transient, please retry.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Hint: did you correctly setup the SERVERNAME variable
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: on your client? Did you add an entry for it in your ssh
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: configuration file?
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error
nov 15 16:16:17 angela smd-pull[27188]: register: smd-client@localhost: TAGS: error::context(handshake) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:16:17 angela systemd[2305]: Failed to start pull emails with syncmaildir.

What is frustrating is that there's actually no network error here. Running the command by hand I did see a different message, but now I have lost it in my backlog. It had something to do with a filename being too long, and I gave up debugging after a while. This happened suddenly too, which added to the confusion.

In a fit of rage I started this blog post and experimenting with alternatives, which led me down a lot of rabbit holes.

Reviewing my previous mail crash documentation, it seems most solutions involve talking to an IMAP server, so I figured I would just do that. Wanting to try something new, i gave isync (AKA mbsync) a try. Oh dear, I did not expect how much trouble just talking to my IMAP server would be, which wasn't not isync's fault, for what that's worth. It was the primary tool I used to debug things, and served me well in that regard.

Mailbox corruption

The first thing I found out is that certain messages in the IMAP spool were corrupted. mbsync would stop on a FETCH command and Dovecot would give me those errors on the server side.

"wrong W value"

nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Maildir filename has wrong W value, renamed the file from /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S to /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495:2,S
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Deleting corrupted cache record uid=1582: UID 1582: Broken virtual size in mailbox junk: read(/home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S): FETCH BODY[] got too little data: 2540 vs 2578

At least this first error was automatically healed by Dovecot (by renaming the file without the W= flag). The problem is that the FETCH command fails and mbsync exits noisily. So you need to constantly restart mbsync with a silly command like:

while ! mbsync -a; do sleep 1; done

"cached message size larger than expected"

nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=mail stream)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: Deleting corrupted cache record uid=19288: UID 19288: Broken physical size in mailbox Sent: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=)
nov 16 13:53:08 marcos dovecot[3520770]: imap-login: Panic: epoll_ctl(del, 7) failed: Bad file descriptor

This second problem is much harder to fix, because dovecot does not recover automatically. This is Dovecot complaining that the cached size (the S= field, but also present in Dovecot's metadata files) doesn't match the file size.

I wonder if at least some of those messages were corrupted in the OfflineIMAP to syncmaildir migration because part of that procedure is to run the strip_header script to remove content from the emails. That could easily have broken things since the files do not also get renamed.


So I read a lot of the Dovecot documentation on the maildir format, and wrote an extensive fix script for those two errors. The script worked and mbsync was able to sync the entire mail spool.

And no, rebuilding the index files didn't work. Also tried doveadm force-resync -u anarcat which didn't do anything.

In the end I also had to do this, because the wrong cache values were also stored elsewhere.

service dovecot stop ; find -name 'dovecot*' -delete; service dovecot start

This would have totally broken any existing clients, but thankfully I'm starting from scratch (except maybe webmail, but I'm hoping it will self-heal as well, assuming it only has a cache and not a full replica of the mail spool).

Incoherence between Maildir and IMAP

Unfortunately, the first mbsync was incomplete as it was missing about 15,000 mails:

anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l 

As it turns out, mbsync was not at fault here either: this was yet more mail spool corruption.

It's actually 26 folders (out of 205) with inconsistent sizes, which can be found with:

for folder in * .[^.]* ; do 
  printf "%s\t%d\n" $folder $(find "$folder" -type f -a \! -name '.*' | wc -l );

The special \! -name '.*' bit is to ignore the mbsync metadata, which creates .uidvalidity and .mbsyncstate in every folder. That ignores about 200 files but since they are spread around all folders, which was making it impossible to review where the problem was.

Here is what the diff looks like:

--- Maildir-list    2021-11-17 20:42:36.504246752 -0500
+++ Maildir-mbsync-list 2021-11-17 20:18:07.731806601 -0500
@@ -6,16 +6,15 @@
 .Archives  1
 .Archives.2010 3553
-.Archives.2011 3583
-.Archives.2012 12593
+.Archives.2011 3582
+.Archives.2012 620
 .Archives.2013 8576
 .Archives.2014 11057
-.Archives.2015 8173
+.Archives.2015 8165
 .Archives.2016 54
 .band  34
 .bitbuck   1
@@ -38,13 +37,12 @@
 .couchsurfers  2
-cur    11285
+cur    11280
 .current   130
 .cv    2
 .debbug    262
-.debian    37544
-drafts 1
-.Drafts    4
+.debian    37533
+.Drafts    2
 .drone 241
 .drupal    188
 .drupal-devel  303

Misfiled messages

It's a bit all over the place, but we can already notice some huge differences between mailboxes, for example in the Archives folders. As it turns out, at least 12,000 of those missing mails were actually misfiled: instead of being in the Maildir/.Archives.2012/cur/ folder, they were directly in Maildir/.Archives.2012/. This is something that doesn't matter for SMD (and possibly for notmuch? it does matter, notmuch suddenly found 12,000 new mails) but that definitely matters to Dovecot and therefore mbsync...

After moving those files around, we still have 4,000 message missing:

anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 

The problem is that those 4,000 missing mails are harder to track. Take, for example, .Archives.2011, which has a single message missing, out of 3,582. And the files are not identical: the checksums don't match after going through the IMAP transport, so we can't use a tool like hashdeep to compare the trees and find why any single file is missing.

"register" folder

One big chunk of the 4,000, however, is a special folder called register in my spool, which I am syncing separately (see Securing registration email for details on that setup). That actually covers 3,700 of those messages, so I actually have a more modest 300 messages to figure out, after (easily!) configuring mbsync to sync that folder separately:

 @@ -30,9 +33,29 @@ Slave :anarcat-local:
  # Exclude everything under the internal [Gmail] folder, except the interesting folders
  #Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
  # Or include everything
 -Patterns *
 +#Patterns *
 +Patterns * !register  !.register
  # Automatically create missing mailboxes, both locally and on the server
  #Create Both
  Create slave
  # Sync the movement of messages between folders and deletions, add after making sure the sync works
  #Expunge Both
 +IMAPAccount anarcat-register
 +User register
 +PassCmd "pass"
 +CertificateFile /etc/ssl/certs/ca-certificates.crt
 +IMAPStore anarcat-register-remote
 +Account anarcat-register
 +MaildirStore anarcat-register-local
 +SubFolders Maildir++
 +Inbox ~/Maildir-mbsync/.register/
 +Channel anarcat-register
 +Master :anarcat-register-remote:
 +Slave :anarcat-register-local:
 +Create slave

"tmp" folders and empty messages

After syncing the "register" messages, I end up with the measly little 160 emails out of sync:

anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 

Argh. After more digging, I have found 131 mails in the tmp/ directories of the client's mail spool. Mysterious! On the server side, it's even more files, and not the same ones. Possible that those were mails that were left there during a failed delivery of some sort, during a power failure or some sort of crash? Who knows. It could be another race condition in SMD if it runs while mail is being delivered in tmp/...

The first thing to do with those is to cleanup a bunch of empty files (21 on angela):

find .[^.]*/tmp -type f -empty -delete

As it turns out, they are all duplicates, in the sense that notmuch can easily find a copy of files with the same message ID in its database. In other words, this hairy command returns nothing

find .[^.]*/tmp -type f | while read path; do
  msgid=$(grep -m 1  -i ^message-id "$path" | sed 's/Message-ID: //i;s/[<>]//g');
  if notmuch count --exclude=false  "id:$msgid" | grep -q 0; then
    echo "$path <$msgid> not in notmuch" ;

... which is good. Or, to put it another way, this is safe:

find .[^.]*/tmp -type f -delete

Poof! 314 mails cleaned on the server side. Interestingly, SMD doesn't pick up on those changes at all and still sees files in tmp/ directories on the client side, so we need to operate the same twisted logic there.

notmuch to the rescue again

After cleaning that on the client, we get:

anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 

Ha! 27 mails difference. Those are the really sticky, unclear ones. I was hoping a full sync might clear that up, but after deleting the entire directory and starting from scratch, I end up with:

anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l 

That is: even more messages missing (now 37). Sigh.

Thankfully, this is something notmuch can help with: it can index all files by Message-ID (which I learned is case-insensitive, yay) and tell us which messages don't make it through.

Considering the corruption I found in the mail spool, I wouldn't be the least surprised those messages are just skipped by the IMAP server. Unfortunately, there's nothing on the Dovecot server logs that would explain the discrepancy.

Here again, notmuch comes to the rescue. We can list all message IDs to figure out that discrepancy:

notmuch search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-msgids
notmuch --config=.notmuch-config-mbsync search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-mbsync-msgids

And then we can see how many messages notmuch thinks are missing:

$ wc -l *msgids
372723 Maildir-mbsync-msgids
372752 Maildir-msgids

That's 29 messages. Oddly, it doesn't exactly match the find output:

anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l 
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 

That is 10 more messages. Ugh. But actually, I know what those are: more misfiled messages (in a .folder/draft/ directory, bizarrely, so the totals actually match.

In the notmuch output, there's a lot of stuff like this:


Those are messages without a valid Message-ID. Notmuch (presumably) constructs one based on the file's checksum. Because the files differ between the IMAP server and the local mail spool (which is unfortunate, but possibly inevitable), those do not match. There are exactly the same number of those on both sides, so I'll go ahead and assume those are all accounted for.

What remains is:

anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids  | grep '^\-[^-]' | grep -v sha1 | wc -l 
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids  | grep '^\+[^+]' | grep -v sha1 | wc -l 

ie. 21 missing from mbsync, and, surprisingly, 2 missing from the original mail spool.

Further inspection also showed they were all messages with some sort of "corruption": no body and only headers. I am not sure that is a legal email format in the first place. Since they were mostly spam or administrative emails ("You have been unsubscribed from mailing list..."), it seems fairly harmless to ignore those.


As we'll see in the next article, SMD has stellar performance. But that comes at a huge cost: it accesses the mail storage directly. This can (and has) created significant problems on the mail server. It's unclear exactly why those things happen, but Dovecot expects a particular storage format on its file, and it seems unwise to bypass that.

In the future, I'll try to remember to avoid that, especially since mechanisms like SMD require special server access (SSH) which, in the long term, I am not sure I want to maintain or expect.

In other words, just talking with an IMAP server opens up a lot more possibilities of hosting than setting up a custom synchronisation protocol over SSH. It's also safer and more reliable, as we have seen. Thankfully, I've been able to recover from all the errors I could find, but it could have gone differently and it would have been possible for SMD to permanently corrupt significant part of my mail archives.

In the end, however, the last drop was just another weird bug which, ironically, SMD mysteriously recovered from on its own while I was writing this documentation and migrating away from it.

In any case, I recommend SMD users start looking for alternatives. The project has been archived upstream, and the Debian package has been orphaned. I have seen significant mail box corruption, including entire mail spool destruction, mostly due to incorrect locking code. I have filed a release-critical bug in Debian to make sure it doesn't ship with Debian bookworm.

Alternatives like mbsync provide fast and reliable transport, including over SSH. See the next article for further discussion of the alternatives.

22 November, 2021 02:22AM

November 21, 2021

Julian Andres Klode

APT Z3 Solver Basics

Z3 is a theorem prover developed at Microsoft research and available as a dynamically linked C++ library in Debian-based distributions. While the library is a whopping 16 MB, and the solver is a tad slow, it’s permissive licensing, and number of tactics offered give it a huge potential for use in solving dependencies in a wide variety of applications.

Z3 does not need normalized formulas, but offers higher level abstractions like atmost and atleast and implies, that we will make use of together with boolean variables to translate the dependency problem to a form Z3 understands.

In this post, we’ll see how we can apply Z3 to the dependency resolution in APT. We’ll only discuss the basics here, a future post will explore optimization criteria and recommends.

Translating the universe

APT’s package universe consists of 3 relevant things: packages (the tuple of name and architecture), versions (basically a .deb), and dependencies between versions.

While we could translate our entire universe to Z3 problems, we instead will construct a root set from packages that were manually installed and versions marked for installation, and then build the transitive root set from it by translating all versions reachable from the root set.

For each package P in the transitive root set, we create a boolean literal P. We then translate each version P1, P2, and so on. Translating a version means building a boolean literal for it, e.g. P1, and then translating the dependencies as shown below.

We now need to create two more clauses to satisfy the basic requirements for debs:

  1. If a version is installed, the package is installed; and vice versa. We can encode this requirement for P above as P == atleast({P1,P2}, 1).
  2. There can only be one version installed. We add an additional constraint of the form atmost({P1,P2}, 1).

We also encode the requirements of the operation.

  1. For each package P that is manually installed, add a constraint P.
  2. For each version V that is marked for install, add a constraint V.
  3. For each package P that is marked for removal, add a constraint !P.


Packages in APT have dependencies of two basic forms: Depends and Conflicts, as well as variations like Breaks (identical to Conflicts in solving terms), and Recommends (soft Depends) - we’ll ignore those for now. We’ll discuss Conflicts in the next section.

Let’s take a basic dependency list: A Depends: X|Y, Z. To represent that dependency, we expand each name to a list of versions that can satisfy the dependency, for example X1|X2|Y1, Z1.

Translating this dependency list to our Z3 solver, we create boolean variables X1,X2,Y1,Z1 and define two rules:

  1. A implies atleast({X1,X2,Y1}, 1)
  2. A implies atleast({Z1}, 1)

If there actually was nothing that satisfied the Z requirement, we’d have added a rule not A. It would be possible to simply not tell Z3 about the version at all as an optimization, but that adds more complexity, and the not A constraint should not cause too many problems.


Conflicts cannot have or in them. A dependency B Conflicts: X, Y means that only one of B, X, and Y can be installed. We can directly encode this in Z3 by using the constraint atmost({B,X,Y}, 1). This is an optimized encoding of the constraint: We could have encoded each conflict in the form !B or !X, !B or !X, and so on. Usually this leads to worse performance as it introduces additional clauses.

Complete example

Let’s assume we start with an empty install and want to install the package a below.

Package: a
Version: 1
Depends: c | b

Package: b
Version: 1

Package: b
Version: 2
Conflicts: x

Package: d
Version: 1

Package: x
Version: 1

The translation in Z3 rules looks like this:

  1. Package rules for a:
    1. a == atleast({a1}, 1) - package is installed iff one version is
    2. atmost({a1}, 1) - only one version may be installed
    3. a – a must be installed
  2. Dependency rules for a
    1. implies(a1, atleast({b2, b1}, 1)) – the translated dependency above. note that c is gone, it’s not reachable.
  3. Package rules for b:
    1. b == atleast({b1,b2}, 1) - package is installed iff one version is
    2. atmost({b1, b2}, 1) - only one version may be installed
  4. Dependencies for b (= 2):
    1. atmost({b2, x1}, 1) - the conflicts between x and b = 2 above
  5. Package rules for x:
    1. x == atleast({x1}, 1) - package is installed iff one version is
    2. atmost({x1}, 1) - only one version may be installed

The package d is not translated, as it is not reachable from the root set {a1}, the transitive root set is {a1,b1,b2,x1}.

Next iteration: Optimization

We have now constructed the basic set of rules that allows us to solve solve our dependency problems (equivalent to SAT), however it might lead to suboptimal solutions where it removes automatically installed packages, or installs more packages than necessary, to name a few examples.

In our next iteration, we have to look at introducing optimization; for example, have the minimum number of removals, the minimal number of changed packages, or satisfy as many recommends as possible. We will also look at the upgrade problem (upgrade as many packages as possible), the autoremove problem (remove as many automatically installed packages as possible).

21 November, 2021 07:49PM

Antoine Beaupré

Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash

It is not really worth going over the crash in details, it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs.

I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:

95K emails scanned

Oops. You can see notmuch happily noticing the destroyed files on the server:

jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.

The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted.

The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:

jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 

So clearly that is the source of the bug.


Emergency stop on curie:

notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer

On marcos (the server), guessed the number of messages delivered since the last backup to be 71, just looking at timestamps in the mail log. Made a list:

grep postfix/local /var/log/mail.log | tail -71 > lost-mail

Found postfix queue IDs:

sed 's/.*\]://;s/:.*//' lost-mail > qids

Turn those into message IDs, find those that are missing from the disk (had previously ran notmuch new just to be sure it's up to date):

while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids  | sed 's/.*message-id=<//;s/>//' | while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid

Copy this back on curie as missing-msgids and:

$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done < missing-msgids

only two mails missing! whoohoo!

Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:

~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2

I actually remembered deleting those, so no mail lost!

Rebuild the list of msgids that were lost, on marcos:

while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids  | sed 's/.*message-id=<//;s/>//'

Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:

while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v  --files-from=- /home/anarcat/Maildir/

If that looks about right, on marcos:

find restore/Maildir-angela/ -type f | wc -l

... should match the number of missing mails, roughly.

Copy if in the real spool:

while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v  --files-from=- /home/anarcat/Maildir/

Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:

while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids  | sed 's/.*message-id=<//;s/>//' | while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done

Then, crucial moment, try to pull the new mails from the backups on curie:

anarcat@curie:~(main)$ smd-pull  -n  --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs

This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catchup with all my holiday's read emails (1440 mails!) but thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:

pv notmuch.dump | notmuch restore



I have filed this as a bug in upstream issue 18. Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...


At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but never destroy my entire mail spool that way.

Update: I have migrated to another tool after one more failure from SMD, see the crash log and recovery.

As mentioned before, there are other programs that sync mail. See the mbsync vs OfflineIMAP review the list of alternatives.

21 November, 2021 04:05PM

Free Software Fellowship

Justin Flory, UNICEF & Red Hat's Jeffrey Epstein moment

Justin Flory, a UNICEF employee affiliated with Red Hat, has recently commenced living in Tirana, Albania, as noted on his blog.

He originally stated the location on his blog and it is captured by, the Way Back Machine from July up to October 2021.

On 11 October 2021, the UN's International Day of the Girl Child, Fellowship published evidence of underage girls in the Tirana hackerspace. Flory has removed references to Tirana and Albania from the page, it is missing from the live version of his page today.

UNICEF, underage girls, Albania

In this photo, we can see Justin Flory lying down. On the left is Elio Qoshi, the subject of outrage about grooming an underage girlfriend for Outreachy. Immediately behind Flory's head, we see a woman who was shortlisted for the Red Hat women in open source award.

What is the real reason Flory is having a long stay in Albania?

Many of these women want a foreign husband, maybe even Justin, that is why other local men don't visit this group. The women all had some free trips paid for by big organizations and after one or two of those trips, their view of the world completely changes and they stop dating any local men.

With all the women over 18 acquiring a passport and chasing foreign men, the local men like Elio simply have no choice other than the underage girls who have no passport and haven't been spoilt by the gifts from abroad.

Elio Qoshi with the 80% female cast.

21 November, 2021 02:00PM

Debian Community News

Mark Shuttleworth, Elio Qoshi & Debian/Ubuntu underage girls

The Free Software Fellowship recently published evidence of the Albanian gangmaster and Mozilla Tech Speaker recruiting and/or grooming teenage girls in a hackerspace.

Anja Xhakani

In 2019, when Dr Richard Stallman commented privately on the Epstein affair at MIT, his words were twisted beyond recognition and used as an excuse for a lynch mob to bully him into resigning.

Dr Richard Stallman, FSF, MIT

Yet what we see in Albania is far worse. It is not merely discussion about underage girls: if you hang around there long enough, it is very likely you will meet some of these women. Or children, or whatever we should call them.

These situations are inevitable in developing countries. Nonetheless, we have made the discovery that Elio Qoshi is now employed by Mark Shuttleworth at Canonical Ltd (Ubuntu).

Elio Qoshi, Canonical, Mark Shuttleworth

Nobody has asked for either of them to resign. Why were people so quick to organize a lynch mob against Dr Stallman but so tolerant of Elio Qoshi?

As far as we can see, many British Debian/Ubuntu developers live in the Cambridge region to the north of London. They have low cost direct flights to Albania from the local Luton airport. Almost every Friday morning at 6am there is at least one developer standing in the queue for the 3 hour Wizzair flight to Tirana.

In the same queue, you are reminded what is at stake: attractive young Albanian wives with British husbands taking their baby back to meet the grandparents for the weekend. By marrying up into the UK, Ireland, Germany or Switzerland, these women can help their own grandparents to retire in dignity by sending home €100 per week.

Luton, Tirana, WizzAir

Developers have decided to turn a blind eye. Don't rock the boat. As long as everybody is getting girls, the welfare of teenage girls is not important.

The former leader of Debian, Chris Lamb, was infamously photographed dining with an Albanian woman just weeks before she won a $6,000 Outreachy internship. When the leaders are this heavily involved, it is no wonder they can't question Mr Qoshi's behavior as they don't want to draw attention to their own decisions.

DebConf19, Chris Lamb, lamby, outreachy candidate, dinner date

In 2022, they hope to have DebConf22 in Kosovo and they are hoping that a bus full of girls will make the 250km journey from Albania to meet the Debian professionals in Kosovo.

As the photos show us, Elio Qoshi has been highly effective at motivating girls to come to these events but not very good at getting them to write code. Nobody is asking them to code.

Techrights has captured a quote from Mark Shuttleworth, as relayed by an earlier Ubuntu employee, Jeff Waugh:

Jeff Waugh [15:59]: “What happened was, we’ve had some very… some of the very early initial meetings (when, you know, there were about 10 people) and Mark [Shuttleworth] showed this picture… and it was of a girl called Sabrina, or that’s the name that he gave her. And it was a very [?] tone, Vaseline on the lens kind of shot, and it was a… a very beautiful shot, but it was with a girl with her face turned away but her breasts perfectly visible. And he saying to everyone, ‘this is what I want the desktop to look like.’”
DebConf22, women, bus, Albania, Kosovo, diversity

21 November, 2021 12:30PM

November 20, 2021

Donald Norwood & Debian spamming users with defamation and dirty politics

Donald Norwood, Debian, Defamation, backstabbing, publicity team Donald Norwood, Debian, publicity team, backstabbing, blackmail, defamation, harassment

Many Debian users received spam from Donald Norwood through the debian-news email list and social media networks this week.

Norwood has attacked a developer with over 20 years experience.

Looking at Norwood's own profile, we see that he has never made a single package. He is one of the non-developing developers. Many of these non-developing developers are wives and girlfriends who have an honorary title. We don't know exactly how Norwood got this title but nonetheless, if he never made a package himself, how can he judge the competence of a real Debian Developer?

Any genuine expulsion involves a tribunal or grievance process where both sides get to review the evidence together and the accused has a right of reply. When we look at the accusations from Norwood, there is no hint of any evidence, no references, no report from any tribunal. It is pure defamation.

Moreover, the volunteer in question was never actually a member of Software in the Public Interest, the incorporated association. It simply isn't possible to expel somebody who is not a member. To falsely claim that a non-member was expelled is much like bouncing a cheque, it is a fraudulent assertion in every way.


It is important to ask the question: why did Donald Norwood spontaneously decide to attack this volunteer?

Norwood's behavior is entirely consistent with a mafia gangster. This is the work of somebody who can not cook but he goes into a restaurant and demands a free lunch. He threatens to write defamatory reviews on social media and TripAdvisor if the restaurant does not obey his Code of Conduct and serve him for free.

Of course there is more to it than that. It looks like Debian is keen to discredit anybody who may ask questions about the relationships between leadership figures and the women/students doing internships. There have been a number of high-profile cases recently, including the Albanian women granted an internship after 2 years of close encounters with Debian leader Chris Lamb and the Mozilla Tech Speaker grooming an underage girl for Outreachy.

Reading between the lines, the defamatory publication from Norwood is telling all the other volunteers that if we ask probity questions, we will be banned/censored from all Debian's communications infrastructure.

Anisa Kuci, Outreachy, DebConf19, interns, Albania, Chris Lamb, girlfriend, Wikimedia, OpenStreetMap, Open Labs, OSCAL


Donald Norwood's attack was purely political. We present some facts to prove it:

  1. The community elected the victim as the FSFE Fellowship representative in 2017
  2. The same victim submitted a nomination for the role of Debian Project Leader in 2019
  3. The same victim submitted a nomination for the Fedora Council in 2020
  4. Donald Norwood published the attack shortly before the victim arrived at the annual meeting of Swiss financial institution

Prosecuting Donald Norwood

We don't want to encourage any Kyle Rittenhouse antics but nonetheless if you need to make a private prosecution for harassment you need to go to the court house with the address of the suspect.

New York business records contain the details of Norwood's home in Brooklyn which is also used as a business address:

The Portalus Group Corporation
Director: Donald Norwood
512 Monroe Street
Brooklyn NY 11221

Anja Xhakani

Jury accepts argument that Rittenhouse acted in self defense when faced by a lynch mob

Kyle Rittenhouse, self defense, character assassins

20 November, 2021 10:30PM

hackergotchi for Jonathan Dowland

Jonathan Dowland

hledger footguns

I wrote in budgeting tools that I was taking a look at Plain Text Accounting and in particular, hledger. My Jury's still out on the tools, but in the time I've been looking at them I've come across a couple of foot-guns I thought it was worth writing down.

hledger's ledger format is derived from that of its predecessor ledger, and so some of the problems might be inherited.

1. significant white space delimiters

The basic syntax for a transaction looks like this

2020-03-15 client payment
    assets:checking         $ 2000
    income:consulting       $-2000

There's some significant white space delimiters in play. The most subtle is what separates the account names from the values: it is two or more spaces. A single space, and the value is treated as part of the account name. For some reason I hit this frequently with trying to encode opening balances: the account name used as the source of the initial balances is something not otherwise generally referred to again (something like equity:opening balances) and the transaction amount is inferred where possible, so I ended up with a bunch of accounts named equity:opening balances £100 and similar.

2. flexible decimal delimiter

The value of transactions can be interspersed with commas and periods to make it more readable: e.g. $2000 could be written as $2,000. Different locales have different conventions here: It seems some(/most/all?) of Europe use periods to separate out the units and a comma to delimit the fractional part, whereas the US and the UK do the opposite. There is no built-in association between the currency symbol you are using and the period/comma convention: it's quite possible to accidentally write a number which is interpreted differently to how you intended, and it doesn't matter if you are using $ or £ etc.

3. new syntax has unexpected results in old versions

Finally, my favourite. hledger has a notion of rules that can be used to match transactions when importing from CSV. The format looks like this:

if (match rule)
& (another rule)
account1 some:account:from
account2 some:account:to

By default, multiple rules in sequence like above are OR'd: any of them can match. The & prefix switches the behaviour to AND. But, & is a relatively new addition: it's not supported in 1.18.1, the version in Debian stable, which upstream released in June 2020. In prior versions the & prefix is not a syntax error, or at least, not one that's reported: it's silently ignored; meaning, the line with the & does nothing, and any of the other rules in the set will match. This is easy to miss, and means imports could be incorrectly posted.

20 November, 2021 09:03PM

November 19, 2021

Mike Hommey

Announcing git-cinnabar 0.5.8

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.7?

  • Updated git to 2.34.0 for the helper.
  • Python 3.5 and newer are now officially supported. Git-cinnabar will try to use the python3 program by default, but will fallback to python2.7 if that’s where the Mercurial libraries are available. It is possible to pick a specific python with the GIT_CINNABAR_PYTHON environment variable.
  • Fixed compatibility with Mercurial 5.8 and newer.
  • The prebuilt binaries are now optimized on arm64 macOS and Windows.
  • git cinnabar download now properly returns an error code when failing to extract the prebuilt binaries.
  • Pushing to a non-empty Mercurial repository without having pulled at least once from it is now prevented.
  • Replaced the nagging about fsck with a smaller check always happening after pulling.
  • Fail earlier on git fetch hg::url <sha1> (it would properly fetch the Mercurial changeset and its ancestors, but git would fail at the end because the sha1 is not a git sha1 ; use git cinnabar fetch instead)
  • Minor fixes.

19 November, 2021 10:05PM by glandium

hackergotchi for Gunnar Wolf

Gunnar Wolf

For our millionth bug, bookworms eat raspberries alive

I guess you already heard, right? The Debian Bug Tracking System has hit a big milestone! We just passed our one millionth bug report! (and yes, that’s a cause for celebration; bug reporting is probably the best way for the system to grow and improve)

So, to celebrate, I want to announce I have nudged our unofficial Raspberry Pi images build scripts to now also build images for our upcoming Debian release, Debian 12 «Bookworm»

(image above: A bookworm learns about raspberries in various stages of testing. Image sources: Transformers Wiki, CC BY-SA and Sam Saunders at Flickr, CC BY-SA)

So… Get’em while they are fresh!! And enjoy the following (non-book)worm-on-a-raspberry picture from Wikimedia Commons:

Oh, FWIW – The site still shows images for Buster. You will notice they are no longer being autobuilt (why spend CPU time in something that’s no longer going to change significatively?). The Bookworm images are not yet tested; as soon as I can test them, I will drop the Buster ones.

19 November, 2021 03:37PM

hackergotchi for Evgeni Golov

Evgeni Golov

A String is not a String, and that's Groovy!

Halloween is over, but I still have some nightmares to share with you, so sit down, take some hot chocolate and enjoy :)

When working with Jenkins, there is almost no way to avoid writing Groovy. Well, unless you only do old style jobs with shell scripts, but y'all know what I think about shell scripts…

Anyways, Eric have been rewriting the jobs responsible for building Debian packages for Foreman to pipelines (and thus Groovy).

Our build process for pull requests is rather simple:

  1. Setup sources - get the orig tarball and adjust changelog to have an unique version for pull requests
  2. Call pbuilder
  3. Upload the built package to a staging archive for testing

For merges, it's identical, minus the changelog adjustment.

And if there are multiple packages changed in one go, it runs each step in parallel for each package.

Now I've been doing mass changes to our plugin packages, to move them to a shared postinst helper instead of having the same code over and over in every package. This required changes to many packages and sometimes I'd end up building multiple at once. That should be fine, right?

Well, yeah, it did build fine, but the upload only happened for the last package. This felt super weird, especially as I was absolutely sure we did test this scenario (multiple packages in one PR) and it worked just fine…

So I went on a ride though the internals of the job, trying to understand why it didn't work.

This requires a tad more information about the way we handle packages for Foreman:

  • the archive is handled by freight
  • it has suites like buster, focal and plugins (that one is a tad special)
  • each suite has components that match Foreman releases, so 2.5, 3.0, 3.1, nightly etc
  • core packages (Foreman etc) are built for all supported distributions (right now: buster and focal)
  • plugin packages are built only once and can be used on every distribution

As generating the package index isn't exactly fast in freight, we tried not not run it too often. The idea was that when we build two packages for the same target (suite/version combination), we upload both at once and run import only once for both. That means that when we build Foreman for buster and focal, this results in two parallel builds and then two parallel uploads (as they end up in different suites). But if we build Foreman and Foreman Installer, we have four parallel builds, but only two parallel uploads, as we can batch upload Foreman and Installer per suite. Well, or so was the theory.

The Groovy code, that was supposed to do this looked roughly like this:

def packages_to_build = find_changed_packages()
def repos = [:]

packages_to_build.each { pkg ->
    suite = 'buster'
    component = '3.0'
    target = "${suite}-${component}"

    if (!repos.containsKey(target)) {
        repos[target] = []



That's pretty straight forward, no? We create an empty Map, loop over a list of packages and add them to an entry in the map which we pre-create as empty if it doesn't exist.

Well, no, the resulting map always ended with only having one element in each target list. And this is also why our original tests always worked: we tested with a PR containing changes to Foreman and a plugin, and plugins go to this special target we have…

So I started playing with the code ( is really great for that!), trying to understand why the heck it erases previous data.

The first finding was that it just always ended up jumping into the "if map entry not found" branch, even though the map very clearly had the correct entry after the first package was added.

The second one was weird. I was trying to minimize the reproducer code (IMHO always a good idea) and switched target = "${suite}-${component}" to target = "lol". Two entries in the list, only one jump into the "map entry not found" branch. What?! �

So this is clearly related to the fact that we're using String interpolation here. But hey, that's a totally normal thing to do, isn't it?!

Admittedly, at this point, I was lost. I knew what breaks, but not why.

Luckily, I knew exactly who to ask: Jens.

After a brief "well, that's interesting", Jens quickly found the source of our griefs: Double-quoted strings are plain java.lang.String if there’s no interpolated expression, but are groovy.lang.GString instances if interpolation is present.. And when we do repos[target] the GString target gets converted to a String, but when we use repos.containsKey() it remains a GString. This is because GStrings get converted to Strings, if the method wants one, but containsKey takes any Object while the repos[target] notation for some reason converts it. Maybe this is because using GString as Map keys should be avoided.

We can reproduce this with simpler code:

def map = [:]
def something = "something"
def key = "${something}"
map[key] = 1
println key.getClass()
map.keySet().each {println it.getClass() }
map.keySet().each {println it.equals(key)}
map.keySet().each {println it.equals(key as String)}

Which results in the following output:

class org.codehaus.groovy.runtime.GStringImpl
class java.lang.String

With that knowledge, the fix was to just use the same repos[target] notation also for checking for existence — Groovy helpfully returns null which is false-y when it can't find an entry in a Map absent.

So yeah, a String is not always a String, and it'll bite you!

19 November, 2021 02:16PM by evgeni

hackergotchi for Neil Williams

Neil Williams

git worktrees

A few scenarios have been problematic with git and I've now discovered git worktrees which help with each.

  • If you've wanted to compare multiple files in different branches of the same tree - without needing to commit on either side.
  • If you want to work on two (or more) versions of the same file at the same time, again without needing to commit.
  • You have a file or a bunch of files that aren't ready to be committed, even locally.
  • You are working on a development branch and an urgent fix is required on an old git tag.
  • You have a large git repository which is a burden to clone (or has complex submodules).

You could go to the trouble of making a new directory and re-cloning the same tree. However, a local commit in one tree is then not accessible to the other tree.

You could commit everything every time, but with a dirty tree, that involves sorting out the .gitignore rules as well. That could well be pointless with an experimental change.

Git worktrees allow multiple filesystems from a single git tree. Commits on any branch are visible from other branches, even when the commit was on a different worktree. This makes things like cherry-picking easy, without needing to push pointless changes or branches.

Branches on a worktree can be rebased as normal, with the benefit that commit hashes from other local changes are available for reference and cherry-picks.

I'm sure git worktrees are not new. However, I've only started using them recently and others have asked about how the worktree operates.

Creating a new tree can be done with a new or existing branch. To make it easier, set the new directory at the same time, usually in ../

New branch (branched from the current branch):

git worktree add -b branch_name ../branch_name

Existing branch - note, slightly different syntax here, specify the commit-ish last (branch name, tag or hash):

git worktree add ../branch_name branch_name
git worktree list
/home/neil/Documents/testing/testrepo        0612677 [master]
/home/neil/Documents/testing/testtree        d38f5a3 [testtree]

Use git worktree remove <name> to drop the entire directory for that tree and the git tracking.

I'm using this for work on the Debian Security Tracker. I have two local branches and having two worktrees allows me to have three terminals open, using the same files and the same git repository.

One to run make serve and update the local SQLite database. One to access master to run git pull One to make local changes without risking collisions on master.

git add data/CVE/list
git commit
# pre commit hook runs here
git log -n 1
# copy the hash
# switch to master terminal
git pull
git cherry-pick <HASH>
git push
# switch to server terminal
git rebase master
# no git pull or fetch, it's all local
# switch back to changes terminal
git rebase master

Sadly, one area where this isn't as easy is with importing a new DSC into Salsa with git-buildpackage as that uses several branches at the same time. It would be possible but you'll need to have a separate upstream and possibly pristine-tar branches and supply the relevant options. Possibly something git-buildpackage to adopt - it is common to need to make changes to the packaging with a new upstream release & a lot of those changes are currently done outside git.

For the rest of the support, see git worktree (1)

19 November, 2021 01:26PM by Neil Williams

hackergotchi for Bits from Debian

Bits from Debian

New Debian Developers and Maintainers (September and October 2021)

The following contributors got their Debian Developer accounts in the last two months:

  • Bastian Germann (bage)
  • Gürkan Myczko (tar)

The following contributors were added as Debian Maintainers in the last two months:

  • Clay Stan
  • Daniel Milde
  • David da Silva Polverari
  • Sunday Cletus Nkwuda
  • Ma Aiguo
  • Sakirnth Nagarasa


19 November, 2021 12:00PM by Jean-Pierre Giraud

hackergotchi for Mike Gabriel

Mike Gabriel

Improbability of a million, lintian thinks...

An interesting mindset overcome by reality...

Also, lintian does not differentiate between between 100.000 and 1.000.000.

W: ayatana-indicator-display: improbable-bug-number-in-closes 1000143
N:   The most recent changelog closes a low-numbered bug number. While this is distantly possible, it's more likely a typo or
N:   a placeholder value that mistakenly wasn't filled in.
N:   Visibility: warning
N:   Show-Always: no
N:   Check: debian/changelog



19 November, 2021 07:08AM by sunweaver

November 18, 2021

hackergotchi for Christoph Berg

Christoph Berg

PostgreSQL and Undelete


Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted, or updated. Of course that works only if the table has not been cleaned-up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);
# insert into foo values (1, 'Doc1');
# insert into foo values (2, 'Doc2');
# insert into foo values (3, 'Doc3');

# select * from foo;
 id │  t
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3
(3 rows)

# delete from foo where id < 3;

# select * from foo;
 id │  t
  3 │ Doc3
(1 row)

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;

# select * from pg_dirtyread('foo') t(id int, t text);
 id │  t
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3

All three documents are still there, but only one of them is visible.

pg_dirtyread can also show PostgreSQL's system colums with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
 (0,1) │ 1577 │ 1580 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)


Caveat: I'm not promising any of the ideas quoted below will actually work in practice. There are a few caveats and a good portion of intricate knowledge about the PostgreSQL internals might be required to succeed properly. Consider consulting your favorite PostgreSQL support channel for advice if you need to recover data on any production system. Don't try this at work.

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;

This is not a true "undelete", though - it just inserts new rows from the data read from the table.


Enter pg_surgery, which is a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side-effect, they can also make delete tuples reappear.

As I discovered now, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", but at the same time as "not deleted".

Let's apply it to our test table, using the ctids that pg_dirtyread can read:

# create extension pg_surgery;

# select heap_force_freeze('foo', array_agg(ctid))
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;

(1 row)

Et voilà, our deleted document is back:

# select * from foo;
 id │  t
  1 │ Doc1
  3 │ Doc3
(2 rows)

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
 (0,1) │    2 │    0 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)


Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum. These actively zero out reclaimed space. Restore from backup to get your data back.

Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it's easy to create corrupt data using them. This includes duplicated rows, duplicated primary key values, indexes being out of sync with tables, broken foreign key constraints, and others. You have been warned.

pg_dirtyread does not work (yet) if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows.

Please make sure you have working backups and don't need any of the above.

18 November, 2021 10:21AM

November 17, 2021

hackergotchi for Daniel Pocock

Daniel Pocock

Debian expulsion lies & blackmail

Rogue members of the Debian ecosystem continue to spread untrue statements about expulsions.

Update 2021-11-21: rogue elements of Debian now threatening to reprimand and attack anybody who asks questions about the personal relationships that benefited from the $10,000 diversity budget and the Outreachy grooming evidence. The largest Debian mailing lists and IRC channels are now under partial or complete moderation/censorship, directly contradicting the Debian Social Contract.

It is easy to prove these expulsions are not only untrue but also impossible.

Debian is not an organization. Debian is simply a trademark. The US trademark database shows the trademark is registered to another organization, Software in the Public Interest, Inc.

Most Debian Developers, myself included, have never been members of Software in the Public Interest, Inc.

If we are not members and if we can not join, we can't be expelled.

Imagine making a movie and then removing the names of some people from the credits and putting other names in their place. Is that "expulsion"? Or is that simply stealing credit for somebody else's work? Removing names from the list of Debian Developers is much like removing somebody's name from the credits in a movie. It is wrong.

From the Debian diversity statement

The Debian Project welcomes and encourages participation by everyone.

Banning/censoring some volunteers appears to be incompatible with the diversity statement. Does Debian really care about diversity or is it just more diversitywashing?

Next time I attend a Debian event or join a Debian booth at an event like FOSDEM, I may well wear a t-shirt carrying the opening line of the diversity statement. If some people can't work with the rest of us, they can stay home rather than trying to intimidate people.

Blackmail volunteers for our time, our money and endorsement of the cult

It has all happened before too many times. MJ Ray writes about the scandal of the Debian UK Society in 2006. Said Society asserted that everybody was a member and that "members" who did not pay their tithes and obey the oligarchs would be publicly expelled.

MJ Ray: In June 2005, Ian Jackson chastised me for directing a UK donation away from Debian-UK Society (DUS), quoting a chunk of debian rules

Forced membership under these terms is a close cousin of modern slavery

International Student's Day

Coincidentally, the day of the latest Debian extremism is International Students' Day. I started doing projects with Debian as a student in the 90s. The day commemorates the students murdered by Nazi occupiers in Czechoslovakia, 17 November 1939. Volunteers in Google-occupied free software organizations face fascism in the form of mind games predicated on impossible expulsions.

Related links

Linus Torvalds, Daniel Pocock, Debian, DebConf

17 November, 2021 10:30PM

November 15, 2021

Vincent Bernat

Git as a source of truth for network automation

The first step when automating a network is to build the source of truth. A source of truth is a repository of data that provides the intended state: the list of devices, the IP addresses, the network protocols settings, the time servers, etc. A popular choice is NetBox. Its documentation highlights its usage as a source of truth:

NetBox intends to represent the desired state of a network versus its operational state. As such, automated import of live network state is strongly discouraged. All data created in NetBox should first be vetted by a human to ensure its integrity. NetBox can then be used to populate monitoring and provisioning systems with a high degree of confidence.

When introducing Jerikan, a common feedback we got was: “you should use NetBox for this.” Indeed, Jerikan’s source of truth is a bunch of YAML files versioned with Git.

Why Git?

If we look at how things are done with servers and services, in a datacenter or in the cloud, we are likely to find users of Terraform, a tool turning declarative configuration files into infrastructure. Declarative configuration management tools like Salt, Puppet,1 or Ansible take care of server configuration. NixOS is an alternative: it combines package management and configuration management with a functional language to build virtual machines and containers. When using a Kubernetes cluster, people use Kustomize or Helm, two other declarative configuration management tools. Tapped together, these tools implement the infrastructure as code paradigm.

Infrastructure as code is an approach to infrastructure automation based on practices from software development. It emphasizes consistent, repeatable routines for provisioning and changing systems and their configuration. You make changes to code, then use automation to test and apply those changes to your systems.

― Kief Morris, Infrastructure as Code, O’Reilly.

A version control system is a central tool for infrastructure as code. The usual candidate is Git with a source code management system like GitLab or GitHub. You get:

Traceability and visibility
Git keeps a log of all changes: what, who, why, and when. With a bit of discipline, each change is explained and self-contained. It becomes part of the infrastructure documentation. When the support team complains about a degraded experience for some customers over the last two months or so, you quickly discover this may be related to a change to an incoming policy in New York.
Rolling back
If a change is defective, it can be reverted quickly, safely, and without much effort, even if other changes happened in the meantime. The policy change at the origin of the problem spanned over three routers. Reverting this specific change and deploying the configuration let you solve the situation until you find a better fix.
Branching, reviewing, merging
When working on a new feature or refactoring some part of the infrastructure, a team member creates a branch and works on their change without interfering with the work of other members. Once the branch is ready, a pull request is created and the change is ready to be reviewed by the other team members before merging. You discover the issue was related to diverting traffic through an IX where one ISP was connected without enough capacity. You propose and discuss a fix that includes a change of the schema and the templates used to declare policies to be able to handle this case.
Continuous integration
For each change, automated tests are triggered. They can detect problems and give more details on the effect of a change. Branches can be deployed to a test infrastructure where regression tests are executed. The results can be synthesized as a comment in the pull request to help the review. You check your proposed change does not modify the other existing policies.

Why not NetBox?

NetBox does not share these features. It is a database with a REST and a GraphQL API. Traceability is limited: changes are not grouped into a transaction and they are not documented. You cannot fork the database. Usually, there is one staging database to test modifications before applying them to the production database. It does not scale well and reviews are difficult. Applying the same change to the production database can be hazardous. Rolling back a change is non-trivial.

Update (2021-11)

Nautobot, a fork of NetBox, will soon address this point by using Dolt, an SQL database engine allowing you to clone, branch, and merge, like a Git repository. Dolt is compatible with MySQL clients. See “Nautobots, Roll Back!” for a preview of this feature.

Moreover, NetBox is not usually the single source of truth. It contains your hardware inventory, the IP addresses, and some topology information. However, this is not the place you put authorized SSH keys, syslog servers, or the BGP configuration. If you also use Ansible, this information ends in its inventory. The source of truth is therefore fragmented between several tools with different workflows. Since NetBox 2.7, you can append additional data with configuration contexts. This mitigates this point. The data is arranged hierarchically but the hierarchy cannot be customized.2 Nautobot can manage configuration contexts in a Git repository, while still allowing to use of the API to fetch them. You get some additional perks, thanks to Git, but the remaining data is still in a database with a different lifecycle.

Lastly, the schema used by NetBox may not fit your needs and you cannot tweak it. For example, you may have a rule to compute the IPv6 address from the IPv4 address for dual-stack interfaces. Such a relationship cannot be easily expressed and enforced in NetBox. When changing the IPv4 address, you may forget the IPv6 address. The source of truth should only contain the IPv4 address but you also want the IPv6 address in NetBox because this is your IPAM and you need it to update your DNS entries.

Why not Git?

There are some limitations when putting your source of truth in Git:

  1. If you want to expose a web interface to allow an external team to request a change, it is more difficult to do it with Git than with a database. Out-of-the-box, NetBox provides a nice web interface and a permission system. You can also write your own web interface and interact with NetBox through its API.
  2. YAML files are more difficult to query in different ways. For example, looking for a free IP address is complex if they are scattered in multiple places.

In my opinion, in most cases, you are better off putting the source of truth in Git instead of NetBox. You get a lot of perks by doing that and you can still use NetBox as a read-only view, usable by other tools. We do that with an Ansible module. In the remaining cases, Git could still fit the bill. Read-only access control can be done through submodules. Pull requests can restrict write access: a bot can check the changes only modify allowed files before auto-merging. This still requires some Git knowledge, but many teams are now comfortable using Git, thanks to its ubiquity.

  1. Wikimedia manages its infrastructure with Puppet. They publish everything on GitHub. Creative Commons uses Salt. They also publish everything on GitHub. Thanks to them for doing that! I wish I could provide more real-life examples. ↩︎

  2. Being able to customize the hierarchy is key to avoiding repetition in the data. For example, if switches are paired together, some data should be attached to them as a group and not duplicated on each of them. Tags can be used to partially work around this issue but you lose the hierarchical aspect. ↩︎

15 November, 2021 07:53AM by Vincent Bernat

November 14, 2021

Russ Allbery

Review: The Last Graduate

Review: The Last Graduate, by Naomi Novik

Series: The Scholomance #2
Publisher: Del Rey
Copyright: 2021
ISBN: 0-593-12887-7
Format: Kindle
Pages: 388

This is a direct sequel to A Deadly Education, by which I mean it starts in the same minute at which A Deadly Education ends (and let me say how grateful I am for a sequel that doesn't drop days, months, or years between books). You do not want to read this series out of order.

This book is also very difficult to review without spoiling either it or the previous book, so please bear with me if I'm elliptical in my ravings. Because The Last Graduate is so good. So good, not only as a piece of writing, but as a combination of two of my favorite tropes in fiction, one of which I can't talk about because of spoilers. I adored this book in a way that is not entirely rational.

I will attempt a review below anyway, but if you liked the first book, just stop reading here and go read the second one. It's more of everything I loved in the first book except even better, it did some things I was expecting and some things I didn't expect at all, and it's just so ridiculously good. Just be aware that it has another final-line cliffhanger. The third book is coming in (hopefully) 2022.

Novik handles the cliffhanger at the end of the previous book beautifully, which is worth noting because there were so many ways in which it could have gone poorly. One of the best things about this series is Novik's skill at writing El's relationship with her mother, even though her mother has not appeared in the series so far. El argues with her mother's voice in her head, tells stories about her, wonders what her mother would think of her classmates (or in some cases knows exactly what her mother would think of her classmates), and sometimes makes the explicit decision to not be her mother. The relationship has the sort of messy complexity, shared history, and underlying respect that many people experience in life but that I've rarely seen portrayed this well in a fantasy novel.

Novik's presentation of that relationship works because El's voice is so strong. Within fifteen minutes of starting The Last Graduate, I was already muttering "I love this book" to myself, mostly because of how much I enjoy El's sarcastic, self-deprecating internal commentary. Novik strikes a balance between self-awareness, snark, humor, and real character growth that rivals Murderbot in its effectiveness of first-person perspective. It carries the story over a few weak points, such as a romance that didn't do much for me. Even when I didn't care about part of the plot, I cared about El's opinion of the plot and what it said about El's growing understanding of how to navigate the world.

A Deadly Education was scene and character establishment. El insisted on being herself and following her own morals and social rules, and through that found some allies. The Last Graduate gives El enough breathing space to make more nuanced decisions. This is the part of growing up where one realizes the limitations of one's knee-jerk reactions and innate moral judgment. It's also when it becomes hard to trust success that is entirely outside of one's previous experience. El was not a kid who had friends, so she doesn't know what to do with them now that she has them. She's barely able to convince herself that they are friends.

This is one of the two fictional tropes I mentioned, the one that I can talk about (at least briefly) without major spoilers. I have such a soft spot for stubborn, sarcastic, principled characters who refuse to play by the social rules that they think are required to make friends and who then find friends who like them for themselves. The moment when they start realizing this has happened and have no idea how to deal with it or how to be a person who has friends is one I will happily read over and over again. I enjoyed this book from the beginning, but there were two points when it grabbed my heart and I was all in. The first one is a huge spoiler that I can't talk about. The second was this paragraph:

[She] came round to me and put her arm around my waist and said under her breath, "Hey, she can be taught," with a tease in her voice that wobbled a little, and when I looked at her, her eyes were bright and wet, and I put my arm around her shoulders and hugged her.

You'll know it when you get there.

The Last Graduate also gives the characters other than El and Orion more room, which is part of how it handles the chosen one trope. It's been obvious since early in the first book that Orion is a sort of chosen one, and it becomes obvious to the reader that El may be as well. But Novik doesn't let the plot focus only on them; instead, she uses that trope to look at how alliances and collective action happen, and how no one can carry the weight by themselves. As El learns more and gains power, she also becomes less central to the plot resolution and has to learn how to be less self-reliant. This is not a book where one character is trained to save the world. It's a book where she manages to enlist the support of a kick-ass project manager and becomes part of a team.

Middle books of a trilogy are notoriously challenging. Often they're travel books: the first book sets up a problem, the second book moves the characters both physically and emotionally into a position to solve the problem, and the third book is the payoff. Travel books often sag. They can feel obligatory but somewhat boring, like a chore on the way to the third-book climax. The Last Graduate is not a travel book; it is, instead, a pivot book, which is my favorite form of trilogy. It's a book that rewrites the problem the first book set up, both resolving it and expanding the scope beyond what the reader had expected. This is immensely satisfying when done well, and Novik does it extremely well.

This is not a flawless book. There are some pacing hiccups, there is a romance angle that didn't work for me (although it does arrive at some character insights that I thought were spot on), and although I think Novik is doing something interesting with the trope, there is a lot of chosen one power escalation happening here. It's not the sort of book that I can claim is perfectly written. Instead, it's the sort of book that uses some of my favorite plot elements and emotional beats in such an effective way and with such a memorable character that I do not have it in me to care about any of the flaws. Your mileage may therefore vary, but I would be happy to read books like this until the end of time.

As mentioned above, The Last Graduate ends on another cliffhanger. This time I was worried that Novik might have ended the series there, since there's enough of an internal climax that I could imagine some literary fiction (which often seems allergic to endings) would have stopped here. Thankfully, Novik's web site says this is not the case. The next year is going to be a difficult wait.

The third book of this series is going to be incredibly difficult to write, and I hope Novik is up to the challenge she's made for herself. But she handled the transition between the first and second book so well, and this book is so good that I have a lot of hope. If the third book is half as good as I'm hoping, this is going to be one of my favorite fantasy series of all time.

Followed by an as-yet-untitled third book.

Rating: 10 out of 10

14 November, 2021 04:49AM

Ruby Team

Ruby transition and packaging hints #2 - Gemfile.lock created by bundler/setup with Ruby 2.7 preventing successful test with Ruby 3.0

We currently face an issue in all packages requiring bunlder/setup and trying to run the tests for Ruby 2.7 and 3.0. The problem is that the first tests will create Gemfile.lock (or gemfile/gemfile-*.lock) using Ruby 2.7 and the next run for Ruby 3 will report e.g.:

Failure/Error: require 'bundler/setup' # Set up gems listed in the Gemfile.

  Could not find racc-1.4.16 in any of the sources


/usr/share/rubygems-integration/all/gems/bundler-2.2.27/lib/bundler/definition.rb:496:in `materialize':
  Could not find rexml- in any of the sources (Bundler::GemNotFound)

Both bugs #996207 and #996302 are incarnations of this issue. The fix is as easy as making sure that the .lock files are removed before each run. This can be done in e.g. debian/ruby-tests.rake as very first task:

File.delete("Gemfile.lock") if File.exist?("Gemfile.lock")

In another case the .lock file is created by the tests in gemfiles/. While the first examples could actually be solved by gem2deb removing Gemfile.lock on its own, I’m not quite sure how to handle the last case using packaging tools.

The interesting part is that we will unlikely be confronted with this issue anytime soon again. It seems very specific to the Ruby 3.0 transition.


After talking to Antonio he added some code to gem2deb-test-runner to moving Gemfile.lock files out of the way. The tool already did this in an autopkgtest environment. In the upcoming 1.7 release it will do it in general and this will fix some more FTBFSes, e.g. #998497 and #996141 - originally reported against ruby-voight-kampff and ruby-bootsnap.

14 November, 2021 03:25AM by Daniel Leidert (

November 13, 2021

Ruby transition and packaging hints #1 - Adjusting Ruby version in commands

This is the first part of a series of short posts about issues that came up during the Ruby 3.0 transition and how to fix them. Hopefully more team members will join in and add their input.

During the Ruby 3.0 transition there are essentially two different Ruby versions with two different binaries available, /usr/bin/ruby2.7 and /usr/bin/ruby3.0, while /usr/bin/ruby points to the current default version, which is Ruby 2.7.

In some cases the tests shipped by the source packages will use shell commands to run scripts or Ruby code. It is imparative that in these cases the Ruby executable is not invoked by /usr/bin/ruby or ruby, because this will point to Ruby 2.7 only and fail if the tests are invoked with Ruby version 3.

The fix is to rely on RbConfig.ruby which will point to the absolute pathname of the ruby command for the current Ruby environment, e.g.

cmd = "#{RbConfig.ruby} ..."

This issue appeard for example in ruby-byebug and ruby-backports.

13 November, 2021 08:24PM by Daniel Leidert (

John Goerzen

Managing an External Display on Linux Shouldn’t Be This Hard

I first started using Linux and FreeBSD on laptops in the late 1990s. Back then, there were all sorts of hassles and problems, from hangs on suspend to pure failure to boot. I still worry a bit about suspend on unknown hardware, but by and large, the picture of Linux on laptops has dramatically improved over the last years. So much so that now I can complain about what would once have been a minor nit: dealing with external monitors.

I have a USB-C dock that provides both power and a Thunderbolt display output over the single cable to the laptop. I think I am similar to most people in wanting the following behavior from the laptop:

  • When the lid is closed, suspend if no external monitor is connected. If an external monitor is connected, shut off the built-in display and use the external one exclusively, but do not suspend.
  • Lock the screen automatically after a period of inactivity.
  • While locked, all connected displays should be powered down.
  • When an external display is connected, begin using it automatically.
  • When an external display is disconnected, stop using it. If the lid is closed when the external display is disconnected, go into suspend mode.

This sounds so simple. But somehow on Linux we’ve split up these things into a dozen tiny bits:

  • In /etc/systemd/logind.conf, there are settings about what to do when the lid is opened or closed.
  • Various desktop environments have overlapping settings covering the same things.
  • Then there are the display managers (gdm3, lightdm, etc) that also get in on the act, and frequently have DIFFERENT settings, set in different places, from the desktop environments. And, what’s more, they tend to be involved with locking these days.
  • Then there are screensavers (gnome-screensaver, xscreensaver, etc.) that also enter the picture, and also have settings in these areas.

Problems I’ve Seen

My problems don’t even begin with laptops, but with my desktop, running XFCE with xmonad and lightdm. My desktop is hooked to a display that has multiple inputs. This scenario (reproducible in both buster and bullseye) causes the display to be unusable until a reboot on the desktop:

  1. Be logged in and using the desktop
  2. Without locking the desktop screen, switch the display input to another device
  3. Keep the display input on another device long enough for the desktop screen to auto-lock
  4. At this point, it is impossible to re-awaken the desktop screen.

I should not here that the problems aren’t limited to Debian, but also extend to Ubuntu and various hardware.

Lightdm: which greeter?

At some point while troubleshooting things after upgrading my laptop to bullseye, I noticed that while both were running lightdm, I had different settings and a different appearance between the two. Upon further investigation, I realized that one hat slick-greeter and lightdm-settings installed, while the other had lightdm-gtk-greeter and lightdm-gtk-greeter-settings installed. Very strange.

XFCE: giving up

I eventually gave up on making lightdm work. No combination of settings or greeters would make things work reliably when changing screen configurations. I installed xscreensaver. It doesn’t hang, but it does sometimes take a few tries before it figures out what device to display on.

Worse, since updating from buster to bullseye, XFCE no longer automatically switches audio output when the docking station is plugged in, and there seems to be no easy way to convince Pulseaudio to do this.

X-Based Gnome and derivatives… sigh.

I also tried Gnome, Mate, and Cinnamon, and all of them had various inabilities to configure things to act the way I laid out above.

I’ve long not been a fan of Gnome’s way of hiding things from the user. It now has a Windows-like situation of three distinct settings programs (settings, tweaks, and dconf editor), which overlap in strange ways and interact with systemd in even stranger ways. Gnome 3 make it quite non-intuitive to make app icons from various programs work, and so forth.

Trying Wayland

I recently decided to set up an older laptop that I hadn’t used in awhile. After reading up on Wayland, I decided to try Gnome 3 under Wayland. Both the Debian and Arch wikis note that KDE is buggy on Wayland. Gnome is the only desktop environment that supports it then, unless I want to go with Sway. There’s some appeal to Sway to this xmonad user, but I’ve read of incompatibilities of Wayland software when Gnome’s not available, so I opted to try Gnome.

Well, it’s better. Not perfect, but better. After finding settings buried in a ton of different Settings and Tweaks boxes, I had it mostly working, except gdm3 would never shut off power to the external display. Eventually I found /etc/gdm3/greeter.dconf-defaults, and aadded:


Of course, these overlap with but are distinct from the same kinds of things in Gnome settings.


Running without Gnome seems like a challenge; Gnome is switching audio output appropriately, for instance. I am looking at some of the Gnome Shell tiling window manager extensions and hope that some of them may work for me.

13 November, 2021 04:21PM by John Goerzen

November 12, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

Frictionless external backups with systemd

Here's a description of how my monthly external backups are managed at a technical level. I didn't realise I hadn't written this all down anywhere yet.




I plug in one of two (prepared) external hard drives into my headless NAS. The NAS contains my primary data backup. A job automatically decrypts the encrypted filesystem on the drive, mounts it and synchronises the copy of my backup data on the drive from that on the NAS. Whilst this is going on, the blinkstick LED on the NAS switches to a colour to signal "in progress". When it's done, the light changes to green to signal "done" and I can remove it. If something goes wrong, it turns red and I get mail.


I want a third-strand, off-site backup of my and my family's data in case of a disaster in our house. For it to be useful it has to be regular, so I needed to remove as much of the friction of performing the backup as possible.

I use two drives alternately so that I don't have all my eggs in one basket in the window when I bring one of them home and perform the backup.


As much as possible I lean on systemd and its ability to trigger actions based on events.

  1. External drive is plugged in. systemd instantiates a corresponding device unit, named dev-disk-by\x2duuid-aaaaaaaa\x2daaaa\x2daaaa\x2daaaa\x2daaaaaaaaaaaa.device, where aaaa… is the UUID of a partition on the device

  2. The backup job is a systemd service which has a WantedBy relationship on the device unit, so when the device appears, systemd starts the backup service.

  3. The backup service has Requires and After relationships on systemd-cryptsetup@extbackup.service, a service created by systemd's cryptsetup generator on start-up (but slightly customised, see below). The encrypted device is therefore unlocked.

  4. The backup service defines multiple start and stop commands with ExecStart and ExecStop. These are used to:

    1. set the blinkstick to the working colour (blue-ish)
    2. mount the now-decrypted filesystem
    3. get a lock on the backup repository (so nothing else writes to it) and synchronise the files
    4. unmount the filesystem
    5. set the blinkstick to the success colour (green)
  5. Finally, the systemd-cryptsetup@extbackup.service unit realises it is not required any more. It has been customised with StopWhenUnneeded=true1, so the encrypted filesystem is closed, ready for the drive to be removed.

  6. I notice the LED colour is green, remove the drive, and take it to its off-site home.

If anything goes wrong, all my custom systemd units have, as a matter of course,

OnFailure=status-email-user@%n.service blinkstick-fail.service

Preparing a new backup disk

This is mostly just a standard dm-crypt/cryptsetup/LUKS encrypted device, on top of a standard partition on the underlying disk, with a normal filesystem sitting on top: Basically, the most common way to encrypt a drive in Linux. See places like the cryptsetup docs for how to set something like that up. The key things here are

  • set up a decryption key file as well as (or instead of) a pass- phrase and store that somewhere on the filesystem of the NAS
  • back up the LUKS header, as the cryptsetup documentation stresses you should
  • make a note of the underlying partition UUID: it's needed for the WantedBy line in the backup service file. (look in /dev/disk/by-uuid before and after inserting it and see what was added)
  • label the filesystem on top of the encrypted device for convenience
  • set up a /etc/crypttab line with all the info needed to decrypt
  • set up a /etc/fstab line with all the info needed to mount (yes, really; see "Issues" below)

The backup service

Here's the backup service unit definition in its entirety:

 OnFailure=status-email-user@%n.service blinkstick-fail.service
 Requires=systemd-cryptsetup@extbackup.service backup.mount
 After=systemd-cryptsetup@extbackup.service backup.mount

 ExecStart=/usr/local/bin/blinkstick --index 1 --limit 10 --set-color 33c280
 ExecStart=/bin/mount /extbackup
 ExecStop=/bin/umount /extbackup
 ExecStop=/usr/local/bin/blinkstick --index 1 --limit 10 --set-color green


The dashes in the UUID in WantedBy= need to be encoded as \x2d and then the slashes from the path bit as dashes. Using dashes to encode slashes is possibly the single most frustrating systemd design decision.


Sadly (as detailed in Blinkenlights, part 2) there are 2 some frustrating limitations with trying to handle the mount (and unmount) of the filesystem in systemd-land, so instead, it's done using the traditional mount, umount and fstab.

If you can point out any improvements to this approach, please let me know!

  1. I customized mine a while ago by copying the generated service file to a static file, but nowadays I think you could do systemd edit systemd-cryptsetup@extbackup.service to add the StopWhenUnneeded to an override file and not need the rest.
  2. or at least were. It's been a while since I revisited this part.

12 November, 2021 10:18PM

hackergotchi for Adnan Hodzic

Adnan Hodzic

wp-k8s: WordPress on privately hosted Kubernetes cluster (Raspberry Pi 4 + Synology)

Blog post you’re reading right now is privately hosted on Raspberry PI 4 Kubernetes cluster with its data coming from NFS share and MariaDB on...

The post wp-k8s: WordPress on privately hosted Kubernetes cluster (Raspberry Pi 4 + Synology) appeared first on FoolControl: Phear the penguin.

12 November, 2021 11:09AM by Adnan Hodzic

November 10, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

LEGO Princess Castle-books

The set

The set

My eldest daughter and I visited a LEGO shop recently and I wanted to buy her a gift. The catch was that we were going to be flying on an airplane the next day, so I wanted to find something with the lowest risk of losing parts on the plane.

We settled on Ariel, Belle, Cinderella and Tiana's Storybook Adventures which had a number of things going for it: It was reasonably priced at under £20, for the size of the set; it included four human minifigs (albeit in a sub-minifig size, some kind of munchkin size, but that did not seem to matter) and an assortment of animal accompaniments; but mostly, it folded up into a self-contained mock fairytail "book", and opened up into an enclosed "tray" play area, minimising the risk of losses on the flight.

The set in its resting state

The set in its resting state

Lego have done a few of these styles of sets, all Disney princess themed, and it looks like they have a few more on their product roadmap. The newer ones incorporate a locking mechanism with a cute Lego key. I love the concept and think it should be extended to other themes/properties. I can imagine a Lego Star Wars-themed version with a little Death Star trench in the middle, or even an original IP like Classic Space, or Medieval.

Exploring a DIY Lego book frame

Exploring a DIY Lego book frame

I really liked the Book device which reminded me of hollow books as a child. The cover and spine pieces are bespoke Lego bricks made for purpose, but I thought you could create something similar with generic parts. Holly and I had a go at the concept with what bricks we had to hand. It's definitely viable (and you could do a lot better with a wider selection of bricks / more skilled builders) and it will be fun to pick something to try and build on the spine.

10 November, 2021 09:48PM

hackergotchi for Neil Williams

Neil Williams

LetsEncrypt with Apache, Gunicorn and Debian Bullseye

This took me far too long to identify and debug, so I'm going to write it up here for my own reference and to possibly help others.


Upgrading an old codebase from Python2 on Buster to Python3 ready for Bullseye and from Django1 to Django2 (prepared for Django3). Everything is fine at this stage - the Django test server is happy with HTTP and it gives enough support to do the actual code changes to get to Python3. All well and good so far. The main purpose of this particular code was to support payments, so a chunk of the testing cannot be done without HTTPS, which is where things got awkward.

This particular service needs HTTPS using LetsEncrypt and Apache2. To support Django, I typically use Gunicorn.

All of this works with HTTP. Moving to HTTPS was easy to test using the default-ssl virtual host that comes with Apache2 in Debian. It's a static page and it worked well with https. The problems all start when trying to use this known-working HTTPS config with the other Apache virtual host to add support for the gunicorn proxy.

Apache reverse proxy AH00898 – Error during SSL Handshake with remote server


Now that I know why this happened, it's easier to see what was happening. At the time, I was swamped in a plethora of options and permutations between the Django HTTPS options and the Apache VirtualHost SSL and proxy commands. Going through all of those took up huge amounts of time, all in the wrong area.

In previous configurations using packages in Buster, gunicorn could simply run on http://localhost:8000 and Apache would proxy that as https.

In versions in Bullseye, this no longer works and it is that handover from https in Apache to http in the proxy is where it is failing.

Apache is using HTTPS because the LetsEncrypt certificates, created using dehydrated, are specified in the VirtualHost configuration. To fix the handshake error, the proxy server needs to know about the certificates created by dehydrated as well.

Gunicorn needs the certificates

The clue is in the gunicorn help:

--keyfile FILE        SSL key file [None]
--certfile FILE       SSL certificate file [None]

The final part of the puzzle is that the certificates created by dehydrated are in a private location:

drwx------ 2 root root /var/lib/dehydrated/certs/

To test gunicorn, this will mean using sudo but that's just a step towards running gunicorn as a systemd service (when access to the certs will not be a problem).

Starting gunicorn using these options shows the proxy now being available at https://localhost:8000 which is a subtle but very important change.

Environment=LOGLEVEL=DEBUG WORKERS=4 LOGFILE=/var/log/gunicorn/site.log
ExecStart=/usr/bin/gunicorn3 site.wsgi --log-level $LOGLEVEL --log-file $LOGFILE --workers $WORKERS \
--certfile /var/lib/dehydrated/certs/site/cert.pem \
--keyfile /var/lib/dehydrated/certs/site/privkey.pem

The specified locations are symbolic links created by dehydrated to cope with regular updates of the certificates using cron.

10 November, 2021 04:03PM by Neil Williams

November 09, 2021

hackergotchi for Benjamin Mako Hill

Benjamin Mako Hill

The Hidden Costs of Requiring Accounts

Should online communities require people to create accounts before participating?

This question has been a source of disagreement among people who start or manage online communities for decades. Requiring accounts makes some sense since users contributing without accounts are a common source of vandalism, harassment, and low quality content. In theory, creating an account can deter these kinds of attacks while still making it pretty quick and easy for newcomers to join. Also, an account requirement seems unlikely to affect contributors who already have accounts and are typically the source of most valuable contributions. Creating accounts might even help community members build deeper relationships and commitments to the group in ways that lead them to stick around longer and contribute more.

In a new paper published in Communication Research, I worked with Aaron Shaw provide an answer. We analyze data from “natural experiments” that occurred when 136 wikis on started requiring user accounts. Although we find strong evidence that the account requirements deterred low quality contributions, this came at a substantial (and usually hidden) cost: a much larger decrease in high quality contributions. Surprisingly, the cost includes “lost” contributions from community members who had accounts already, but whose activity appears to have been catalyzed by the (often low quality) contributions from those without accounts.

A version of this post was first posted on the Community Data Science blog.

The full citation for the paper is: Hill, Benjamin Mako, and Aaron Shaw. 2020. “The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence from Peer Production.” Communication Research, 48 (6): 771–95.

If you do not have access to the paywalled journal, please check out this pre-print or get in touch with us. We have also released replication materials for the paper, including all the data and code used to conduct the analysis and compile the paper itself.

09 November, 2021 07:55PM by Benjamin Mako Hill

Antoine Beaupré

The Neo-Colonial Internet

I grew up with the Internet and its ethics and politics have always been important in my life. But I have also been involved at other levels, against police brutality, for Food, Not Bombs, worker autonomy, software freedom, etc. For a long time, that all seemed coherent.

But the more I look at the modern Internet -- and the mega-corporations that control it -- and the less confidence I have in my original political analysis of the liberating potential of technology. I have come to believe that most of our technological development is harmful to the large majority of the population of the planet, and of course the rest of the biosphere. And now I feel this is not a new problem.

This is because the Internet is a neo-colonial device, and has been from the start. Let me explain.

What is Neo-Colonialism?

The term "neo-colonialism" was coined by Kwame Nkrumah, first president of Ghana. In Neo-Colonialism, the Last Stage of Imperialism (1965), he wrote:

In place of colonialism, as the main instrument of imperialism, we have today neo-colonialism ... [which] like colonialism, is an attempt to export the social conflicts of the capitalist countries. ...

The result of neo-colonialism is that foreign capital is used for the exploitation rather than for the development of the less developed parts of the world. Investment, under neo-colonialism, increases, rather than decreases, the gap between the rich and the poor countries of the world.

So basically, if colonialism is Europeans bringing genocide, war, and its religion to the Africa, Asia, and the Americas, neo-colonialism is the Americans (note the "n") bringing capitalism to the world.

Before we see how this applies to the Internet, we must therefore make a detour into US history. This matters, because anyone would be hard-pressed to decouple neo-colonialism from the empire under which it evolves, and here we can only name the United States of America.

US Declaration of Independence

Let's start with the United States declaration of independence (1776). Many Americans may roll their eyes at this, possibly because that declaration is not actually part of the US constitution and therefore may have questionable legal standing. Still, it was obviously a driving philosophical force in the founding of the nation. As its author, Thomas Jefferson, stated:

it was intended to be an expression of the American mind, and to give to that expression the proper tone and spirit called for by the occasion

In that aging document, we find the following pearl:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

As a founding document, the Declaration still has an impact in the sense that the above quote has been called an:

"immortal declaration", and "perhaps [the] single phrase" of the American Revolutionary period with the greatest "continuing importance." (Wikipedia)

Let's read that "immortal declaration" again: "all men are created equal". "Men", in that context, is limited to a certain number of people, namely "property-owning or tax-paying white males, or about 6% of the population". Back when this was written, women didn't have the right to vote, and slavery was legal. Jefferson himself owned hundreds of slaves.

The declaration was aimed at the King and was a list of grievances. A concern of the colonists was that the King:

has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.

This is a clear mark of the frontier myth which paved the way for the US to exterminate and colonize the territory some now call the United States of America.

The declaration of independence is obviously a colonial document, having being written by colonists. None of this is particularly surprising, historically, but I figured it serves as a good reminder of where the Internet is coming from, since it was born in the US.

A Declaration of the Independence of Cyberspace

Two hundred and twenty years later, in 1996, John Perry Barlow wrote a declaration of independence of cyberspace. At this point, (almost) everyone has a right to vote (including women), slavery was abolished (although some argue it still exists in the form of the prison system); the US has made tremendous progress. Surely this text will have aged better than the previous declaration it is obviously derived from. Let's see how it reads today and how it maps to how the Internet is actually built now.

Borders of Independence

One of the key ideas that Barlow brings up is that "cyberspace does not lie within your borders". In that sense, cyberspace is the final frontier: having failed to colonize the moon, Americans turn inwards, deeper into technology, but still in the frontier ideology. And indeed, Barlow is one of the co-founder of the Electronic Frontier Foundation (the beloved EFF), founded six years prior.

But there are other problems with this idea. As Wikipedia quotes:

The declaration has been criticized for internal inconsistencies.[9] The declaration's assertion that 'cyberspace' is a place removed from the physical world has also been challenged by people who point to the fact that the Internet is always linked to its underlying geography.[10]

And indeed, the Internet is definitely a physical object. First controlled and severely restricted by "telcos" like AT&T, it was somewhat "liberated" from that monopoly in 1982 when an anti-trust lawsuit broke up the monopoly, a key historical event that, one could argue, made the Internet possible.

(From there on, "backbone" providers could start competing and emerge, and eventually coalesce into new monopolies: Google has a monopoly on search and advertisement, Facebook on communications for a few generations, Amazon on storage and computing, Microsoft on hardware, etc. Even AT&T is now pretty much as consolidated as it was before.)

The point is: all those companies have gigantic data centers and intercontinental cables. And those are definitely prioritizing the western world, the heart of the empire. Take for example Google's latest 3,900 mile undersea cable: it does not connect Argentina to South Africa or New Zealand, it connects the US to UK and Spain. Hardly a revolutionary prospect.

Private Internet

But back to the Declaration:

Do not think that you can build it, as though it were a public construction project. You cannot. It is an act of nature and it grows itself through our collective actions.

In Barlow's mind, the "public" is bad, and private is good, natural. Or, in other words, a "public construction project" is unnatural. And indeed, the modern "nature" of development is private: most of the Internet is now privately owned and operated.

I must admit that, as an anarchist, I loved that sentence when I read it. I was rooting for "us", the underdogs, the revolutionaries. And, in a way, I still do: I am on the board of Koumbit and work for a non-profit that has pivoted towards censorship and surveillance evasion. Yet I cannot help but think that, as a whole, we have failed to establish that independence and put too much trust in private companies. It is obvious in retrospect, but it was not, 30 years ago.

Now, the infrastructure of the Internet has zero accountability to traditional political entities supposedly representing the people, or even its users. The situation is actually worse than when the US was founded (e.g. "6% of the population can vote"), because the owners of the tech giants are only a handful of people who can override any decision. There's only one Amazon CEO, he's called Jeff Bezos, and he has total control. (Update: Bezos actually ceded the CEO role to Andy Jassy, AWS and Amazon music founder, while remaining executive chairman. I would argue that, as the founder and the richest man on earth, he still has strong control over Amazon.)

Social Contract

Here's another claim of the Declaration:

We are forming our own Social Contract.

I remember the early days, back when "netiquette" was a word, it did feel we had some sort of a contract. Not written in standards of course -- or barely (see RFC1855) -- but as a tacit agreement. How wrong we were. One just needs to look at Facebook to see how problematic that idea is on a global network.

Facebook is the quintessential "hacker" ideology put in practice. Mark Zuckerberg explicitly refused to be "arbiter of truth" which implicitly means he will let lies take over its platforms.

He also sees Facebook as place where everyone is equal, something that echoes the Declaration:

We are creating a world that all may enter without privilege or prejudice accorded by race, economic power, military force, or station of birth.

(We note, in passing, the omission of gender in that list, also mirroring the infamous "All men are created equal" claim of the US declaration.)

As the Wall Street Journal's (WSJ) Facebook files later shown, both of those "contracts" have serious limitations inside Facebook. There are VIPs who systematically bypass moderation systems including fascists and rapists. Drug cartels and human traffickers thrive on the platform. Even when Zuckerberg himself tried to tame the platform -- to get people vaccinated or to make it healthier -- he failed: "vaxxer" conspiracies multiplied and Facebook got angrier.

This is because the "social contract" behind Facebook and those large companies is a lie: their concern is profit and that means advertising, "engagement" with the platform, which causes increased anxiety and depression in teens, for example.

Facebook's response to this is that they are working really hard on moderation. But the truth is that even that system is severely skewed. The WSJ showed that Facebook has translators for only 50 languages. It's a surprisingly hard to count human languages but estimates range the number of distinct languages between 2500 and 7000. So while 50 languages seems big at first, it's actually a tiny fraction of the human population using Facebook. Taking the first 50 of the Wikipedia list of languages by native speakers we omit languages like Dutch (52), Greek (74), and Hungarian (78), and that's just a few random nations picks from Europe.

As an example, Facebook has trouble moderating even a major language like Arabic. It censored content from legitimate Arab news sources when they mentioned the word al-Aqsa because Facebook associates it with the al-Aqsa Martyrs' Brigades when they were talking about the Al-Aqsa Mosque... This bias against Arabs also shows how Facebook reproduces the American colonizer politics.

The WSJ also pointed out that Facebook spends only 13% of its moderation efforts outside of the US, even if that represents 90% of its users. Facebook spends three more times moderating on "brand safety", which shows its priority is not the safety of its users, but of the advertisers.

Military Internet

Sergey Brin and Larry Page are the Lewis and Clark of our generation. Just like the latter were sent by Jefferson (the same) to declare sovereignty over the entire US west coast, Google declared sovereignty over all human knowledge, with its mission statement "to organize the world's information and make it universally accessible and useful". (It should be noted that Page somewhat questioned that mission but only because it was not ambitious enough, Google having "outgrown" it.)

The Lewis and Clark expedition, just like Google, had a scientific pretext, because that is what you do to colonize a world, presumably. Yet both men were military and had to receive scientific training before they left. The Corps of Discovery was made up of a few dozen enlisted men and a dozen civilians, including York an African American slave owned by Clark and sold after the expedition, with his final fate lost in history.

And just like Lewis and Clark, Google has a strong military component. For example, Google Earth was not originally built at Google but is the acquisition of a company called Keyhole which had ties with the CIA. Those ties were brought inside Google during the acquisition. Google's increasing investment inside the military-industrial complex eventually led Google to workers organizing a revolt although it is currently unclear to me how much Google is involved in the military apparatus. Other companies, obviously, do not have such reserve, with Microsoft, Amazon, and plenty of others happily bidding on military contracts all the time.

Spreading the Internet

I am obviously not the first to identify colonial structures in the Internet. In an article titled The Internet as an Extension of Colonialism, Heather McDonald correctly identifies fundamental problems with the "development" of new "markets" of Internet "consumers", primarily arguing that it creates a digital divide which creates a "lack of agency and individual freedom":

Many African people have gained access to these technologies but not the freedom to develop content such as web pages or social media platforms in their own way. Digital natives have much more power and therefore use this to create their own space with their own norms, shaping their online world according to their own outlook.

But the digital divide is certainly not the worst problem we have to deal with on the Internet today. Going back to the Declaration, we originally believed we were creating an entirely new world:

This governance will arise according to the conditions of our world, not yours. Our world is different.

How I dearly wished that was true. Unfortunately, the Internet is not that different from the offline world. Or, to be more accurate, the values we have embedded in the Internet, particularly of free speech absolutism, sexism, corporatism, and exploitation, are now exploding outside of the Internet, into the "real" world.

The Internet was built with free software which, fundamentally, was based on quasi-volunteer labour of an elite force of white men with obviously too much time on their hands (and also: no children). The mythical writing of GCC and Emacs by Richard Stallman is a good example of this, but the entirety of the Internet now seems to be running on random bits and pieces built by hit-and-run programmers working on their copious free time. Whenever any of those fails, it can compromise or bring down entire systems. (Heck, I wrote this article on my day off...)

This model of what is fundamentally "cheap labour" is spreading out from the Internet. Delivery workers are being exploited to the bone by apps like Uber -- although it should be noted that workers organise and fight back. Amazon workers are similarly exploited beyond belief, forbidden to take breaks until they pee in bottles, with ambulances nearby to carry out the bodies. During peak of the pandemic, workers were being dangerously exposed to the virus in warehouses. All this while Amazon is basically taking over the entire economy.

The Declaration culminates with this prophecy:

We will spread ourselves across the Planet so that no one can arrest our thoughts.

This prediction, which first felt revolutionary, is now chilling.

Colonial Internet

The Internet is, if not neo-colonial, plain colonial. The US colonies had cotton fields and slaves, we have disposable cell phones and Foxconn workers. Canada has its cultural genocide, Facebook has his own genocides in Ethiopia, Myanmar, and mob violence in India. Apple is at least implicitly accepting the Uyghur genocide. And just like the slaves of the colony, those atrocities are what makes the empire run.

The Declaration actually ends like this, a quote which I have in my fortune cookies file:

We will create a civilization of the Mind in Cyberspace. May it be more humane and fair than the world your governments have made before.

That is still inspiring to me. But if we want to make "cyberspace" more humane, we need to decolonize it. Work on cyberpeace instead of cyberwar. Establish clear code of conduct, discuss ethics, and question your own privileges, biases, and culture. For me the first step in decolonizing my own mind is writing this article. Breaking up tech monopolies might be an important step, but it won't be enough: we have to do a culture shift as well, and that's the hard part.

Appendix: an apology to Barlow

I kind of feel bad going through Barlow's declaration like this, point by point. It is somewhat unfair, especially since Barlow passed away a few years ago and cannot mount a response (even humbly assuming that he might read this). But then again, he himself recognized he was a bit too "optimistic" in 2009, saying: "we all get older and smarter":

I'm an optimist. In order to be libertarian, you have to be an optimist. You have to have a benign view of human nature, to believe that human beings left to their own devices are basically good. But I'm not so sure about human institutions, and I think the real point of argument here is whether or not large corporations are human institutions or some other entity we need to be thinking about curtailing. Most libertarians are worried about government but not worried about business. I think we need to be worrying about business in exactly the same way we are worrying about government.

And, in a sense, it was a little naive to expect Barlow to not be a colonist. Barlow is, among many things, a cattle rancher who grew up on a colonial ranch in Wyoming. The ranch was founded in 1907 by his great uncle, 17 years after the state joined the Union, and only a generation or two after the Powder River War (1866-1868) and Black Hills War (1876-1877) during which the US took over lands occupied by Lakota, Cheyenne, Arapaho, and other native American nations, in some of the last major First Nations Wars.

Appendix: further reading

There is another article that almost has the same title as this one: Facebook and the New Colonialism. (Interestingly, the <title> tag on the article is actually "Facebook the Colonial Empire" which I also find appropriate.) The article is worth reading in full, but I loved this quote so much that I couldn't resist reproducing it here:

Representations of colonialism have long been present in digital landscapes. (“Even Super Mario Brothers,” the video game designer Steven Fox told me last year. “You run through the landscape, stomp on everything, and raise your flag at the end.”) But web-based colonialism is not an abstraction. The online forces that shape a new kind of imperialism go beyond Facebook.

It goes on:

Consider, for example, digitization projects that focus primarily on English-language literature. If the web is meant to be humanity’s new Library of Alexandria, a living repository for all of humanity’s knowledge, this is a problem. So is the fact that the vast majority of Wikipedia pages are about a relatively tiny square of the planet. For instance, 14 percent of the world’s population lives in Africa, but less than 3 percent of the world’s geotagged Wikipedia articles originate there, according to a 2014 Oxford Internet Institute report.

And they introduce another definition of Neo-colonialism, while warning about abusing the word like I am sort of doing here:

“I’m loath to toss around words like colonialism but it’s hard to ignore the family resemblances and recognizable DNA, to wit,” said Deepika Bahri, an English professor at Emory University who focuses on postcolonial studies. In an email, Bahri summed up those similarities in list form:

  1. ride in like the savior
  2. bandy about words like equality, democracy, basic rights
  3. mask the long-term profit motive (see 2 above)
  4. justify the logic of partial dissemination as better than nothing
  5. partner with local elites and vested interests
  6. accuse the critics of ingratitude

“In the end,” she told me, “if it isn’t a duck, it shouldn’t quack like a duck.”

Another good read is the classic Code and other laws of cyberspace (1999, free PDF) which is also critical of Barlow's Declaration. In "Code is law", Lawrence Lessig argues that:

computer code (or "West Coast Code", referring to Silicon Valley) regulates conduct in much the same way that legal code (or "East Coast Code", referring to Washington, D.C.) does (Wikipedia)

And now it feels like the west coast has won over the east coast, or maybe it recolonized it. In any case, Internet now christens emperors.

09 November, 2021 07:09PM

hackergotchi for Joachim Breitner

Joachim Breitner

How to audit an Internet Computer canister

I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters (“canisters” are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me.

In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.

Inter-canister calls

The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:

  • Reading global state before issuing an inter-canister call, and assuming it    to still hold when the call comes back.

  • Changing global state before issuing an inter-canister call, changing it again    in the response handler, but assuming nothing else changes the state in    between (reentrancy).

  • Changing global state before issuing an inter-canister call, and not    handling failures correctly, e.g. when the code handling the callback rolls    backs.

If you find such pattern in your code, you should analyze if a malicious party can trigger them, and assess the severity that effect

These issues apply to all canisters, and are not Motoko-specific.


Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not rollback state changes done before, while trapping (e.g. Debug.trap, assert …, out of cycle conditions) does.

Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw.

This issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which don’t roll back).

Talking to malicious canisters

Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:

  • The other canister can withhold a response. Although the bidirectional   messaging paradigm of the Internet Computer was designed to guarantee a   response eventually, the other party can busy-loop for as long as they are   willing to pay for before responding. Worse, there are ways to deadlock a   canister.

  • The other canister can respond with invalidly encoded Candid. This will cause   a Motoko-implemented canister to trap in the reply handler, with no easy way   to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap. 

Many canisters do not even do inter-canister calls, or only call other trustwothy canisters. For the others, the impact of this needs to be carefully assessed.

Canister upgrade: overview

For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:

  1. Can the canister be upgraded at all?
  2. Will the canister upgrade retain all data?
  3. Can the canister be upgraded promptly?
  4. Is three a recovery plan for when upgrading is not possible?

Canister upgradeability

A canister that traps, for whatever reason, in its canister_preupgrade system method is no longer upgradeable. This is a major risk. The canister_preupgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory.

Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely the first limit:

The system imposes an instruction limit on upgrading a canister (spanning both canister_preupgrade and canister_postupgrade). This limit is a subnet configuration value, and sepearate (and likely higher) than the normal per-message limit, and not easily determined. If the canister’s live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable.

This risk cannot be eliminated completely, as long as Motoko and Stable Variables are used. It can be mitigated by appropriate load testing:

Install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister’s live data to increase above a certain size.

If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet.

An alternative mitigation is to avoid canister_pre_upgrade as much as possible. This means no use of stable var (or restricted to small, fixed-size configuration data). All other data could be

  • mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
  • stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, This requires manual binary encoding of the data, and is marked experimental, so I cannot recommend this at the moment.
  • not put into a Motoko canister until Motoko has a scalable solution for stable variable (for example keeping them in stable memory permanently, with smart caching in main memory, and thus obliterating the need for pre-upgrade code.)

Data retention on upgrades

Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow following idiom:

stable var fooStable = …;
var foo = fooFromStable(fooStable);
system func preupgrade() { fooStable := fooToStable(foo); })
system func postupgrade() { fooStable := (empty); })

In this case, it is important to check that

  • All non-stable global vars, or global lets with mutable values, have a stable companion.
  • The assignments to foo and fooStable are not forgotten.
  • The fooToStable and fooFromStable form bijections.

An example would be HashMaps stored as arrays via Iter.toArray(….entries()) and HashMap.fromIter(….vals()).

It is worth pointiong out that a code view will only look at a single version of the code, but cannot check whether code changes will preserve data on upgrade. This can easily go wrong if the names and types of stable variables are changed in incompatible way. The upgrade may fail loudly in this cases, but in bad cases, the upgrade may even succeed, losing data along the way. This risk needs to be mitigated by thorough testing, and possibly backups (see below).

Prompt upgrades

Motoko and Rust canisters cannot be safely upgraded when they are still waiting for responses to inter-canister calls (the callback would eventually reach the new instance, and because of infelicities of the IC’s System API, could possibly call arbitrary internal functions). Therefore, the canister needs to be stopped before upgrading, and started again. If the inter-canister calls take a long time, this mean that upgrading may take a long time, which may be undesirable. Again, this risk is reduced if all calls are made to trustworthy canisters, and elevated when possibly untrustworthy canisters are called, directly or indirectly.

Backup and recovery

Because of the above risk around upgrades it is advisable to have a disaster recovery strategy. This could involve off-chain backups of all relevant data, so that it is possible to reinstall (not upgrade) the canister and re-upload all data.

Note that reinstall has the same issue as upgrade described above in “prompt upgrades”: It ought to be stopped first to be safe.

Note that the instruction limit for messages, as well as the message size limit, limit the amount of data returned. If the canister needs to hold more data than that, the backup query method might have to return chunks or deltas, with all the extra complexity that entails, e.g. state changes between downloading chunks.

If large data load testing is performed (as Irecommend anyways to test upgradeability), one can test whether the backup query method works within the resource limits.

Time is not strictly monotonic

The timestamps for “current time” that the Internet Computer provides to its canisters is guaranteed to be monotonic, but not strictly monotonic. It can return the same values, even in the same messages, as long as they are processed in the same block. It should therefore not be used to detect “happens-before” relations.

Instead of using and comparing time stamps to check whether Y has been performed after X happened last, introduce an explicit var y_done : Bool state, which is set to False by X and then to True by Y. When things become more complex, it will be easier to model that state via an enumeration with speaking tag names, and update this “state machine” along the way.

Another solution to this problem is to introduce a var v : Nat counter that you bump in every update method, and after each await. Now v is your canister’s state counter, and can be used like a timestamp in many ways.

While we are talking about time: The system time (typically) changes across an await. So if you do let now = and then await, the value in now may no longer be what you want.

Wrapping arithmetic

The Nat64 data type, and the other fixed-width numeric types provide opt-in wrapping arithmetic (e.g. +%, fromIntWrap). Unless explicitly required by the current application, this should be avoided, as usually a too large or negatie value is a serious, unrecoverable logic error, and trapping is the best one can do.

Cycle balance drain attacks

Because of the IC’s “canister pays” model, all canisters are prone to DoS attacks by draining their cycle balance, and this risk needs to be taken into account.

The most elementary mitigation strategy is to monitor the cycle balance of canisters and keep it far from the (configurable) freezing threshold.

On the raw IC-level, further mitigation strategies are possible:

  • If all update calls are authenticated, perform this authentication as quickly as possible, possibly before decoding the caller’s argument. This way, a cycle drain attack by an unauthenticated attacker is less effective (but still possible).

  • Additionally, implementing the canister_inspect_message system method allows the above checks to be performed before the message even is accepted by the Internet Computer. But it does not defend against inter-canister messages and is therefore not a complete solution.

  • If an attack from an authenticated user (e.g. a stakeholder) is to be expected, the above methods are not effective, and an effective defense might require relatively involved additional program logic (e.g. per-caller statistics) to detect such an attack, and react (e.g. rate-limiting).

  • Such defenses are pointless if there is only a single method where they do not apply (e.g. an unauthenticated user registration method). If the application is inherently attackable this way, it is not worth the bother to raise defenses for other methods.

   Related: A justification why the Internet Identity does not use canister_inspect_message)

A motoko-implemented canister currently cannot perform most of these defenses: Argument decoding happens unconditionally before any user code that may reject a message based on the caller, and canister_inspect_message is not supported. Furthermore, Candid decoding is not very cycle defensive, and one should assume that it is possible to construct Candid messages that require many instructions to decode, even for “simple” argument type signatures.

The conclusion for the audited canisters is to rely on monitoring to keep the cycle blance up, even during an attack, if the expense can be born, and maybe pray for IC-level DoS protections to kick in.

Large data attacks

Another DoS attack vector exists if public methods allow untrustworthy users to send data of unlimited size that is persisted in the canister memory. Because of the translation of async-await code into multiple message handlers, this applies not only to data that is obviously stored in global state, but also local data that is live across an await point.

The effectiveness of such attacks is limited by the Internet Computer’s message size limit, which is in the order of a few megabytes, but many of those also add up.

The problem becomes much worse if a method has an argument type that allows a Candid space bomb: It is possible to encode very large vectors with all values null in Candid, so if any method has an argument of type [Null] or [?t], a small message will expand to a large value in the Motoko heap.

Other types to watch out:

  • Nat and Int: This is an unbounded natural number, and thus can be arbitrarily large. The Motoko representation will however not be much larger than the Candid encoding (so this does not qualify as a space bomb).

   It is still advisable to check if the number is reasonable in size before storing it or doing an await. For example, when it denotes an index in an array, throw early if it exceeds the size of the array; if it denotes a token amount to transfer, check it against the available balance, if it denotes time, check it against reasonable bounds.

  • Principal: A Principal is effectively a Blob. The Interface   specification says that principals are at most 29   bytes in   length, but the Motoko Candid   decoder   does not check that currently (fixed in the next version of Motoko). Until then, a Principal passed as an   argument can be large (the principal in msg.caller is system-provided and   thus safe). If you cannot wait for the fix to reach you, manually check the size of the princial (via   Principal.toBlob) before doing the await.

Shadowing of msg or caller

Don’t use the same name for the “message context” of the enclosing actor and the methods of the canister: It is dangerous to write shared(msg) actor, because now msg is in scope across all public methods. As long as these also use public shared(msg) func …, and thus shadow the outer msg, it is correct, but it if one accidentially omits or mis-types the msg, no compiler error would occur, but suddenly msg.caller would now be the original controller, likely defeating an important authorization step.

Instead, write shared(init_msg) actor or shared({caller = controller}) actor to avoid using msg.


If you write a “serious” canister, whether in Motoko or not, it is worth to go through the code and watch out for these patterns. Or better, have someone else review your code, as it may be hard to spot issues in your own code.

Unfortunately, such a list is never complete, and there are surely more ways to screw up your code – in addition to all the non-IC-specific ways in which code can be wrong. Still, things get done out there, so best of luck!

09 November, 2021 05:34PM by Joachim Breitner (