PCID support on Illumos

I joined Joyent at the start of the year while Meltdown was breaking news; it was certainly an "interesting" time to start a new job. Luckily, by my first week, Alex and Robert had pretty much figured out how the changes should look and had made good inroads on the implementation. So I began working with Alex on his KPTI trampoline code (mainly involving breaking it with my old friend KMDB). I also picked up the PCID work, which I describe here.

As you can probably tell from Alex's blog post, Meltdown is unusual for a security issue: aside from the usual operational pains of any security patch, the fix itself involved some pretty significant code changes to the low-level core of the kernel.

There's also another potential impact, and that's performance. While the actual overhead is heavily workload-dependent - and some of the reports out there seem pretty alarmist - having to switch page tables (i.e. reloading %cr3) on every kernel entry and exit has a non-trivial impact on system call cost. Nor can we keep kernel mappings in the TLB across these switches. Previously, we would set PT_GLOBAL on kernel mappings so they weren't flushed across a %cr3 reload, but as the CPU would happily use these TLB entries to speculate into the kernel, we must now flush them.

The good news is that there's a CPU feature on reasonably recent Intel CPUs called Process Context IDs (PCIDs). Once it's enabled, the low 12 bits of %cr3 hold a small integer ID, which is used as a tag in all TLB lookups and fills. This feature is somewhat similar to ASIDs seen on other architectures, with one notable difference: the PCID applies to TLB state implicitly. That is, there's no way to say "load from memory using this ID" in ddi_copyin() and the like.
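To make the %cr3 layout concrete, here's a small Python sketch of how a PCID-tagged %cr3 value is put together, following Intel's documented layout: with CR4.PCIDE set, bits 11:0 hold the PCID, and setting bit 63 of the value written asks the CPU not to flush that PCID's TLB entries. The helpers make_cr3() and pcid_of() are hypothetical names for illustration, not Illumos code.

```python
# %cr3 layout when CR4.PCIDE = 1 (x86-64, 4-level paging), per Intel's SDM.
CR3_PCID_MASK = 0xFFF   # bits 11:0 hold the PCID
CR3_NOFLUSH = 1 << 63   # bit 63 of a value written to %cr3: keep this PCID's TLB entries


def make_cr3(pml4_pa: int, pcid: int, noflush: bool = False) -> int:
    """Build a %cr3 value from a page-table root physical address and a PCID."""
    if pml4_pa & 0xFFF:
        raise ValueError("page-table root must be 4 KiB aligned")
    if not 0 <= pcid <= CR3_PCID_MASK:
        raise ValueError("PCID must fit in 12 bits")
    value = pml4_pa | pcid
    if noflush:
        value |= CR3_NOFLUSH
    return value


def pcid_of(cr3: int) -> int:
    """Extract the PCID tag from a %cr3 value."""
    return cr3 & CR3_PCID_MASK
```

For example, make_cr3(0x1234000, 1, noflush=True) gives 0x8000000001234001: the same page-table root, tagged with PCID 1, loaded without flushing.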

One way of using PCIDs is to associate an ID with a struct as: that is, each time we load a process's address space into the HAT, we will use a specific PCID for it, and avoid having to flush the mappings for the previous processes. This isn't really a viable option for Illumos, though: if nothing else we suspect that the additional shootdown flushes needed (since we'd maintain TLB entries even after switching away from a process's struct as) would counteract any performance gain.

Instead we define two fixed PCID values. PCID_KERNEL, defined as 0 mainly to keep the boot process simple, is used for the kernel %cr3. Thus, all TLB loads while in the kernel will be tagged with this value. PCID_USER is used when in userspace. Now, when we switch %cr3 on kernel entry or exit, we can do a non-flushing load. This lets us keep both the kernel and the userspace mappings around across kernel/user transitions.
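A toy model may help show why the two fixed PCIDs pay off. This Python sketch treats the TLB as a dictionary keyed by (PCID, virtual page) and follows the documented rule that a flushing %cr3 load discards only the entries tagged with the PCID being loaded. PCID_USER = 1 and the page numbers are assumptions for illustration; global pages, paging-structure caches and limited TLB capacity are not modelled.

```python
PCID_KERNEL = 0
PCID_USER = 1  # assumption: any other 12-bit value would do for this model


class ToyTLB:
    """A PCID-tagged TLB as a dict: (pcid, virtual page) -> physical page."""

    def __init__(self):
        self.entries = {}
        self.pcid = PCID_KERNEL

    def load_cr3(self, pcid, noflush):
        # A flushing load drops the entries tagged with the PCID being loaded;
        # entries for every other PCID survive either way.
        if not noflush:
            self.entries = {k: v for k, v in self.entries.items() if k[0] != pcid}
        self.pcid = pcid

    def access(self, vpn, page_table):
        """Translate vpn under the current PCID; return (pfn, was_tlb_hit)."""
        key = (self.pcid, vpn)
        if key in self.entries:
            return self.entries[key], True
        pfn = page_table[vpn]  # stands in for a page-table walk
        self.entries[key] = pfn
        return pfn, False


def round_trip(noflush):
    """User access, system call, user access again: does the last one hit?"""
    user_pt = {0x10: 0xAA}     # hypothetical user mapping
    kernel_pt = {0x900: 0xBB}  # hypothetical kernel mapping
    tlb = ToyTLB()
    tlb.load_cr3(PCID_USER, noflush)    # running in userland
    tlb.access(0x10, user_pt)
    tlb.load_cr3(PCID_KERNEL, noflush)  # kernel entry
    tlb.access(0x900, kernel_pt)
    tlb.load_cr3(PCID_USER, noflush)    # kernel exit
    _, hit = tlb.access(0x10, user_pt)
    return hit
```

round_trip(noflush=True) returns True: the user translation survives the trip through the kernel. With flushing loads, round_trip(noflush=False) returns False and the user access has to walk the page tables again.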

When we do need to invalidate TLB entries, though, things are now slightly more complicated. We are by definition in the kernel (and hence using PCID_KERNEL), but for memory addresses below USERLIMIT, we have to flush both PCID_USER (for anything that ran in user mode) and PCID_KERNEL (for any accesses the kernel may have made, such as with ddi_copyin()). hat_switch() is also a little more complicated. As the %cr3 load there is non-invalidating, we have to explicitly flush everything if we're switching away from a non-kas HAT, to clear out now-stale user-space mappings. (Note that this has always been done eagerly on Illumos, even when switching to a kas HAT.)
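The invalidation rule and the hat_switch() flush can be sketched against the same kind of toy {(pcid, page): pfn} TLB. Here USERLIMIT is a placeholder rather than Illumos' real value, PCID_USER = 1 is assumed, and the function names are hypothetical. The point is only which tags have to be dropped.

```python
PCID_KERNEL = 0
PCID_USER = 1         # assumption: placeholder value for illustration
USERLIMIT = 1 << 47   # placeholder, not Illumos' actual USERLIMIT
PAGESIZE = 4096


def pcids_to_flush(va):
    """PCIDs whose TLB entries might map va, as seen from inside the kernel."""
    if va < USERLIMIT:
        # Userland may have cached it under PCID_USER, and the kernel may
        # have touched it (say, in ddi_copyin()) under PCID_KERNEL.
        return (PCID_USER, PCID_KERNEL)
    return (PCID_KERNEL,)


def invalidate_page(tlb, va):
    """Drop every cached translation for va from a toy {(pcid, vpn): pfn} TLB."""
    vpn = va // PAGESIZE
    for pcid in pcids_to_flush(va):
        tlb.pop((pcid, vpn), None)


def hat_switch(tlb, old_hat_is_kas):
    """The %cr3 load doesn't flush, so stale user mappings must go explicitly."""
    if not old_hat_is_kas:
        tlb.clear()
```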

The INVPCID instruction is what enables us to flush PCID_USER while in the kernel. Unfortunately, support for INVPCID came quite some time after PCID itself. On CPUs with PCID but without INVPCID, we have to emulate it, and the only way Intel gives us to do this is to load the ID into %cr3 before invalidating the TLB entries. We don't want to "pollute" PCID_USER with any extraneous kernel mappings, so this means we need to switch to the user page tables when loading PCID_USER. But, remember, KPTI requires us not to have kernel text (or stack!) mapped into these page tables. So we have to first make sure we're in the trampoline text before doing the invalidations: see tr_mmu_flush_user_range.
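The ordering constraint is the subtle part. Here is a hypothetical Python model that lists the steps each path takes and checks the KPTI rule that the user page tables are only ever live while we're executing trampoline text. The operation names are invented for illustration and are not the actual contents of tr_mmu_flush_user_range. The model relies on INVLPG acting only on the current PCID (plus global pages), which is why PCID_USER has to be loaded first.

```python
PCID_KERNEL = 0
PCID_USER = 1  # assumption: placeholder value for illustration
PAGESIZE = 4096


def flush_user_range_ops(va, npages, have_invpcid):
    """List the (hypothetical) steps for flushing user pages from PCID_USER."""
    pages = [va + i * PAGESIZE for i in range(npages)]
    if have_invpcid:
        # Individual-address INVPCID names the PCID explicitly, so we can
        # stay on the kernel page tables throughout.
        return [("invpcid_addr", PCID_USER, p) for p in pages]
    ops = [("enter_trampoline",)]
    # INVLPG only affects the current PCID, so PCID_USER must be loaded,
    # and to avoid polluting it, that means the user page tables.
    ops.append(("load_cr3", "user", PCID_USER))
    ops += [("invlpg", p) for p in pages]
    ops.append(("load_cr3", "kernel", PCID_KERNEL))
    ops.append(("leave_trampoline",))
    return ops


def kpti_safe(ops):
    """True if the user page tables are only ever loaded inside trampoline text."""
    in_trampoline = False
    on_user_tables = False
    for op in ops:
        if op[0] == "enter_trampoline":
            in_trampoline = True
        elif op[0] == "leave_trampoline":
            if on_user_tables:
                return False  # would run kernel text on user page tables
            in_trampoline = False
        elif op[0] == "load_cr3":
            on_user_tables = op[1] == "user"
            if on_user_tables and not in_trampoline:
                return False
    return not on_user_tables
```

Both paths pass kpti_safe(), whereas loading the user %cr3 without first entering the trampoline is rejected.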

For those interested, Alex posted a draft webrev of the PCID changes.


Converting HTML mail via procmail

All the procmail recipes I found on a quick search failed to handle quoted-printable encoded HTML, which is used almost everywhere. And those that had quoted-printable examples used tools that are no longer maintained, such as mimencode.

The solution is to use Perl directly:

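:0
* ^Content-Type: text/html;
* ^Content-Transfer-Encoding: *quoted-printable
{
  # Filter the body to plain text, then fix the headers to match.
  :0 fbw
  | perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);' | lynx -dump -force_html -stdin
  :0 fhw
  | formail -i "Content-Type: text/plain; charset=us-ascii" -i "Content-Transfer-Encoding: 8bit"
}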
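For comparison, the decode-then-render step can also be sketched with nothing but the Python 3 standard library (quopri and html.parser). This is only a rough stand-in for lynx -dump: it drops tags and the contents of script and style elements, turns a few block tags into newlines (an arbitrary choice), and assumes a UTF-8 body unless told otherwise. Treat it as a hypothetical alternative for the body filter, not a tested replacement.

```python
import quopri
import sys
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div", "tr", "li"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def qp_html_to_text(body: bytes, charset: str = "utf-8") -> str:
    """Decode a quoted-printable HTML body and strip it down to text."""
    html = quopri.decodestring(body).decode(charset, errors="replace")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def main():
    """Filter entry point: message body on stdin, plain text on stdout."""
    sys.stdout.write(qp_html_to_text(sys.stdin.buffer.read()))
```

To use it in place of the perl | lynx pipe, save it as a script that ends with a call to main(). Note how quopri handles both the =XX escapes and the "=" soft line breaks that confuse tools reading the raw body.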


Ripping vinyl on Linux

I've been ripping a lot of stuff from vinyl to FLAC recently. Here's how I do it.

I have an Alesis I/O 2, which works well and seems fairly decent quality.

The first, and most important, step is to stop trying to use Audacity. It's incredibly broken and unreliable. Go get ocenaudio instead. It's fairly new, but it works reliably.

After monitoring your levels, record the whole thing into ocenaudio.

First, trim any obviously loud clicks, such as the one from landing the needle. ocenaudio doesn't seem to have a "draw sample" function yet (the only thing I miss from Audacity), but deleting just a few samples is usually fine.

Normalise everything.

Then select a whole track using Shift-arrows (and Control to go faster). Press Control-K to convert it into a region, and name it if you like.

You'll see references to using zero-crossing finders to split tracks. This is always a bad idea - it's simply not reliable enough, especially with an old crackly record, isopropyl'd or not.

Zoom all the way out again, make sure the number of tracks is right.

Then File->Export Audio From Regions, making sure that the "separate files" checkbox is set.

Now it's tagging time: run "kid3 yourdirwithflacs". First, import from Discogs via File->Import From Discogs, presuming it has the release (it usually will). Then click 'Tag 2' in the Format Up part, along with the format you need. Save everything, then use Tools->Rename Directory to rename the containing directory. You're done.


Recording on Linux with Alesis io|2

A little note for myself: to get low-latency monitoring, and more importantly, record at the right rate, you need to set the Configuration-Profile to "Digital Stereo Input" in pavucontrol!

Update: you also need this in ~/.pulse/daemon.conf :


Another update: PA/ALSA often seems to forget the sensible default devices, and ocenaudio starts trying to record from the monitor devices. The solution seems to be to run pavucontrol, start recording in ocenaudio, and change the drop-down box to select io|2 Digital Stereo.


PayPal idiocy

This is unbelievably stupid of Paypal. I just got this email from them:

vinyl tap records would like you to use PayPal - the safer, easier way to pay and get paid online.
To send vinyl tap records your payment and see the details of this invoice, copy and paste this link into your web browser:


So much for "never click a URL in email". Even worse, if you log in separately, the request is not visible anywhere. Morons.


NatWest phishing service

I got some NatWest phishing spam the other day and was amused to notice this:

<title>NatWest - Security Information</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" type="text/css" href="http://www.natwest.com/microsites/global/phishing_demo/includes/css/generic.css" media="all" />
<a href="http://www.natwest.com/"><img src="http://www.natwest.com/microsites/global/phishing_demo/images/h_logo.gif" alt="NatWest - Load home page" /></a>

Enterprising of them to actually use NatWest's explanation of phishing to ... phish.


Name and shame time

To quote 123-reg customer support:

> When will you be supporting AAAA records?

There are no current plans to implement this but notifications will be sent out if this takes place.


Avoid vps247 hosting

Late last year, I was forced to find a new host for movementarian.org, as my previous hosting provider (Blue Room Hosting, who were really great) were shutting down. I went with VPS247, as they were local to Manchester and seemed reasonable.

Unfortunately my experience has been terrible. They've failed to keep the machines on the net, regularly causing ssh sessions to die. The dmesg is full of warnings about the block drivers failing to write for more than two minutes: evidently the SAN setup they have is totally unreliable.

My VM went down for a significant amount of time and support were very slow to respond. During the total outage, there were no status updates, and no response on the support tickets or the forums. The penultimate straw was when my filesystem was massively corrupted. Even though my VM is hardly critical, I can't be doing with unreliability like this, especially when they're not reachable when problems occur.

My final straw, though, was when I discovered they'd deleted all the negative comments from the Client Comments section of their forum. That's really, really, not on.

I'm now with linode and happy (so far).


pbranch curiosities

I've started using the pbranch extension for hg more seriously. It works nicely but is a little rough around the edges, in particular:

No hg qpop/push equivalent

I really miss this. I find myself constantly doing hg pgraph to figure out where I am and then typing the patch above or below.

No way to shelve a patch

With MQ, I can easily guard a patch to temporarily remove it from the queue. There doesn't seem to be a simple way to do that with pbranch.

Editing patch messages

You use peditmessage, but because this modifies the repository, you then always have to run hg pmerge --all. This pops to the top and creates a bunch of extra changesets, and it gets annoying quickly. And frustratingly, these patch messages do *not* appear in the repo history. So your code reviews of the main repo are just showered in useless merge messages, instead of the actual commit message you care about.

No pfinish

I don't know why, but there's no way to automatically commit a patch as a single changeset on the root default tip, then close the patch branch.

Inserting and deleting patches is horrible

Yuck - I really hope this gets easier soon.

Showing the current patch history

A little tip not mentioned on the pbranch site: the way to show the changelog history of the current patch is to do hg log -b patchname.

Re-enable Ctrl-Alt-Backspace in Xorg

Create the following as /etc/hal/fdi/policy/30user/10-x11-zap.fdi:

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <!--
    Default X.org input configuration is defined in:
    Settings here modify or override the default configuration.
    See comment in the file above for more information.

    To see the currently active hal X.org input configuration
    run lshal or hal-device(1m) and search for "input.x11*" keys.

    Hal and X must be restarted for changes here to take any effect
  -->
  <device>
    <match key="info.capabilities" contains="input.keys">
      <merge key="input.x11_options.XkbOptions" type="string">terminate:ctrl_alt_bksp</merge>
    </match>
  </device>
</deviceinfo>

and then restart hald and Xorg.

Disabling that goddamn GTK bell

echo 'gtk-error-bell = 0' >>$HOME/.gtkrc-2.0


Changing liferea keyboard shortcuts

Liferea has no keyboard shortcut editor itself, but "Toggle unread status" demands the wrist-breaking chord action of Control-U. It expects you to be able to edit the shortcuts via the editable menu feature of GTK+.

Unfortunately that's disabled on all modern GNOME installs, and there's no UI for re-enabling it. As usual, gconf-editor to the rescue: set the key /desktop/gnome/interface/can_change_accels to true. After restarting Liferea, you can then change a shortcut by hovering over the menu item and pressing the new key combination. Of course, this in itself is buggy: if the new key clashes with a menu accelerator (as 'r' does), it performs that action instead.

It's simpler to directly edit the accels file in your Liferea dot dir.


Epson all-in-ones: avoid like the plague

Browsing the net, you might get the impression that Epson Stylus All-in-ones are well supported under Linux. Unfortunately this is not the case. The pipslite driver you have to install is extremely flaky, and Fedora SELinux doesn't work properly with it. There's no "draft" mode for some bizarre reason; printing is extremely slow and often randomly cancels half-printed jobs due to USB resets.

The scanner doesn't work at all with the iscan software, despite claims to the contrary.

Setting up JACK on Fedora 12

Audacity is somewhat of a broken joke these days, so I needed to use Ardour to record. And that meant setting up JACK. Since JACK insists on exclusivity, I also needed to route pulseaudio through JACK so I could use other apps at the same time. Unfortunately, this is a bit of a pig to figure out. I hacked it as follows:

First, edit /etc/pulse/default.pa and add two lines:

load-module module-jack-sink
load-module module-jack-source

In theory now, a restart of pulseaudio should start using JACK for recording and playback, if jackd is running. However, it tends not to work very well: you might find PA hanging and you have to kill -9 it.

This isn't enough, of course: when you log in again, gnome-session will try to start pulseaudio but not jackd, so nothing works. It's far from the right way, but I edited /usr/bin/start-pulseaudio-x11 (which is started from a script in /etc/xdg/autostart/) as follows:

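# Select the line input by hand (desktop startup used to do this).
amixer -c 0 sset 'Input Source' 'Line'
# Start JACK on the ALSA backend and give it a moment to come up.
nohup jackd -d alsa &
sleep 5
/usr/bin/pulseaudio --start "$@"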

Note that I have to set the input source by hand: something in desktop startup used to do this for me, but now that I'm going through JACK, it no longer happens.


Liferea strict feed validation tip

New versions of Liferea refuse to parse any feed that fails to validate, even for relatively "minor" problems (the libxml2 recovery facility is no longer used; besides, it abandons the rest of the feed when it hits such problems). I don't want to use Google Reader, since I don't like the interface.

Typically, bad feeds contain things like high-bit characters or bare ampersands. Thankfully, there's a "conversion filter" feature that you can use to work around the bad feeds. On the two bad feeds, I run this filter:

[moz@pent ~]$ cat bin/fix-ampersands

sed 's/\o226/\&amp;/g' | sed 's/& /\&amp; /g' | sed 's/\o243/GBP/g'
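The sed version only catches an ampersand followed by a space. A slightly more thorough filter, sketched here in Python as an alternative, escapes any '&' that doesn't already begin one of XML's five predefined entity references or a numeric character reference. Like the sed pipeline, it reads the feed on stdin and writes the result to stdout. It works on raw bytes so high-bit characters pass through untouched, and it doesn't replicate the \o226 and \o243 substitutions.

```python
import re
import sys

# An '&' that doesn't already start one of XML's five predefined entity
# references or a numeric (decimal or hex) character reference.
_BARE_AMP = re.compile(
    rb"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def fix_ampersands(feed: bytes) -> bytes:
    """Escape bare ampersands so the feed has a chance of being well-formed XML."""
    return _BARE_AMP.sub(b"&amp;", feed)


def main():
    # Raw bytes in, raw bytes out: no decoding, so high-bit bytes survive.
    sys.stdout.buffer.write(fix_ampersands(sys.stdin.buffer.read()))
```

Saved as a script ending in a call to main(), it can be used as the conversion filter. Note that HTML-only names like &copy; are escaped too, since they aren't defined in plain XML.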


The main indicators of egotism as I intend it here are loud self-display, insecurity, constant approval-seeking, overinflating one’s accomplishments, touchiness about slights, and territorial twitchiness about one’s expertise. My claim is that egotism is a disease of the incapable, and vanishes or nearly vanishes among the super-capable.

I’m the crippled kid who became a black-belt martial artist and teacher of martial artists. I’ve made the New York Times bestseller list as a writer. You can hardly use a browser, a cellphone, or a game console without relying on my code. I’ve been a session musician on two records. I’ve blown up the software industry once, reinvented the hacker culture twice, and am without doubt one of the dozen most famous geeks alive.

No prizes for guessing who this was.


A horrible little ElementTree gotcha

What does this print:

from lxml import etree
doc = etree.fromstring('<a><b><c/></b></a>')
newdoc = etree.ElementTree(doc.find('b'))
print(newdoc.xpath('/b/c')[0].xpath('/a'))

The answer is: [<Element a at 817548c>]. The first point to note is that xpath() on an element evaluates relative paths from that element, but absolute XPaths start from the top of the containing tree. The second point is that etree.ElementTree() doesn't copy anything: it only wraps the existing element, so _Element::xpath, unlike _ElementTree::xpath, evaluates absolute paths from the top of the original underlying tree! So even though there's no <a> in newdoc, an absolute XPath on a child element can still reach it.
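If you actually want a detached subtree, one option is copy.deepcopy(), which in lxml produces an element in a brand new document of its own, with the copy as the root. This sketch contrasts the two; it needs lxml installed, and absolute_from_child() is just a hypothetical helper for the comparison.

```python
import copy

from lxml import etree

doc = etree.fromstring('<a><b><c/></b></a>')

# etree.ElementTree() only wraps <b>: it still lives inside the original document.
wrapped = etree.ElementTree(doc.find('b'))

# copy.deepcopy() gives the copy of <b> a new document, with <b> as its root.
detached = etree.ElementTree(copy.deepcopy(doc.find('b')))


def absolute_from_child(tree):
    """Evaluate the absolute XPath '/a' from the <c> child of the given tree."""
    c = tree.xpath('/b/c')[0]
    return c.xpath('/a')
```

absolute_from_child(wrapped) still finds the original <a>, while absolute_from_child(detached) returns an empty list.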


YouTube annoyance

How much time would it really take to order multi-part videos, so the suggestion at the end of the video is the next part? Please!


An annoying Python gotcha

Imagine you have this in mod.py:

import foo

class bar(object):
    def __del__(self):
        foo.cleanup()

Seems fine right? In fact, there's a nasty bug here. If I try to use this module in client.py like so:

import mod
mybar = mod.bar()

Then you're likely to get an exception when the program exits. This is because Python, for some bizarre reason, Nones out the globals in mod.py when taking down the interpreter. The actual __del__ method can be called sometime after this, and it ends up trying None.cleanup(), with the resultant AttributeError. It seems extremely bizarre that it happens in this order, but it does (a real example).


Kernel solipsism

Thomas Gleixner:

Exactly that's the point. Adding dom0 makes life easier for a group of users who decided to use Xen some time ago, but what Ingo wants is technical improvement of the kernel... The kernel policy always was and still is to accept only those features which have a technical benefit to the code base.

It boggles the mind that someone could get things so backwards. The kernel exists to provide services to the outside world, not the other way around. By all means criticise the details of the Xen dom0 code, but this argument makes zero sense. How precisely did x86_64 support provide a technical benefit to the code base?