XCB

XCB is a library for communicating with the X Window System used on Linux, FreeBSD, and other Unix-like operating systems. XCB's interface is written in C.

Understanding X, at least the parts I'm dealing with.

The X Window System (hereafter, X) is a server. It listens for events (keyboard events, mouse events, timer events from connected programs, etc.) and "stores" the results on a display. The display is the root of the X hierarchy, and the object to which you initially connect. The display is described by a string which encapsulates the address of the X server. The format of the string predates the URL standard.
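To make the display string concrete: the usual form is `[host]:display[.screen]`, for example `:0` or `remote.example:10.1`. Below is a minimal sketch of pulling one apart. `DisplayName`, `is_number` and `parse_display_name` are names I made up for illustration. The sketch ignores protocol prefixes and socket paths, so for real use libxcb's own `xcb_parse_display` is probably the better choice.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical holder for the pieces of a display string.
struct DisplayName {
    std::string host;  // empty means the local machine
    int display = 0;
    int screen = 0;    // stays 0 when the ".screen" suffix is absent
};

// Hypothetical helper: true if `text` is a non-empty run of decimal digits.
static bool is_number(const std::string& text) {
    return !text.empty() &&
           std::all_of(text.begin(), text.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Splits "[host]:display[.screen]". Returns false (leaving `out` untouched)
// when the string does not have that shape. std::stoi throws on numbers
// that overflow int, which is acceptable for a sketch.
bool parse_display_name(const std::string& name, DisplayName& out) {
    const std::size_t colon = name.rfind(':');
    if (colon == std::string::npos) {
        return false;
    }

    const std::string rest = name.substr(colon + 1);
    const std::size_t dot = rest.find('.');
    const std::string display_part = rest.substr(0, dot);
    if (!is_number(display_part)) {
        return false;
    }

    DisplayName result;
    result.host = name.substr(0, colon);
    result.display = std::stoi(display_part);

    if (dot != std::string::npos) {
        const std::string screen_part = rest.substr(dot + 1);
        if (!is_number(screen_part)) {
            return false;
        }
        result.screen = std::stoi(screen_part);
    }

    out = result;
    return true;
}
```

Splitting at the last colon means a host name containing dots is left alone; only the part after the colon is searched for the `.screen` suffix.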

Part of the problem with untangling the hierarchy in X is that it's meant to be extremely malleable and re-usable. The Wikipedia article says that the display has a top-level window; the Xlib Tutorial says a screen is a physical monitor, and a workstation can have more than one screen, but each screen has its own top-level window. But both of these statements are inaccurate!

A top-level window can span multiple screens, and each screen may or may not have a monitor at all. Headless X systems used for testing may have an output, which is a managed region of memory into which X is "drawing", but may have no physical device at all!

For the purposes of modern X, the server manages one or more (in common practice, only one) logical screen objects, each of which has multiple output and crtc objects. An output is a video output on your device, such as a connector on your GPU, but it may just be a virtualized chunk of memory for the headless scenario described above. The X server is responsible for figuring out at start time what output objects you have and what crtc objects are connected to them, assigning crtc devices to each output, and choosing a default setup that will support running your instance of GTK or KDE or whatever.

ASIDE: You kids have it easy. Before the existence of modern, self-describing hardware, we had to hand-enter every single one of these details, and if you got a detail wrong it was possible to burn out your video card or monitor!

For example, in a two-monitor setup X will have a single screen with a single root window spanning both monitors, but there will be two output devices, each with its own crtc, mapping a client program's output to the physical pixels on the screen. This is how it's possible to drag an application window from one monitor to another, and have it be visible as it crosses the bezels between them. The output and crtc together act as the framebuffer manager for displays.

My goal is to enable autorotation on tablets running X, using the XCB interface. For the purposes of that fairly straightforward goal, I want to find the base screen, assert that it has a single output and a single crtc, and then send a command to the crtc object to rotate its contents to the orientation I desire. The change to the crtc will cause the output object to remap all of the pixels it is currently tracking to the new orientation. Most modern window managers are pretty good about re-arranging the screen to manage this change!

ASIDE: It makes no sense for a virtualized output device, one which has no actual display visible to the human eye, to have a crtc. Its orientation doesn't matter. If it ever is displayed to a human being it will be in a virtualizing environment such as Xephyr, in which case it will be getting its crtc information from the host X display.

ASIDE: I'm still not sure how all this interacts with a window manager that has 'workspaces', such as Gnome-Mate. Which means I could be entirely wrong about this whole thing! On the other hand, it could be as simple as every workspace having its own pseudo-root-window, and being dependent upon the crtc object for screen dimensions and pixel mapping. I have to read this carefully.

TODO: I haven't yet figured out the bit about remapping the tablet's touchscreen inputs, so that when you place your finger or stylus on the screen X maps the pointer location to the right place.

Connecting.

To connect to X via XCB, you use the xcb_connect function. It takes two arguments, a string with the name of the display, and a pointer-to-int to the default screen's ID, which is a returned value. It returns an opaque data structure, 'xcb_connection_t'.

xcb_connection_t* xcbConnection = xcb_connect(const char* display, int* screen);

There are variants for connection-with-authorization, and connection-with-file-descriptor.

This function always returns an allocated structure, even on failure. You must test for failure with:

int error = xcb_connection_has_error(xcb_connection_t* xcbConnection);

The error is a number defined in xcb.h. See that file for the list of possible failure modes.

Disconnecting

You must close the connection when you are finished. In the event of a connection failure, you must still call this function to free the memory XCB used to report the connection failure:

void xcb_disconnect(xcb_connection_t* xcbConnection);

Getting the default screen structure

Once you've connected, you need the default screen structure. Oddly, it's at the bottom of a linked list of screen structures that you have to find by counting down from the default screen ID retrieved during the connection, using XCB's supplied traversal functions.

Doing this is so commonplace that there's a function for doing it provided by the xcb-util library's xcb_aux helpers, which on Ubuntu are installed through libxcb-util-dev:

xcb_screen_t* screen = xcb_aux_get_screen(connection, default_screen_id);

The screen structure is not opaque; it contains the root window, as well as the width and height of the total display (covering all monitors!) that it is expected to manage, along with other details that, well, aren't relevant (at least, not yet, and maybe I hope not).

Getting the screen's resources

Now we're getting into RandR's portion of the business. RandR is an extension to X that allows userspace programs to manipulate the core functionality of outputs and displays. Those are the resources the X screen has to draw on. This is also the first time we're going to encounter the "standard" XCB interface.

XCB has an idiom of sending a request to the X server and storing a token (a uint32_t) that it calls a cookie. When it wants to review the reply to that request, it asks for the reply using the cookie.

It's possible to send many requests, both commands-to-set and requests-for-information, to the X server, and then retrieve the replies all at once. If the XCB interface has already received the replies, it can hand them over at once; otherwise, it'll wait for one. In this way, XCB and X can work asynchronously, batching transactions and reducing latency.

For our purposes, we're not going to do that. We're just going to ask for one object. The get command uses the rootWindow, which as I mentioned earlier is on the screen we retrieved as screen->root.

xcb_generic_error_t* error = nullptr;

xcb_randr_get_screen_resources_cookie_t screen_resources_cookie =
    xcb_randr_get_screen_resources(connection, screen->root);

xcb_randr_get_screen_resources_reply_t* screen_resources =
    xcb_randr_get_screen_resources_reply(connection, screen_resources_cookie, &error);

IMPORTANT: Reply objects are allocated by XCB. You are responsible for free()ing them afterward. If there is an error, the reply object will be null, but the error object will contain the error response as an allocated object and you are responsible for free()ing it. Only reply and error objects are allocated; all the rest are part of the connection object and will be freed on disconnect.
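Since I'm using C++, one way to avoid forgetting those free() calls is to hand each reply or error pointer to a std::unique_ptr whose deleter calls free(). This is only a sketch of that idea; `FreeDeleter` and `malloc_ptr` are names I made up, not part of XCB.

```cpp
#include <cstdlib>
#include <memory>

// Deleter that releases memory with free(), matching how XCB allocates
// reply and error objects.
struct FreeDeleter {
    void operator()(void* pointer) const noexcept { std::free(pointer); }
};

// Owning pointer for anything that must be released with free().
template <typename T>
using malloc_ptr = std::unique_ptr<T, FreeDeleter>;
```

With this, something like `malloc_ptr<xcb_randr_get_screen_resources_reply_t> resources(xcb_randr_get_screen_resources_reply(connection, cookie, &error));` frees the reply when `resources` goes out of scope, including on early returns. The raw `error` pointer can be wrapped the same way.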

Getting the outputs and crtcs

Now that we have a screen, we want to get all the output devices, find the crtc associated with each, and see the rotation! As I've been using C++, I'm going to store a collection of cookies in a vector.

There are two idioms in this example; the first collects all the query cookies and then iterates through the replies; the second gets the query cookie and immediately requests the reply. While the second idiom is "slower" by an order of magnitude, on my laptop with a shared-memory connection the difference is 3000 nanoseconds vs 300 nanoseconds, not enough for most people to notice.

Notice that I do not free the return from the xcb_randr_get_screen_resources_outputs call; that is simply interpreting the contents of the screen_resources object. As before, the screen_resources object itself will have to be freed eventually.

std::vector<xcb_randr_get_output_info_cookie_t> output_get_cookies;

xcb_generic_error_t* error = nullptr;

// Use the configuration timestamp from the screen resources reply.
const xcb_timestamp_t timestamp = screen_resources->config_timestamp;

int len = xcb_randr_get_screen_resources_outputs_length(screen_resources);
xcb_randr_output_t* randr_outputs =
    xcb_randr_get_screen_resources_outputs(screen_resources);

// First idiom: send every request up front and keep the cookies.
for (int i = 0; i < len; ++i) {
    output_get_cookies.push_back(
        xcb_randr_get_output_info(connection, randr_outputs[i], timestamp));
}

// Then collect the replies.
for (const auto& cookie : output_get_cookies) {
    xcb_randr_get_output_info_reply_t* reply =
        xcb_randr_get_output_info_reply(connection, cookie, &error);
    if (error) {
        free(error);
        error = nullptr;
        continue;
    }
    if (!reply) {
        continue;
    }

    // Second idiom: send the request and ask for its reply immediately.
    xcb_randr_get_crtc_info_reply_t* crtc = xcb_randr_get_crtc_info_reply(
        connection,
        xcb_randr_get_crtc_info(connection, reply->crtc, timestamp), NULL);

    // It's possible that there is no CRTC associated with the
    // output. This isn't an error.
    if (!crtc) {
        free(reply);
        continue;
    }

    // rotation_map() is my own helper that turns the rotation bitmask
    // into readable text.
    std::cout << "(x: " << crtc->x << ", y: " << crtc->y << ") (width: " << crtc->width
              << ", height: " << crtc->height << ") status:" << unsigned(crtc->status)
              << " rotation: " << rotation_map(crtc->rotation) << std::endl;

    free(crtc);
    free(reply);
}
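The rotation field is a bitmask, so printing it raw isn't very readable. Here is a sketch of what a helper like rotation_map could look like. The bit values are the ones the RandR protocol defines, and `<xcb/randr.h>` exposes them as `XCB_RANDR_ROTATION_ROTATE_0` and friends. The `k...` constants and the output strings are my own choices, repeated here only so the sketch compiles without the XCB headers.

```cpp
#include <cstdint>
#include <string>

// RandR rotation bits (assumed to mirror XCB_RANDR_ROTATION_* in
// <xcb/randr.h>).
constexpr std::uint16_t kRotate0 = 1;
constexpr std::uint16_t kRotate90 = 2;
constexpr std::uint16_t kRotate180 = 4;
constexpr std::uint16_t kRotate270 = 8;
constexpr std::uint16_t kReflectX = 16;
constexpr std::uint16_t kReflectY = 32;

// Turns a RandR rotation bitmask into readable text.
std::string rotation_map(std::uint16_t rotation) {
    std::string text;

    // The low four bits select the rotation; exactly one should be set.
    switch (rotation & 0x0F) {
        case kRotate0:   text = "0 degrees";   break;
        case kRotate90:  text = "90 degrees";  break;
        case kRotate180: text = "180 degrees"; break;
        case kRotate270: text = "270 degrees"; break;
        default:         text = "unknown";     break;
    }

    // Reflections are independent flags that can accompany any rotation.
    if (rotation & kReflectX) {
        text += ", reflect-x";
    }
    if (rotation & kReflectY) {
        text += ", reflect-y";
    }
    return text;
}
```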

... this is as far as I've gotten. And it's all starting to make sense, but wow, what a journey just to get this far.