josuah.net

Bits of C programming

A coding style that helps isolate unrelated parts of your programs and strives to give a sane and simple API. Note that I am never happy with what this document says.

Naming convention and project hierarchy

And then, following the file name in ./src/:

assert(), assert(), assert()

Error handling

While other programming languages provide special keywords like try/catch and features like exceptions or automatic cleanup at function exit (useful when returning early, such as when an error occurs), C lets the user handle errors with if (failure_condition) { handle_failure; }.

Some idioms allow error handling to be as unobtrusive in C as in any other language:

Error handling if() style

	mem = malloc(3);
	if (mem == NULL)
		return NULL;
-vs-
	if ((mem = malloc(3)) == NULL)
		return NULL;

Error handling with goto

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	mem = malloc(3);
	if (mem == NULL) {
		close(fd);
		return -1;
	}

	if (do_something_1() < 0) {
		close(fd);
		free(mem);
		return -1;
	}

	if (do_something_2() < 0) {
		close(fd);
		free(mem);
		return -1;
	}

	return 0;

-vs-
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	mem = malloc(3);
	if (mem == NULL)
		goto err_close;

	if (do_something_1() < 0)
		goto err_free;

	if (do_something_2() < 0)
		goto err_free;

	return 0;
err_free:
	free(mem);
err_close:
	close(fd);
	return -1;

-vs-
	fd = -1;
	mem = NULL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto err;

	mem = malloc(3);
	if (mem == NULL)
		goto err;

	if (do_something_1() < 0)
		goto err;

	if (do_something_2() < 0)
		goto err;

	return 0;
err:
	close(fd); /* fails harmlessly with EBADF if fd == -1 */
	free(mem); /* does nothing if mem == NULL */
	return -1;

Known as "the only right use of goto".
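A minimal sketch of the single-label variant as a complete function, assuming a POSIX system. The helper read_head() and its parameters are made up for illustration. Here the descriptor is checked before close() so that errno from the failing call is not overwritten:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from the start of path into
 * a freshly allocated buffer. Returns the number of bytes read, or -1 on
 * error. On success, *out receives the buffer and the caller frees it.
 */
ssize_t
read_head(char const *path, size_t len, char **out)
{
	int fd = -1;
	char *mem = NULL;
	ssize_t n;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto err;

	mem = malloc(len);
	if (mem == NULL)
		goto err;

	n = read(fd, mem, len);
	if (n < 0)
		goto err;

	close(fd);
	*out = mem;
	return n;
err:
	if (fd >= 0)
		close(fd); /* skipped when open() itself failed */
	free(mem); /* free(NULL) is a no-op */
	return -1;
}
```

Every failure path jumps to the same label, so adding a new step only adds one more "goto err".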

Error handling with enums + switch + *_errstr

	int
	conf_read_file(char const *path, struct conf *cf)
	{
		int fd, err = 0;

		fd = open(path, O_RDONLY);
		if (fd < 0)
			return -CONF_ERR_SYSTEM;

		err = conf_parse(fd, cf);
		if (err < 0)
			goto end;
	end:
		close(fd);
		return err;
	}
...
	char const *
	conf_errstr(int i)
	{
		enum conf_errno err = (i > 0) ? i : -i;

		switch (err) {
		case CONF_ERR_SYSTEM:
			return "system error";
		case CONF_ERR_SYNTAX:
			return "syntax error";
		}
		assert(!"all errno should have been handled before");
		return "unknown error"; /* make compiler happy */
	}
...
	int
	main(int argc, char **argv)
	{
		struct conf cf = {0};
		int err;

		err = conf_read_file(path, &cf);
		switch (-err) {
		case 0:
			break;
		case CONF_ERR_SYSTEM:
			fprintf(stderr, "%s: %s: %s\n",
			  argv[0], conf_errstr(err), strerror(errno));
			return -1;
		default:
			fprintf(stderr, "%s: %s\n",
			  argv[0], conf_errstr(err));
			return -1;
		}

		return 0;
	}
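The excerpts above never show the enum itself. Here is a small sketch of how the pieces could fit together. The values of enum conf_errno and the conf_check_line() helper are assumptions for illustration, not part of the original code:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical definition: starts at 1 so that 0 can mean "no error". */
enum conf_errno {
	CONF_ERR_SYSTEM = 1,
	CONF_ERR_SYNTAX,
};

/* Accepts either sign, so conf_errstr(err) works on a negative return. */
char const *
conf_errstr(int i)
{
	enum conf_errno err = (i > 0) ? i : -i;

	switch (err) {
	case CONF_ERR_SYSTEM:
		return "system error";
	case CONF_ERR_SYNTAX:
		return "syntax error";
	}
	assert(!"all errno should have been handled before");
	return "unknown error"; /* make compiler happy */
}

/* Hypothetical parser step: a line is valid only if it looks like key=value. */
int
conf_check_line(char const *line)
{
	char const *eq = strchr(line, '=');

	if (eq == NULL || eq == line)
		return -CONF_ERR_SYNTAX;
	return 0;
}
```

Leaving out the default: case in conf_errstr() is deliberate: with warnings enabled, compilers such as gcc and clang can then report an enum value that has no case.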

(Ab)use of types

When abused, types make the code more opaque and harder to read. But used wisely, they are the key to handling structured data and building useful abstractions, which often come at no extra cost in compiled languages, where the type system works at compile time rather than at run time.

struct

In this extreme case: what if more parameters need to be passed? Change all the functions?

	do_something(buf1, len1, prop1, buf2, len2, prop2, buf3, len3, prop3);
-vs-
	do_something(struct1, struct2, struct3);
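As a rough sketch of the second form, with a made-up struct chunk and a made-up chunk_total_len() function: adding a field to the struct later leaves every function signature untouched.

```c
#include <stddef.h>

/* Hypothetical type: bundles what used to be three separate parameters. */
struct chunk {
	char const *buf;
	size_t len;
	int prop;
};

/* The signature stays the same if struct chunk later gains a field. */
size_t
chunk_total_len(struct chunk const *a, struct chunk const *b,
    struct chunk const *c)
{
	return a->len + b->len + c->len;
}
```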

Structs give an outline of the data structures that the program uses:

Choose the right data structures and the program will write itself

enum

In switch statements and in code:

	switch (get_state()) {
	case 2:
		do_it_again();
		break;
	case 1:
		next_step();
		break;
	}
-vs-
	switch (get_state()) {
	case PARTIAL_DATA:
		do_it_again();
		break;
	case DONE:
		next_step();
		break;
	}
	/* compiler warning for missing TOO_MUCH_DATA */

Initialize fields:

	char *state_to_description[] = {
		NULL,
		"all done, goodbye",
		"partial data read, doing it again",
		NULL,
		NULL,
		"too much input read, erroring out",
	};
-vs-
	char *state_to_description[] = {
		[DONE] = "all done, goodbye",
		[PARTIAL_DATA] = "partial data read, doing it again",
		[TOO_MUCH_DATA] = "too much input read, erroring out",
	};
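Put together as something that compiles, this could look like the sketch below. The enum values are the ones implied by the positional array and the numeric switch above (1, 2 and 5); state_describe() is a made-up accessor that guards against the NULL holes and out-of-range values:

```c
#include <stddef.h>

enum state {
	DONE = 1,
	PARTIAL_DATA = 2,
	TOO_MUCH_DATA = 5,
};

/* Indices without an initializer (0, 3, 4) are implicitly NULL. */
static char const *state_to_description[] = {
	[DONE] = "all done, goodbye",
	[PARTIAL_DATA] = "partial data read, doing it again",
	[TOO_MUCH_DATA] = "too much input read, erroring out",
};

/* Hypothetical accessor: never returns NULL, never reads out of bounds. */
char const *
state_describe(enum state s)
{
	size_t n = sizeof state_to_description / sizeof *state_to_description;

	if ((size_t)s >= n || state_to_description[s] == NULL)
		return "unknown state";
	return state_to_description[s];
}
```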

Memory management

C memory management revolves around pointers: variables holding the address of a region of memory that the program may use.

Growing a region of available memory (a buffer) sometimes requires moving it somewhere else, which changes the pointer: the address in memory locating the buffer.

Pointers are useful in many data structures for referring to another element, such as the ->next element of a linked list.

If one of these data structures refers to a buffer and the buffer then grows, the buffer may move and the reference becomes invalid.

Combining memory management and pointers comes down to knowing whether a buffer is immutable (and will not change anymore) or may still be modified:

Same struct type, various size:

	struct obj {
		char *name, *description; /* fixed part:         */
		size_t len;               /* length given by     */
		struct obj *next;         /* sizeof(struct obj)  */
		char buf[];               /* variable length, at the end of the struct */
	};

	struct obj *
	obj_parse_new(char const *input, struct obj **first)
	{
		struct obj *new;
		size_t len;

		len = strlen(input);
		new = calloc(1, sizeof *new + len);
		if (new == NULL)
			return NULL;

		memcpy(new->buf, input, len);
		new->len = len;

		if (obj_parse(new) < 0)
			goto err;

		new->next = *first;
		*first = new;
		return new;
	err:
		free(new);
		return NULL;
	}

Immutable pointers and linked lists

Not an immutable pointer: it changes the pointer as the memory grows:

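	struct obj *
	obj_grow(struct obj *obj)
	{
		struct obj *mem;

		/* the struct and its trailing buf[] are one allocation */
		mem = realloc(obj, sizeof *obj + obj->len + 10);
		if (mem == NULL)
			return NULL;
		obj = mem;
		obj->len += 10;
		return obj; /* may differ from the pointer passed in */
	}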

This is what we want for linked lists: it does not change the pointer as the memory grows.

	int
	obj_grow(struct obj *obj)
	{
		char *mem;

		/* here buf is a separately allocated char *, not a buf[] member */
		mem = realloc(obj->buf, obj->len + 10);
		if (mem == NULL)
			return -1;
		obj->buf = mem;
		obj->len += 10;
		return 0;
	}

The variable-size struct member shown before only fits when the size is known in advance and does not change afterward.
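To make the difference concrete, here is a small sketch of a list node whose buffer can grow while the node itself stays put. struct node and node_append() are names invented for this example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node: buf is allocated separately from the node. */
struct node {
	struct node *next;
	size_t len;
	char *buf;
};

/*
 * Append len bytes to the node's buffer. Only n->buf may move:
 * the address of the node stays valid for whoever points at it.
 */
int
node_append(struct node *n, char const *data, size_t len)
{
	char *mem;

	if (len == 0)
		return 0;

	mem = realloc(n->buf, n->len + len);
	if (mem == NULL)
		return -1; /* n->buf is left untouched */

	memcpy(mem + n->len, data, len);
	n->buf = mem;
	n->len += len;
	return 0;
}
```

Other nodes keep pointing at the same struct node, so their ->next pointers never need fixing after a realloc().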

Reentrant function without specifying the buffer

Reentrancy is a property of a function which ensures that it still works even if it is executed multiple times at once (due to threads, for instance).

One cause for functions to fail at this is the use of global (or static) variables.

One temptation to use a global variable, common enough that even the libc is built around it, is to save the developer from passing a buffer to fill.

For example, localtime() returns a struct tm *, but no such structure is passed to the function: the same global or static variable is used for all calls to localtime(), which makes it unsafe when using threads. localtime_r() is to be used instead.

It is possible to pass a chunk of the stack memory of the calling function without having to declare a variable, through a compound literal and a macro:

	strftime(buf, sizeof buf, "%Y-%m-%d", localtime_r(clock, &(struct tm){0}));

Here this is written inline, but if localtime() were a #define of this expression, then localtime() would be reentrant with the same convenience it currently has:

	#define localtime(clock) localtime_r((clock), &(struct tm){0})
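A compilable sketch of the same idea, assuming a POSIX libc that provides localtime_r(). The macro is named LOCALTIME here to avoid redefining the libc function, and format_date() is a made-up wrapper:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical macro: the compound literal is an unnamed struct tm living
 * on the caller's stack until the enclosing block ends.
 */
#define LOCALTIME(clock) localtime_r((clock), &(struct tm){0})

/* Writes "YYYY-MM-DD" into buf, returns the length written or 0 on failure. */
size_t
format_date(time_t t, char *buf, size_t size)
{
	struct tm *tm = LOCALTIME(&t);

	if (tm == NULL)
		return 0;
	return strftime(buf, size, "%Y-%m-%d", tm);
}
```

The pointer returned by LOCALTIME() must not outlive the block it was called from, so it cannot be returned to the caller the way the static buffer of localtime() can.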

This can also be convenient to write formatters that take structures or integers as input and return a string that can be passed directly to printf or similar:

	printf("...%s...", fmt(num));
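One possible shape for such a formatter, as a sketch only: fmt_int(), fmt_int_r() and FMT_SIZE are names invented here. The macro hands a stack buffer from the caller to the function, which fills it and returns it:

```c
#include <stdio.h>

#define FMT_SIZE 32 /* enough for any long in decimal */

/* Fills the buffer provided by the caller and returns it. */
char *
fmt_int_r(long num, char *buf, size_t size)
{
	snprintf(buf, size, "%ld", num);
	return buf;
}

/* The compound literal is a fresh buffer in the calling block. */
#define fmt_int(num) fmt_int_r((num), (char[FMT_SIZE]){0}, FMT_SIZE)
```

It can then be used as printf("count=%s\n", fmt_int(42)); and each use of the macro gets its own buffer, so two calls in the same printf() do not overwrite each other. The returned string is only valid until the enclosing block ends.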