Avoiding namespace pollution in C

Writing small C programs is fairly simple, but writing maintainable code for growing projects with multiple team members is an entirely different story. Keeping the namespace clean and predictable is one of the more important unspoken C skills to have.

What is namespace pollution?

C is one of the most popular programming languages, but its minimalist design lacks many modern language features, such as namespace isolation. All functions and global variables share a single namespace by default (struct, union, and enum tags share a second one), which causes problems ranging from name collisions to ambiguous code. Where does the function init() or parse() come from? How are you supposed to wade through hundreds or thousands of autocomplete suggestions? A developer familiar with the codebase may know, but what happens when new team members join, or when developers need to touch parts of the code they aren't familiar with? Relying on institutional knowledge or constantly searching through the source code is a problem for companies of all sizes.

Prefixing library symbols

The most common way to logically group functions and other symbols into namespaces is prefixing, usually with the name of the header file that declares the symbol. For example, if a file mymath.h declares a function add(), it could be named mymath_add().

Many popular libraries like libcurl, ffmpeg and libpng follow this approach.

Some discipline is required to properly prefix every exported symbol (not only functions!), including structs and variables.
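As a rough sketch of what that discipline looks like, assume a hypothetical library mymath (all names below are made up for illustration). Everything the header exports carries the prefix: macros, enum constants, struct tags, global variables and functions. Header and implementation are shown in one block for brevity.

```c
/* ---- mymath.h (hypothetical): every exported name is prefixed ---- */
#ifndef MYMATH_H
#define MYMATH_H

#define MYMATH_VERSION 1                 /* macro */

enum mymath_status {                     /* enum tag and its constants */
	MYMATH_OK,
	MYMATH_ERROR
};

struct mymath_vec2 {                     /* struct tag */
	int x;
	int y;
};

extern int mymath_call_count;            /* global variable */

int mymath_add(int a, int b);            /* functions */
struct mymath_vec2 mymath_vec2_add(struct mymath_vec2 a, struct mymath_vec2 b);

#endif

/* ---- mymath.c (hypothetical implementation) ---- */
int mymath_call_count = 0;

int mymath_add(int a, int b){
	mymath_call_count++;                 /* counts every call to mymath_add */
	return a + b;
}

struct mymath_vec2 mymath_vec2_add(struct mymath_vec2 a, struct mymath_vec2 b){
	struct mymath_vec2 sum = { mymath_add(a.x, b.x), mymath_add(a.y, b.y) };
	return sum;
}
```

With this layout, typing mymath_ in an editor narrows the autocomplete list to exactly the symbols this one library provides.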

It may be helpful to define a function-like macro that produces the prefixed name, so the prefix can easily be changed later. A simple token-pasting macro is enough for this purpose:

#define PREFIX(name) mylib_##name

Then use it to generate function names, for example instead of

int add(int a, int b){ ...

you would write

int PREFIX(add)(int a, int b){ ...

which produces the correctly prefixed function

int mylib_add(int a, int b){ ...

This approach allows for quick changes of the prefix by adjusting the macro, while not being too complex for language servers to maintain "goto definition" and autocomplete suggestions (unlike other approaches with layered macros and expanding variables inside the replacement macro).
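Put together, a minimal sketch might look like the following. The symbols pair, version and pair_sum are placeholders chosen for illustration; the macro works the same way for struct tags and variables as it does for functions.

```c
/* One place to change the prefix for the whole (hypothetical) library */
#define PREFIX(name) mylib_##name

/* Expands to: struct mylib_pair */
struct PREFIX(pair) {
	int first;
	int second;
};

/* Expands to: const int mylib_version */
const int PREFIX(version) = 1;

/* Expands to: int mylib_add(int a, int b) */
int PREFIX(add)(int a, int b){
	return a + b;
}

/* Expands to: int mylib_pair_sum(const struct mylib_pair *p) */
int PREFIX(pair_sum)(const struct PREFIX(pair) *p){
	return PREFIX(add)(p->first, p->second);
}
```

Code that uses the library writes mylib_add(1, 2) directly; PREFIX is only needed where the library declares or defines its symbols.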

Keeping exported symbols minimal

Less experienced C developers are often surprised by the difference between visibility and linkage. When a symbol is declared in a header, it is visible to every source file that includes that header.

But symbols defined only in the library's source file (.c) are still available to other files at link time; they are just not declared for them. Every function and variable defined at file scope (outside function bodies) has external linkage by default!
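A small sketch of what that means in practice, using hypothetical names (secret.c, user.c, secret_helper, call_hidden_helper): the helper is never declared in any header, yet another file can reach it by writing the declaration by hand. The two files are concatenated here so the sketch compiles as a single unit; in a real project they would be separate source files joined by the linker.

```c
/* ---- secret.c: no header declares this function ---- */
int secret_helper(int value){        /* no static, so it has external linkage */
	return value * 2;
}

/* ---- user.c: a different file in the same program ---- */
int secret_helper(int value);        /* hand-written declaration, no #include */

int call_hidden_helper(void){
	return secret_helper(21);        /* resolves at link time: the symbol is exported */
}
```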

Let's look at an example of this problem. Suppose you have two libraries, say one and two, which both define a function add() for internal use only.

Neither one.h nor two.h exports any symbols:

// intentionally doesn't export anything

while the source files one.c and two.c define a function with the same name:

int add(int a, int b){
	return a+b;
}

Even though the headers don't expose the functions from each library, building a project that uses both libraries fails at link time:

/usr/bin/ld: /tmp/ccpGxtSZ.o: in function `add':  
two.c:(.text+0x0): multiple definition of `add'; /tmp/cc51n6vr.o:one.c:(.text+0x0): first defined here

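The simplest way to avoid this conflict is to declare these functions as static, which gives them internal linkage, so each name is only usable within the file that defines it. (Prefixing would also avoid the collision, but would still export a function nobody else is supposed to call.) Changing the function to: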

static int add(int a, int b){
	return a+b;
}

will compile and link just fine. Always mark symbols not meant for use outside of the current file as static, and only expose the non-static functions through declarations in the header file.
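As a sketch of that rule, here is how the library one from the example could be laid out. The exported function one_sum3 is an invented name for illustration; add stays private to one.c, so two.c is free to define its own static add without a collision. Header and source are shown in one block.

```c
/* ---- one.h: declares only the public API ---- */
int one_sum3(int a, int b, int c);

/* ---- one.c ---- */

/* static: internal linkage, not usable by name outside this file */
static int add(int a, int b){
	return a + b;
}

/* non-static and declared in one.h: the only exported symbol */
int one_sum3(int a, int b, int c){
	return add(add(a, b), c);
}
```

On many Unix-like systems, running nm on the compiled object file should list one_sum3 as a global symbol (uppercase T), while add appears as a local one (lowercase t) or not at all if the compiler inlined it.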

Attaching functions to structs

A very different approach to keeping the global namespace clean is to expose functions only through struct members. This works by storing a pointer to a static function in a struct member, so the function is tied to the struct and does not clutter the global namespace, yet is still callable from outside the library file.

For a concrete example, suppose you are writing a library geometry which exposes structs to represent points on a 2D graph (storing x and y values). These points should also be able to be moved to new locations on the graph (changing x/y values).

You could define a header geometry.h like this:

struct Point{  
	int x;  
	int y;  
	void (*move)(struct Point *self, int x, int y);  
};  
  
struct Point* Point_create(int x, int y);

Note how the move member is a pointer to a function that the header does not declare. This is because the implementation marks the function as static:

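#include "geometry.h"
#include <stdlib.h>

// static: internal linkage, reachable only through the move member
static void Point_move(struct Point *self, int x, int y){
	self->x = x;
	self->y = y;
}

struct Point* Point_create(int x, int y){
	struct Point *p = malloc(sizeof(struct Point));
	if(p == NULL){
		return NULL; // allocation failed, the caller must check
	}
	p->x = x;
	p->y = y;
	p->move = Point_move;
	return p;
}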

Now, from your main.c program, you can still call the Point_move function through the move member, even though its name has internal linkage and cannot be referenced directly outside of geometry.c:

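#include "geometry.h"
#include <stdio.h>
#include <stdlib.h>

int main(void){
	struct Point *p = Point_create(2, 3);
	if(p == NULL){
		return 1; // allocation failed
	}
	printf("Point is at %d, %d\n", p->x, p->y);
	p->move(p, 15, 16);
	printf("Point is at %d, %d\n", p->x, p->y);
	free(p); // Point_create allocates with malloc
	return 0;
}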

The program will compile and run, printing:

Point is at 2, 3  
Point is at 15, 16

This approach is not the best choice for most libraries, but libraries with many variants of the same function benefit from it: they export symbols only through structs instead of polluting the namespace with functions that each work with a single type anyway. Think of a geometry library providing a move function for all kinds of shapes, like rectangles, circles, squares, lines, points, ... you get the idea.
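A rough sketch of that idea with two shapes. The Circle type and the Circle_create function are assumptions added for illustration and are not part of the geometry.h shown earlier; header and implementation are combined into one block. Only the two create functions end up in the global namespace, while both move implementations stay static.

```c
#include <stdlib.h>

/* ---- geometry.h (extended, hypothetical) ---- */
struct Point{
	int x;
	int y;
	void (*move)(struct Point *self, int x, int y);
};

struct Circle{
	int x;
	int y;
	int radius;
	void (*move)(struct Circle *self, int x, int y);
};

struct Point* Point_create(int x, int y);
struct Circle* Circle_create(int x, int y, int radius);

/* ---- geometry.c ---- */
static void Point_move(struct Point *self, int x, int y){
	self->x = x;
	self->y = y;
}

static void Circle_move(struct Circle *self, int x, int y){
	self->x = x;                     /* moves the center, radius is unchanged */
	self->y = y;
}

struct Point* Point_create(int x, int y){
	struct Point *p = malloc(sizeof *p);
	if(p == NULL){
		return NULL;                 /* allocation failed */
	}
	p->x = x;
	p->y = y;
	p->move = Point_move;
	return p;
}

struct Circle* Circle_create(int x, int y, int radius){
	struct Circle *c = malloc(sizeof *c);
	if(c == NULL){
		return NULL;                 /* allocation failed */
	}
	c->x = x;
	c->y = y;
	c->radius = radius;
	c->move = Circle_move;
	return c;
}
```

Callers write p->move(p, 15, 16) and c->move(c, 7, 8) without ever seeing a Point_move or Circle_move symbol, and release each shape with free() when done.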

Storing static functions as struct members loosely mimics methods on classes in other programming languages, so the concept is easily accessible for most developers.

It should be noted that every instance stores its own copy of each function pointer, which costs memory (typically 8 bytes per pointer on 64-bit platforms), and that calling through the pointer adds a small, usually barely measurable, delay due to the indirection.
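The memory side of that trade-off is easy to check. The sketch below compares the struct from this article with a hypothetical PlainPoint that has no function pointer; point_overhead_bytes is an invented helper. The exact result depends on the platform and its padding rules, so no number is stated here.

```c
#include <stddef.h>

/* Hypothetical variant without any function pointer */
struct PlainPoint{
	int x;
	int y;
};

/* The struct from geometry.h */
struct Point{
	int x;
	int y;
	void (*move)(struct Point *self, int x, int y);
};

/* Extra bytes every Point instance carries for its function pointer,
   including any padding the compiler adds */
size_t point_overhead_bytes(void){
	return sizeof(struct Point) - sizeof(struct PlainPoint);
}
```

Printing the result with printf("%zu\n", point_overhead_bytes()) shows the per-instance cost on your platform; it grows with every additional function pointer stored in the struct.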