Understanding C is hard.

2022-03-03

Understanding C is hard. It doesn't have to be quite so difficult to understand, but it is. There are various reasons for this unfortunate fact of life, some of which appear in the following text. In part because of these reasons, and in part for much more mundane and common reasons particular to writing, there may be some blunders or incompleteness in this article. I intend to update it as I notice omissions or other problems, but perhaps you've heard what they say about good intentions: the road to hell is paved with them.

What It Is

In short, this is my first pass at an attempt to write the document I wish I had available when I first started getting into C. (That was actually long enough ago that some of what I present would have been meaningless then. Maybe this is really an attempt to write the document I would like to have available if I first started getting into C today.)

It took me too long to really get C in principle, because at first I did not know some of the points I explain here. Without some of this knowledge, I would not have crossed that line between two states: disliking C, and discovering that C programming is my favorite video game.

I broke it up into two parts. The first is a list of advice and other hopefully useful tidbits for those who wish to really "get" C and do it well. The second is a list of elaborations on conditions that underlie the first list's points. The bullet points in each list are intentionally written in such a way that their order should not matter much; I want each to be a stand-alone, reusable component of understanding the C programming language.

Part 1

This list of explanations is my attempt to help kick-start the process of making it easier for people who want to learn C to develop a good mindset, early on, for learning C well.

  • Alice Maz said writing C is easy, but writing it well is hard. This distinction needs to be drummed in from the start. A big part of the problem is the vast gulf that exists between what the C standard establishes as a terrible idea for trying to write C code and what a given environment seems to allow. Another is the simple fact that C really does let you shoot yourself in the foot if that's what you want to do, with little or no warning or handholding; it is a spinning sawblade with no guards or failsafes. If you intend to program in C, do yourself a favor and find a free draft of whatever C standard you intend to use (C99, C11, or C23, presumably), preferably the last draft before the published standard. The actual published, finalized document for any given C standard is expensive, but the last pre-standard draft should be functionally identical.

  • GNU source code should be avoided, with a huge distance between you and it, when looking for code examples. There are occasionally some good little bits of algorithm here and there: "Even a stopped clock is right twice a day." (This is only true for twelve-hour clocks.) Any amount of good code is typically drowning in a swamp of overcomplicated, poorly organized, painfully styled code that ends up being harder to maintain, producing software that's harder to use, and introducing more opportunity for bugs -- both security bugs and the more mundane varieties. Unless you know what you're doing you should never even look at most GNU code, and if you do know what you're doing you will already know to stay away from most GNU sources. If you insist on an example, though, compare OpenBSD echo with GNU echo. All of that is ignoring the fact that borrowing GNU code may create legal problems for your project.

  • Lean heavily on OpenBSD manpages for various C library functions and, when applicable, look to OpenBSD project sources for examples of how to write decent, clear code. This can help you choose both the functions you use and the manner in which you use them so that you don't regret your code (as much) later. All of this can be found online, though I admit it's easier if you just have an OpenBSD system handy. The Linux C library manpages can also be useful for more portability-related information about different functions (they are also available online in searchable form, similar to the official OpenBSD resource).

  • Pointers are not scary. Only the way people are (improperly) taught pointers is scary. If pointers were properly integrated in the process of learning and writing C from day one, people with the capacity for programming C well would not have to fear pointers at all. At this point, I think it's probably best to read through a book on C programming for beginners until you get to pointers, without really practicing C, then read the stuff about pointers very closely and carefully several times, then without finishing the book go back to the beginning and start over. For every exercise and every code sample, try to figure out if and how you could write it with the addition of pointers, even if they're unnecessary, e.g. declare every variable as a pointer rather than a bare char, int, and so on. You should not necessarily write all your code this way in the real world, but you should be able to do so without fear, and start learning this as early as you can.

  • Practice (and other learning) project ideas are among the most commonly lamented hurdles I see from people who are having difficulty progressing in early programming, regardless of the language. To help with that, when you can't come up with your own ideas that fit within your skill level and perhaps push the edges of it just a bit, I suggest using a Unixy environment to help provide a straightforward context in which to come up with simple project ideas. It's well suited to inspiring ideas such as reimplementing the simplest Unixy tools (echo is a great first choice, and wc is good not long after). That is incidentally part of the approach the second edition of the legendary book The C Programming Language uses for presenting reader exercises. Building tools for the Unix userland can help a new C programmer learn about text stream handling, including the creation of command line filters (programs that take text input and produce text output as their primary mode of operation). Programming for a Unix environment also lends itself to figuring out the fork-exec pattern for tool development (which will open up a lot of potential inspiration for more practice programs). In some cases, this whole process of learning by doing can be enhanced a bit more by writing Unix shell scripts as prototypes for later reimplementations and enhancements in C.

  • Resources for learning are disappointing overall. I don't know of any comprehensively good books for learning C, unfortunately, and I would be surprised to find a comprehensively good learning resource of any other kind, though some arguably get close, and may be said to be very good within constraints. For instance, my experience with the K&R book, The C Programming Language (second edition, for ANSI C; sometimes known as the White Bible because of its white cover and importance in C programmer culture), is largely positive. Its first flaw is the one essentially everything else shares: teaching C with a certain level of delay and suppression of full, proper attention to pointers. Its other issues are relatively minor, I think. The upshot is simply this: you should keep in mind that, no matter what learning resource you pick up, you are (at minimum) likely to be missing part of the full story, and may be led wildly astray (so, judging by its reputation, stay away from Learn C the Hard Way).
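
As a rough illustration of the "rewrite it with pointers" exercise suggested above, here is a minimal sketch of the counting loop at the heart of a wc-like tool, written once with an array index and once with a pointer. The function names are hypothetical, invented for this example, and a "word" is assumed to be any run of non-whitespace characters. This is one possible way to practice the idea, not the way any real wc is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count whitespace-separated words, indexing the string like an array. */
size_t
count_words_indexed(const char *s)
{
	size_t i, words = 0;
	int in_word = 0;

	assert(s != NULL);	/* callers must pass a valid string */
	for (i = 0; s[i] != '\0'; i++) {
		if (isspace((unsigned char)s[i])) {
			in_word = 0;
		} else if (!in_word) {
			/* first character of a new word */
			in_word = 1;
			words++;
		}
	}
	return words;
}

/* Same logic, but walking the string with a pointer instead of an index. */
size_t
count_words_pointer(const char *s)
{
	const char *p;
	size_t words = 0;
	int in_word = 0;

	assert(s != NULL);	/* callers must pass a valid string */
	for (p = s; *p != '\0'; p++) {
		if (isspace((unsigned char)*p)) {
			in_word = 0;
		} else if (!in_word) {
			/* first character of a new word */
			in_word = 1;
			words++;
		}
	}
	return words;
}
```

Both functions should return the same count for any string. Writing a small main that feeds them lines from standard input is a natural next step toward a toy wc.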

Part 2

This list is my attempt to provide some grounding for understanding why the above can help, to help the student of C better apply advice gleaned from the above, and to inform and motivate people who would develop C instructional materials to do better than those who came before. Some of it may be a bit controversial.

  • A programming language should, ideally, come with a built-in core programming paradigm. C does that. C is a pointer-oriented programming language. Because pointer-oriented programming languages were not really a thing back then, nobody realized it was a pointer-oriented programming language. Early design decisions did not take that into account and, as such, some quirks in the language obscure that fact. Because code organizational state of the art was still fairly new then, other paradigms got applied to C retroactively in ways that obscured the fact there was a pointer-oriented paradigm lurking within. We could have a better design for a pointer-oriented programming language, but even today people mostly haven't recognized the existence and value of a pointer-oriented programming paradigm, so I have yet to notice a proper pointer-oriented programming language project anywhere. Even C suffers a bit in this regard, as the standards committee doesn't seem to recognize that about the language, either.

  • Details of the C standard are exacting, but not entirely easy to understand in a casual skim. C is full of tricky corners such as "undefined behavior". Referring to the standard can help you avoid creating security vulnerabilities, crashing bugs, or other problems that may arise when relying on how the compiler works on your own computer. In reality, the fact your program compiled and seemed to work on your own machine may say nothing at all about how it will work on someone else's, if you blunder into (for instance) implementation-defined behavior in your code.

  • OpenBSD devs focus a lot on a few specific priorities rather more than most others. A couple of those priorities include low-level security and reliability concerns (e.g. how to write C in pragmatic ways that support security and reliability) and thorough documentation of foundational concerns in the OpenBSD environment. This results in OpenBSD being an excellent de facto C programming "good practices" resource via its manpages. OpenBSD's generally high quality C source code also serves as a practical resource for learning C programming well. There may be some places where the C source is not the best but, as a whole, the full corpus of the OpenBSD project's C source code tends to be high quality in a straightforward and understandable manner, and it's a good place to start when looking for example code. Extracting extensive notes from OpenBSD manpages about C library functions, to create a document of its own so that people can get a clear sense of which functions to use and which to avoid (and how to use them) for security purposes, would go some distance toward helping people learn to write C well.

  • The Unix userland is, itself, essentially a programming toolset. In fact, there's a whole book about exactly that, written by Brian Kernighan and Rob Pike: The Unix Programming Environment. Automating more complicated or tedious tasks using the Unix native toolset offers a very easy way to start thinking with a programming mindset, and the limitations of shell scripting (which can be pushed far by adding awk to your skills, but that's an "advanced" trick for shell scripting in some respects) can quickly motivate figuring out how to write the same programs in a "real" programming language (e.g. C). Working in a Windows-like environment effectively isolates the nascent programmer from where the rubber meets the road in programming to some degree, to the extent that people who learn C on Windows sometimes spend years without ever internalizing the understanding necessary to create a simple twenty-line program without the aid of an IDE.

  • Writing quality C code requires good practices, and a fairly solid understanding of both the conditions in which one writes C code and the language's capabilities, plus a fair bit of care and patience. Writing C that seems to compile and run is much easier, once you get past the initial hurdles of learning C, but if that's all the C you'll ever write you should avoid letting any of your code out into the wild. Bad C code can be very bad, indeed.
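
To make the point about "undefined behavior" a little more concrete: signed integer overflow is one of those tricky corners. The standard leaves it undefined, so code that adds two int values and happens to "work" on one machine promises nothing elsewhere. The following is a minimal sketch of one common way to sidestep it, by checking against INT_MAX and INT_MIN before adding. The name checked_add is hypothetical, made up for this example, and this is only one of several reasonable approaches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Store a + b in *sum and return true, or leave *sum untouched and
 * return false if the addition would overflow an int.
 */
bool
checked_add(int a, int b, int *sum)
{
	assert(sum != NULL);	/* callers must provide somewhere to store the result */

	/* These comparisons are arranged so they cannot overflow themselves. */
	if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
		return false;

	*sum = a + b;
	return true;
}
```

The design choice here is to test before doing the arithmetic, rather than adding first and inspecting the result afterward, because by the time an overflowed result exists the program has already wandered into undefined territory.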

Credit

I owe thanks to oldlaptop on Libera Chat and anonymous others, who provided technical, copy, and content feedback.

Blame

It's all Chad's fault. You can blame Chad for this article.