Thursday, November 11, 2010

Hiring Programmers

This is intended as a page to reference the material already here on WardsWiki related to hiring a programmer.

Anyway, here are some references:

HiringPatterns
GuerrillaGuideToInterviewing
- In this, Joel says the #1 cardinal criteria for getting hired at Fog Creek are:
  - Smart
  - GetsThingsDone
JobInterviews
http://www.artima.com/wbc/interprog.html
http://www.jrothman.com/weblog/htpblogger.html
http://www.possibility.com/epowiki/?page=InterviewQuestions

http://wiki.pragprog.com/cgi-bin/wiki.cgi?AnneLearnsToRecruit

Programming Challenges For Interview

When interviewing it does matter what their CV says, it does matter how good their references are, and it does matter how well and clearly they express themselves. It also matters how well they program for real.
It doesn't matter if they know the difference between varchar and varchar2 or if they know what JCA stands for. What's important is that they must be problem solvers.
In the spirit of "Hiring a Juggler", here is a small selection of the programming challenges that could be set for a potential recruit.

In language of choice, write a routine that reverses a string in place
- Now write a routine that reverses each word in a string
(words are characters separated by spaces)
- Now write a routine that reverses the order of words in a string

Write a program to solve the TelegramProblem

Write a program to solve the OddWordProblem

Write a binary search routine that returns the first occurrence of the key
- If it is recursive, convert it to non-recursive, or vice-versa.
- Now write another that returns where the key should be inserted if it is missing

Find the length of a linked list, or report that it has a loop
- Solve this with a fixed amount of additional storage

Sort the linked list in place.

Given a function f that takes a real and returns a real, find a value x such that f(x) is zero

Find the longest monotone subsequence from an array of numbers. A sub-sequence is any X[j[1]], X[j[2]] ... X[j[k]] where the initial array is X[n], k<=n, 0<=j[i]<n, for all i:0<=i<n and j[i]<j[i+1] for all i: 0<= i < n-1

Write a program that given the time of the day (hours, minutes) returns the angle between the hands on a clock.
- Be on the lookout for them asking if they should return just the smaller angle, and the "clever" solution that just outputs 0 because the hands are parallel.
- See AngleBetweenClockHands

Write a function that will solve the leader election in a ring. All processes must run this same function and one and only one of them must be elected leader.

Once the code is written ask the potential recruit: Where is the bug?
Note that some of these are deliberately vague. We are checking not just the programming capability, but the analysis and communications skills. What clarifying questions are asked can be just as important as writing the code.
The above seem appropriate to ask someone fresh out of college interviewing for a junior position, but it won't tell you anything about whether a senior programmer has the qualities needed for a senior person. Unless they fail, of course, but why not set them a senior level problem to succeed or fail with? Otherwise you're wasting valuable interview time on something that gives very little information upon success.
[This is simply not true. Such an exercise gives you a lot of information for a relatively short amount of time. If the candidate can't solve (for some value of 'solve') it, they are just not senior programming material, and you are done. However, assuming they are, the process of discussing the problem will give you insight into how this person solves problems, and how they design (a good choice of toy problem will allow obvious generalizations, which this person should see). Certainly this isn't the *only* thing you should ask, but inside the time constraints of an interview it is only a simple problem that you can expect to see a detailed solution of. Other lines of questioning should probe design, portability, developer interaction, etc. issues - but you just won't have time to do everything that would be useful. The 'toy problem' question is just one tool, and a useful one at nearly every level.]

Once the code is written ask the potential recruit: Write the unit test for this code

Discussion please to follow. Please keep above the line in DocumentMode.
Write a binary search routine that returns the first occurrence of the key
I missed this at an interview. Hard to remember if you use libraries for this sort of stuff.

: (added later) I've never seen a library that implements this function. Note that it asks to return the first occurrence.
: (added still later) The C++ library very nearly implements this function. What it does implement is to return the lowest position at which a given value can be inserted into a sorted sequence without violating the ordering. This position is the position of the first element with the given value if such an element exists; otherwise it is either (one past) the end of the sequence or the position of the first element greater than the given value. So you can solve the problem by calling this function and then checking whether the returned value is off the end or refers to an element unequal to the given value.

Sorry, but I don't understand this. Are you saying that you never implement algorithms from descriptions?
The last time I implemented a binary search routine was in college. I've never had to do it on a real project because there are always libraries to do this sort of stuff. I couldn't just remember it from scratch. I am just not a man like you.
[Well, I guess you'll be fine, as long as your programming career never requires implementing algorithms. The point is not whether you can remember it, but whether, given a description, you can implement it. This sort of thing is pretty basic - you should be able to derive it without much trouble (in an interview, bug-free is probably a bit of a stretch).]
He may also need to settle for lower salary or he may lose to better candidates. Somebody who whines about having to write a binary search for the purpose of an interview should never ever get the title of senior software engineer. Unfortunately many do. So that's why Doug's claim that presenting the OddWordProblem is insulting is faulty: the "senior" title has been diluted beyond recognition to the point where we can't know on average if a "senior" is better than a [graduate] fresh out of a (good) college.
[You misunderstand. (A) I wouldn't criticize using binary search as an interview problem; as it says on ProofsCanBeSimple, "it took 14 years for the published literature to come up with a bug-free binary search". (B) I'm suggesting that harder problems be given for senior positions, not merely that easy problems should be avoided.]
[And yes, one should reconstruct binary search merely from the principle of divide and conquer, not try to memorize it! (But there's reason to think, as noted above, that most [attempts to write binary search] will have minor bugs in it, and that is just to be expected)]
Why would it be expected?
[Quoting from the uvic.ca URL below: Bentley said "Although the basic idea is simple, I have always found the details tricky - I once spent the better part of two days chasing down a bug hiding in a short partitioning loop."]
[For those who find this claim about binary search hard to believe, here's one handy reference online, if you don't own Bentley's book yourself (Bentley was quoting Knuth): http://www.cs.uvic.ca/~vanemden/teaching/sweng/webnotes/1.cleanroom.txt]
If you still don't believe it you can try BinarySearchCodingChallenge?.

If it is a more senior position, ask their opinions and justifications for HolyWar topics.
For example: "Do you feel that programming bugs are inevitable?" The important part of the answer isn't the "yes" or "no" part; it's the justification and support of the answer that is more interesting to me. For example, as an interviewer, I would probably give negative points for either "Yes, because no one's perfect." or "No, because if you have bugs in your code, it's because you're not trying hard enough." So far, the answer I would probably come up with is "Yes, but with a combination of individual discipline and practices, group discipline and practices, the bug rate in the released product can get asymptotically close to zero - if that's the overriding goal of the project." And if I were an interviewee and gave that response, and the interviewer didn't pump me for more details, I would probably give that company negative points. -- TimLesher

Good point, as long as you make it clear you're looking for more than a yes or no. It's important to make sure you ask exactly the question that you want answered, rather than expecting the candidate to know what you want. They doubtless have interviewed with both reasonable and unreasonable interviewers, and can't know exactly what sort of answer is wanted unless one is clear.

Find the length of a linked list, or report that it has a loop

Solve this with a fixed amount of additional storage

Is it possible to do this with a fixed amount of additional storage without mutating the data structure in some way?
Is that algorithm widely known enough to expect candidates to have run into it? I don't think it's obvious enough to expect someone to reinvent it if they hadn't heard of it. (You mean chasing the list simultaneously at half speed and full speed for cycle detection, right?)
Yes, it's that algorithm. No, I don't expect every candidate to invent it for themselves. Interviews are not examinations, they are dialogues. As an interviewer I am looking for information. These programming challenges form the subjects for the dialogues, the candidate's thoughts, ideas, discussion and responses gives me information about how they work and whether they'll be of value in my context.
One can always trade time for space even for silly interview problems with silly constraints.

The problem I have with the questions on this page is that they're all about inventing little algorithms. That's fun and well-covered in school, but I don't see it as having a lot of relevance in the real world. I'm much more interested in the candidate's ability to communicate, to design, to test, and to create code that embodies these three things... and to do them well under schedule pressure.
The questions above aren't like asking a juggler to juggle; they're like asking a juggler to make his own juggling balls. Interesting, and probably covered in juggling school, but only occasionally relevant to the job.
Unless of course the job is about inventing algorithms. In that case, you'll probably be hiring somebody with a hard-core math/CS background and you'll ask them questions that are a lot harder than the ones above.
-- JimShore
Interviewing is a difficult process; you are trying to learn a lot about people in a short time. One of the things programmers do is invent little algorithms, and the process of conceptualizing and solving such a toy program *is* relevant, even if the problem isn't. I certainly wouldn't hire someone on the basis of a test like this - on the other hand, I would certainly *not* hire anyone who couldn't do it (where 'do it' is somewhat flexible). Questions like this are just one tool in the interviewing tool box, and an effective on if used correctly.

The candidate's ability to communicate, design, test and create code is exactly being discovered in the act of working with them to solve a problem they haven't seen before. That's why you need an arsenal of these little challenges, to find one they haven't seen and work with them to solve it.

: Exactly.

I prefer a PairProgrammingTestDrive (working on actual production code). It's relevant and it exposes the candidate's ability to learn, think about big-picture design, absorb information quickly, engage in collective code ownership, work in a pair, engage in TestDrivenDevelopment, and write the kind of code we need.
I've encountered a lot of programmers that can crank out a good, readable algorithm. Programmers that can think in terms of objects, refactor a large system, and turn complex requirements into simple, maintainable code are a lot harder to find. Success at writing an algorithm doesn't tell me enough about the candidate.
-- JimShore

It seems there is a major missing piece in the above. How does one evaluate the results of these "challenges"?
They're not tests, you don't get a score. They are intended as an activity used for an interviewer to interact with the candidate. The assessment is then the interviewer's. The question is: "Will this person be able to perform the tasks I want them to perform?" and the answer depends on the job at stake. There is no right and wrong, there is no single answer, there is no single evaluation that can be applied.
If the results of the test are without meaning, why do them? Why waste everyone's time doing trivial tasks if the tasks do not aid in evaluation?
[The process used in solving the task is important; the task in and of itself isn't. In my experience, people who are highly resistant to this sort of thing tend to be insecure about their skills. How do you evaluate the challenge? You evaluate how the person addresses the task. Do they rush into a sloppy solution then try and fix it? Do they analyse the problem before hand? If so, how? How do they critically evaluate the solution they have arrived at and decide it is correct (or not). How do they communicate what they are doing /have done? Do they ask you questions? If so, are they insightful? All of these things are valuable clues as to the person's capabilities.]

7. Given a function f that takes a real and returns a real, find a value x such that f(x) is zero
Do you really ask job candidates to solve an equation?
Nope, I ask them to solve a problem. I watch how they go about dissecting a problem, breaking it down, looking at it from different angles, tasting different possible solutions, and choosing among them. I watch someone work.

If the results of the test are without meaning, why do them? Why waste everyone's time doing trivial tasks if the tasks do not aid in evaluation?
If you need a checklist or numerical score, I might use something like this:
Did the applicant (yes is good, +1 point)

look interested in the problem?
ask to clarify context?
ask to clarify requirements?
explain their assumptions?
explain their reasoning?
explain what they don't know?
explain where to look for what they don't know?
consider multiple alternatives?
write code that was visually clean?
write code with correct syntax?
write code that worked?
write more than one implementation?
walk through a specific case?
walk through boundary cases?
code a test case?
code more than one test case?
code the test case first?

Did the applicant (yes is bad, -1 point)

provide a single answer?
provide an instant answer?
provide an instant, wrong answer?
appear dismayed, angry, or offended?
fail to understand the problem?
refuse to work on the problem?
explain things in an unclear manner?
use lots of buzzwords?
insist on a specific interpretation?
argue about the requirements?

I can testify that in my informed opinion the above list is irremediably flawed. It is so defective and misguided that it is not worth a proper criticism, I'll just post my opinion about it. In any case, it was contributed anonymously and without justification (as if it ought to be recognized as self-evident). --CostinCozianu
I agree with Costin. The judgement of such things as positives or negatives depends on context and must be interpreted in light of the justification for such criteria. For instance, it matters whether they answer instantly because they dislike thinking hard about new problems, or because they are already extremely familiar with the issues. Etc ad nauseum.
I also think that interviewers who use simplistic critera like that deserve some negative points of their own. Interviewing should include thinking and careful assessment on the side of the interviewer as well as that of the interviewee, not just kneejerk reactions based on unquestioned conditional reflexes/prejudices.
Also, if you ever meet a perfect candidate, who has no flaws whatsoever, the odds are that they are much better than the interviewer, who isn't always qualified to even recognize perfection if they hypothetically ran across it. It's much more often important to recognize positive qualities, than to be overly anxious to discover deal-breaker faults. -- DougMerritt

I don't think this point has been made so far. One of the reasons for asking questions in an interview (programming questions, design questions, questions about pointers to arrays of functions in C) is to verify that what is on an interviewee's CV is not a fabrication. I admit that this is a hard thing to do, but I think that finding out whether an interviewee is 'enhancing' his CV is a laudable goal. If they say they have 5 years C, and they fail most of the C tests, then you've got a potential problem. I'm not saying that you wouldn't hire them, but you may want to contemplate the wisdom of hiring someone who has 'lied' to you. -- MatthewFarwell

Design Challenges For Interview

Senior questions

Design a house.
Design a breadtoaster.
Design an API for a disk.
Design a graphics library.
Design the game called Monopoly.
Design a fixed size memory allocator.
Design a button.

What you are looking for:

The interviewed is more concerned about getting more information from you and getting all facts straight rather than presenting a solution quickly that you will make useless by giving new requirements.
The interviewed takes care about safety and that the designed solution can be used only if used properly.
The interviewed conceives a very small subset of operations that allows to implement all possible operations. He can think abstractly and explain how some operations could be easily implemented using the most basic operations.

On Monopoly
For tech interviews I do, I nicked a question I was asked at an interview. "Design some of the classes to implement Monopoly". It works really well.
People who've just "been in the room" while engineering took place tend to just "um" and "ah" a lot, and they will hold the pen, but not use it.
People who've done a lot of grunt coding rather than actual design start talking about arrays of things and integers to store where you are in the arrays and some routines to work out the rents on properties. They will pick up the pen, but they'll start trying to write code.
OO design people will start saying things like "Ohh... there's properties. Oh and some of them are ownable and some aren't. And you can build on some of the ownable ones..." and they're sketching an type tree and putting behaviours in -- probably in some variant of Booch or UML class diag.. And those people can do OO design.
I've never put a functional developer through it, so I can't say how they'd answer.
The only real grief is when someone says "What's Monopoly?", in which case you have to find another boardgame you both know.
It's nice because it's not something you can "fake" or "learn" as such, you just have to be able to do it -- and you get to have a conversation about the stuff they're doing and get into their heads about how they work.
Frankly I'm bored of going to interviews for "senior" positions where I just get asked more complicated questions about exception handling that when I end up working there I find they don't use anyway... -- KatieLucas
Challenge the interviewers. Ask if they use these techniques, and why you, as a prospective senior person, need to know about them. If you fail the interview because of that then you probably didn't want the job anyway. Certainly I, when interviewing for more senior positions, want to hear someone asking "Why" - I deliberately create the situations where they ought to.
This sounds like an excellent question to pose. But I probably would rephrase it: How would you build a software version of Monopoly? This is so open-ended, the interviewee will immediately show how he approaches the problem. Does he think in objects? Functions? High-level abstractions? Data structures? Just reproducing the board game? Or adding computer-oriented features up the wazoo? Does he seek to narrow the requirements? Or does he make assumptions about the requirements? (What architecture? What kind of UI? What sort of system? What schedule? And so forth.) But while it is open-ended, the question is also contained enough to be a real problem. In fact, most design problems we face begin life as something like this.
Some of us forgot how Monopoly works.

What's "Monopoly"?

On designing a button
Ask the candidate how he would design a button. A good senior candidate will start off by asking what the requirements of the button are. The interviewer can answer with something very simple, like a blank rectangle on the screen that responds to the space bar being pressed. A good senior candidate would suggest a simple solution that fulfills only the stated requirements. Questions can then be asked about how their solution uses the existing classes of the software they are working with and how testing is done.
The interviewer can add a new requirement, such as the ability to display a text label, respond to a mouse click or take on a disabled state, and see how the candidate alters their design and testing to accommodate the new requirement. This cycle is repeated.
This can reveal a lot about the design skills of senior candidates. How much do they assume ? Do they fulfill only the current requirements or do they over design ? How do they respond to changing requirements ? Where do they use polymorphism ? How do they use delegation ? How do they refactor their design as the requirements change ? How much of their flexibility comes from following the requirements versus fearing change ? How easy was it to alter their existing design to accommodate the new requirements ?
I feel that how the candidate works in this situation predicts how they work in general. -- RodneyRyan?
Am I out if I find a way to make EverythingIsa Button? Or, if I start on an anti-hierarchy or anti-type tirade and argue that one should be able to click on a picture or change a picture to a button (and/or still be a picture) such that most traits are combinable between all widgets rather than make them mutually exclusive? After all, what is the difference between a button, a hyperlink (text), a clickable image, and a text entry-box with an On-Focus event? Indentation animation? That should not be limited to just grey rectangles. If I want to make a text box or a picture wiggle when one clicks on it, why not? I am just to provide the mechanism as a GUI kit designer, not force conventions on customers (who usually pay a premium for flexibility). No wonder my career has gone south lately. I think too much and they are afraid that thinkers are not doers :-) --top

c++ Language standard

Language standard

In 1998, the C++ standards committee (the ISO/IEC JTC1/SC22/WG21 working group) standardized C++ and published the international standard ISO/IEC 14882:1998 (informally known as C++98).^[8] For some years after the official release of the standard, the committee processed defect reports, and published a corrected version of the C++ standard, ISO/IEC 14882:2003, in 2003. In 2005, a technical report, called the "Library Technical Report 1" (often known as TR1 for short), was released. While not an official part of the standard, it specified a number of extensions to the standard library, which were expected to be included in the next version of C++. Support for TR1 is growing in almost all currently maintained C++ compilers.

The standard for the next version of the language (known informally as C++0x) is in development.

Question 20.35

comp.lang.c FAQ list · Question 20.35

Q: What is ``Duff's Device''?

A: It's a devastatingly devious way of unrolling a loop, devised by Tom Duff while he was at Lucasfilm. In its ``classic'' form, it was used to copy bytes, and looked like this:

register n = (count + 7) / 8; /* count > 0 assumed */
 switch (count % 8)
 {
 case 0:    do { *to = *from++;
 case 7:  *to = *from++;
 case 6:  *to = *from++;
 case 5:  *to = *from++;
 case 4:  *to = *from++;
 case 3:  *to = *from++;
 case 2:  *to = *from++;
 case 1:  *to = *from++;
        } while (--n > 0);
 }

where count bytes are to be copied from the array pointed to by from to the memory location pointed to by to (which is a memory-mapped device output register, which is why to isn't incremented). It solves the problem of handling the leftover bytes (when count isn't a multiple of 8) by interleaving a switch statement with the loop which copies bytes 8 at a time. (Believe it or not, it is legal to have case labels buried within blocks nested in a switch statement like this. In his announcement of the technique to C's developers and the world, Duff noted that C's switch syntax, in particular its ``fall through'' behavior, had long been controversial, and that ``This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.'')
Additional links: longer explanation

Question 20.34

comp.lang.c FAQ list · Question 20.34

Q: Here's a good puzzle: how do you write a program which produces its own source code as output?

A: It is actually quite difficult to write a self-reproducing program that is truly portable, due particularly to quoting and character set difficulties.
Here is a classic example (which ought to be presented on one line, although it will fix itself the first time it's run):

char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";
main(){printf(s,34,s,34);}

(This program has a few deficiencies, among other things neglecting to #include <stdio.h>, and assuming that the double-quote character " has the value 34, as it does in ASCII.)
Here is an improved version, posted by James Hu:

#define q(k)main(){return!puts(#k"\nq("#k")");}
q(#define q(k)main(){return!puts(#k"\nq("#k")");})

Question 20.31

comp.lang.c FAQ list · Question 20.31

Q: How can I find the day of the week given the date?

A: Here are three methods:

Use mktime or localtime (see question 13.13). Here is a code fragment which computes the day of the week for February 29, 2000:

#include <stdio.h>
#include <time.h>

char *wday[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
  "Thursday", "Friday", "Saturday"};

struct tm tm;

tm.tm_mon = 2 - 1;
tm.tm_mday = 29;
tm.tm_year = 2000 - 1900;
tm.tm_hour = tm.tm_min = tm.tm_sec = 0;
tm.tm_isdst = -1;

if(mktime(&tm) != -1)
 printf("%s\n", wday[tm.tm_wday]);

When using mktime like this, it's usually important to set tm_isdst to -1, as shown (especially if tm_hour is 0), otherwise a daylight saving time correction could push the time past midnight into another day.

Use Zeller's congruence, which says that if
```
J is the number of the century [i.e. the year / 100],
 K the year within the century [i.e. the year % 100],
 m the month,
 q the day of the month,
 h the day of the week [where 1 is Sunday];
```
and if January and February are taken as months 13 and 14 of the previous year [affecting both J and K]; then h for the Gregorian calendar is the remainder when the sum
```
q + 26(m + 1) / 10 + K + K/4 + J/4 - 2J
```
is divided by 7, and where all intermediate remainders are discarded. [footnote] The translation into C is straightforward:
```
h = (q + 26 * (m + 1) / 10 + K + K/4 + J/4 + 5*J) % 7;
```
(where we use +5*J instead of -2*J to make sure that both operands of the modulus operator % are positive; this bias totalling 7*J will obviously not change the final value of h, modulo 7).

Use this elegant code by Tomohiko Sakamoto:

int dayofweek(int y, int m, int d) /* 0 = Sunday */
{
 static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
 y -= m < 3;
 return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
}

See also questions 13.14 and 20.32.
References: ISO Sec. 7.12.2.3
Chr. Zeller, ``Kalender-Formeln''

Question 13.20

comp.lang.c FAQ list · Question 13.20

Q: How can I generate random numbers with a normal or Gaussian distribution?

A: There are a number of ways of doing this.

Exploit the Central Limit Theorem (``law of large numbers'') and add up several uniformly-distributed random numbers:
```
#include <stdlib.h>
#include <math.h>

#define NSUM 25

double gaussrand()
{
 double x = 0;
 int i;
 for(i = 0; i < NSUM; i++)
  x += (double)rand() / RAND_MAX;

 x -= NSUM / 2.0;
 x /= sqrt(NSUM / 12.0);

 return x;
}
```
(Don't overlook the sqrt(NSUM / 12.) correction, though it's easy to do so accidentally, especially when NSUM is 12.)

Use a method described by Abramowitz and Stegun:

#include <stdlib.h>
#include <math.h>

#define PI 3.141592654

double gaussrand()
{
 static double U, V;
 static int phase = 0;
 double Z;

 if(phase == 0) {
  U = (rand() + 1.) / (RAND_MAX + 2.);
  V = rand() / (RAND_MAX + 1.);
  Z = sqrt(-2 * log(U)) * sin(2 * PI * V);
 } else
  Z = sqrt(-2 * log(U)) * cos(2 * PI * V);

 phase = 1 - phase;

 return Z;
}

Use a method discussed in Knuth and due originally to Marsaglia:

#include <stdlib.h>
#include <math.h>

double gaussrand()
{
 static double V1, V2, S;
 static int phase = 0;
 double X;

 if(phase == 0) {
  do {
   double U1 = (double)rand() / RAND_MAX;
   double U2 = (double)rand() / RAND_MAX;

   V1 = 2 * U1 - 1;
   V2 = 2 * U2 - 1;
   S = V1 * V1 + V2 * V2;
   } while(S >= 1 || S == 0);

  X = V1 * sqrt(-2 * log(S) / S);
 } else
  X = V2 * sqrt(-2 * log(S) / S);

 phase = 1 - phase;

 return X;
}

These methods all generate numbers with mean 0 and standard deviation 1. (To adjust to some other distribution, multiply by the standard deviation and add the mean.) Method 1 is poor ``in the tails'' (especially if NSUM is small), but methods 2 and 3 perform quite well. See the references for more information.Additional links
References: Knuth Sec. 3.4.1 p. 117
Box and Muller, ``A Note on the Generation of Random Normal Deviates''
Marsaglia and Bray, ``A Convenient Method for Generating Normal Variables''
Abramowitz and Stegun, Handbook of Mathematical Functions
Press et al., Numerical Recipes in C Sec. 7.2 pp. 288-290

Question 20.21a

comp.lang.c FAQ list · Question 20.21a

Q: Does C have circular shift operators?

A: No. (Part of the reason why is that the sizes of C's types aren't precisely defined--see question 1.2--but a circular shift makes most sense when applied to a word of a particular known size.)
You can implement a circular shift using two regular shifts and a bitwise OR:

(x << 13) | (x >> 3) /* circular shift left 13 in 16 bits */

Question 20.17

comp.lang.c FAQ list · Question 20.17

Q: Is there a way to switch on strings?

A: Not directly. Sometimes, it's appropriate to use a separate function to map strings to integer codes, and then switch on those:

#define CODE_APPLE 1
#define CODE_ORANGE 2
#define CODE_NONE 0

switch(classifyfunc(string)) {
 case CODE_APPLE:
  ...

 case CODE_ORANGE:
  ...

 case CODE_NONE:
  ...
}

where classifyfunc looks something like

static struct lookuptab {
 char *string;
 int code;
} tab[] = {
 {"apple", CODE_APPLE},
 {"orange", CODE_ORANGE},
};

classifyfunc(char *string)
{
 int i;
 for(i = 0; i < sizeof(tab) / sizeof(tab[0]); i++)
  if(strcmp(tab[i].string, string) == 0)
   return tab[i].code;

 return CODE_NONE;
}

Otherwise, of course, you can fall back on a conventional if/else chain:

if(strcmp(string, "apple") == 0) {
  ...
 } else if(strcmp(string, "orange") == 0) {
  ...
 }

(A macro like Streq() from question 17.3 can make these comparisons a bit more convenient.)
See also questions 10.12, 20.16, 20.18, and 20.29.
References: K&R1 Sec. 3.4 p. 55
K&R2 Sec. 3.4 p. 58
ISO Sec. 6.6.4.2
H&S Sec. 8.7 p. 248

Question 20.15c

comp.lang.c FAQ list · Question 20.15c

Q: How can I swap two values without using a temporary?

A: The standard hoary old assembly language programmer's trick is:

a ^= b;
b ^= a;
a ^= b;

But this sort of code has little place in modern, HLL programming. Temporary variables are essentially free, and the idiomatic code using three assignments, namely

int t = a;
 a = b;
 b = t;

is not only clearer to the human reader, it is more likely to be recognized by the compiler and turned into the most-efficient code (e.g. perhaps even using an EXCH instruction). The latter code is obviously also amenable to use with pointers and floating-point values, unlike the XOR trick. See also questions 3.3b and 10.3.
Additional links: further reading

Question 20.15b

comp.lang.c FAQ list · Question 20.15b

Q: People claim that optimizing compilers are good and that we no longer have to write things in assembler for speed, but my compiler can't even replace i/=2with a shift.

A: Was i signed or unsigned? If it was signed, a shift is not equivalent (hint: think about the result if i is negative and odd), so the compiler was correct not to use it.

Question 20.12

comp.lang.c FAQ list · Question 20.12

Q: What is the most efficient way to count the number of bits which are set in an integer?

A: Many ``bit-fiddling'' problems like this one can be sped up and streamlined using lookup tables (but see question 20.13). Here is a little function which computes the number of bits in a value, 4 bits at a time:

static int bitcounts[] =
 {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

int bitcount(unsigned int u)
{
 int n = 0;

 for(; u != 0; u >>= 4)
  n += bitcounts[u & 0x0f];

 return n;
}

Question 20.11

comp.lang.c FAQ list · Question 20.11

Q: Can I use base-2 constants (something like 0b101010)?
Is there a printf format for binary?

A: No, on both counts, although there are various preprocessor tricks you can try (see the links below). You can convert base-2 string representations to integers with strtol. If you need to print numbers out in base 2, see the example code in question 20.10.
Additional links:

example by Jack Klein

preprocessor trick by Karl Heuer

prettier preprocessor trick by Bill Finke

Question 20.10

comp.lang.c FAQ list · Question 20.10

Q: How can I convert integers to binary or hexadecimal?

A: Make sure you really know what you're asking. Integers are stored internally in binary, although for most purposes it is not incorrect to think of them as being in octal, decimal, or hexadecimal, whichever is convenient. The base in which a number is expressed matters only when that number is read in from or written out to the outside world, either in the form of a source code constant or in the form of I/O performed by a program.
In source code, a non-decimal base is indicated by a leading 0 or 0x (for octal or hexadecimal, respectively). During I/O, the base of a formatted number is controlled in the printf and scanf family of functions by the choice of format specifier (%d, %o, %x, etc.) and in the strtol and strtoul functions by the third argument. During binary I/O, however, the base again becomes immaterial: if numbers are being read or written as individual bytes (typically with getc or putc), or as multi-byte words (typically with fread or fwrite), it is meaningless to ask what ``base'' they are in.
If what you need is formatted binary conversion, it's easy enough to do. Here is a little function for formatting a number in a requested base:

char *
baseconv(unsigned int num, int base)
{
 static char retbuf[33];
 char *p;

 if(base < 2 || base > 16)
  return NULL;

 p = &retbuf[sizeof(retbuf)-1];
 *p = '\0';

 do {
  *--p = "0123456789abcdef"[num % base];
  num /= base;
 } while(num != 0);

 return p;
}

(Note that this function, as written, returns a pointer to static data, such that only one of its return values can be used at a time; see question 7.5a. A better size for theretbuf array would be sizeof(int)*CHAR_BIT+1; see question 12.21.)
For more information about ``binary'' I/O, see questions 2.11, 12.37, and 12.42. See also questions 8.6 and 13.1.
Additional links: A long reply I sent to someone who was asking how to write a ``binary to decimal'' conversion function
References: ISO Secs. 7.10.1.5,7.10.1.6

Question 20.9b

comp.lang.c FAQ list · Question 20.9b

Q: How do I swap bytes?

A: V7 Unix had a swab function, but it seems to have been forgotten.
A problem with explicit byte-swapping code is that you have to decide whether to call it or not, based on the byte order of the data and the byte order of the machine in use. Question 20.9 shows how, but it's a nuisance.
A better solution is to define functions which convert between the known byte order of the data and the (unknown) byte order of the machine in use, and to arrange for these functions to be no-ops on those machines which already match the desired byte order. A set of such functions, introduced with the BSD networking code but now in wide use, is ntohs, htons, ntohl, and htonl. These are intended to convert between ``network'' and ``host'' byte orders, for ``short'' or ``long'' integers, where ``network'' order is always big-endian, and where ``short'' integers are always 16 bits and ``long'' integers are 32 bits. (This is not the C definition, of course, but it's compatible with the C definition; see question 1.1.) So if you know that the data you want to convert from or to is big-endian, you can use these functions. (The point is that you always call the functions, making your code much cleaner. Each function either swaps bytes if it has to, or does nothing. The decision to swap or not to swap gets made once, when the functions are implemented for a particular machine, rather than being made many times in many different calling programs.)
If you do have to write your own byte-swapping code, the two obvious approaches are again to use pointers or unions, as in question 20.9. Here is an example using pointers:

void byteswap(char *ptr, int nwords)
{
 char *p = ptr;
 while(nwords-- > 0) {
  char tmp = *p;
  *p = *(p + 1);
  *(p + 1) = tmp;
  p += 2;
 }
}

And here is one using unions:

union word
 {
 short int word;
 char halves[2];
 };

void byteswap(char *ptr, int nwords)
{
 register union word *wp = (union word *)ptr;
 while(nwords-- > 0) {
  char tmp = wp->halves[0];
  wp->halves[0] = wp->halves[1];
  wp->halves[1] = tmp;
  wp++;
 }
}

These functions swap two-byte quantities; the extension to four or more bytes should be obvious. The union-using code is imperfect in that it assumes that the passed-in pointer is word-aligned. It would also be possible to write functions accepting separate source and destination pointers, or accepting single words and returning the swapped values.
References: PCS Sec. 11 p. 179

Question 20.7

comp.lang.c FAQ list · Question 20.7

Q: How can I manipulate individual bits?

A: Bit manipulation is straightforward in C, and commonly done. To extract (test) a bit, use the bitwise AND (&) operator, along with a bit mask representing the bit(s) you're interested in:

value & 0x04

To set a bit, use the bitwise OR (| or |=) operator:

value |= 0x04

To clear a bit, use the bitwise complement (~) and the AND (& or &=) operators:

value &= ~0x04

(The preceding three examples all manipulate the third-least significant, or 2**2, bit, expressed as the constant bitmask 0x04.)
To manipulate an arbitrary bit, use the shift-left operator (<<) to generate the mask you need:

value & (1 << bitnumber)
 value |= (1 << bitnumber)
 value &= ~(1 << bitnumber)

Alternatively, you may wish to precompute an array of masks:

unsigned int masks[] =
  {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80};

 value & masks[bitnumber]
 value |= masks[bitnumber]
 value &= ~masks[bitnumber]

To avoid surprises involving the sign bit, it is often a good idea to use unsigned integral types in code which manipulates bits and bytes.
See also questions 9.2 and 20.8.
References: K&R1 Sec. 2.9 pp. 44-45
K&R2 Sec. 2.9 pp. 48-49
ISO Sec. 6.3.3.3, Sec. 6.3.7, Sec. 6.3.10, Sec. 6.3.12
H&S Sec. 7.5.5 p. 197, Sec. 7.6.3 pp. 205-6, Sec. 7.6.6 p. 210

Question 20.6

comp.lang.c FAQ list · Question 20.6

Q: If I have a char * variable pointing to the name of a function, how can I call that function? Code like

extern int func(int, int);
 char *funcname = "func";
 int r = (*funcname)(1, 2);

r = (*(int (*)(int, int))funcname)(1, 2);

doesn't seem to work.

A: By the time a program is running, information about the names of its functions and variables (the ``symbol table'') is no longer needed, and may therefore not be available. The most straightforward thing to do, therefore, is to maintain that information yourself, with a correspondence table of names and function pointers:

int one_func(), two_func();
int red_func(), blue_func();

struct { char *name; int (*funcptr)(); } symtab[] = {
 "one_func", one_func,
 "two_func", two_func,
 "red_func", red_func,
 "blue_func", blue_func,
};

Then, search the table for the name, and call via the associated function pointer, with code like this:

#include <stddef.h>

int (*findfunc(char *name))()
{
 int i;

 for(i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
  if(strcmp(name, symtab[i].name) == 0)
   return symtab[i].funcptr;
  }

 return NULL;
}

...

 char *funcname = "one_func";
 int (*funcp)() = findfunc(funcname);
 if(funcp != NULL)
  (*funcp)();

The callable functions should all have compatible argument and return types. (Ideally, the function pointers would also specify the argument types.)
It is sometimes possible for a program to read its own symbol table if it is still present, but it must first be able to find its own executable (see question 19.31), and it must know how to interpret the symbol table (some Unix C libraries provide an nlist function for this purpose). See also questions 2.15, 18.14, and 19.36.
References: PCS Sec. 11 p. 168

Question 20.5

comp.lang.c FAQ list · Question 20.5

Q: How can I write data files which can be read on other machines with different word size, byte order, or floating point formats?

A: The most portable solution is to use text files (usually ASCII), written with fprintf and read with fscanf or the like. (Similar advice also applies to network protocols.) Be skeptical of arguments which imply that text files are too big, or that reading and writing them is too slow. Not only is their efficiency frequently acceptable in practice, but the advantages of being able to interchange them easily between machines, and manipulate them with standard tools, can be overwhelming.
If you must use a binary format, you can improve portability, and perhaps take advantage of prewritten I/O libraries, by making use of standardized formats such as Sun's XDR (RFC 1014), OSI's ASN.1 (referenced in CCITT X.409 and ISO 8825 ``Basic Encoding Rules''), CDF, netCDF, or HDF. See also questions 2.12,12.38, and 12.42.
References: PCS Sec. 6 pp. 86, 88

Question 20.3

comp.lang.c FAQ list · Question 20.3

Q: How can I open files mentioned on the command line, and parse option flags?

A: Here is a skeleton which implements a traditional Unix-style argv parse, handling option flags beginning with -, and optional filenames. (The two flags accepted by this example are -a and -b; -b takes an argument.)

#include <stdio.h>
#include <string.h>
#include <errno.h>

main(int argc, char *argv[])
{
 int argi;
 int aflag = 0;
 char *bval = NULL;

 for(argi = 1; argi < argc && argv[argi][0] == '-'; argi++) {
  char *p;
  for(p = &argv[argi][1]; *p != '\0'; p++) {
   switch(*p) {
   case 'a':
    aflag = 1;
    printf("-a seen\n");
    break;

   case 'b':
    bval = argv[++argi];
    printf("-b seen (\"%s\")\n", bval);
    break;

   default:
    fprintf(stderr,
     "unknown option -%c\n", *p);
   }
  }
 }

 if(argi >= argc) {
  /* no filename arguments; process stdin */
  printf("processing standard input\n");
 } else {
  /* process filename arguments */

  for(; argi < argc; argi++) {
   FILE *ifp = fopen(argv[argi], "r");
   if(ifp == NULL) {
    fprintf(stderr, "can't open %s: %s\n",
     argv[argi], strerror(errno));
    continue;
   }

   printf("processing %s\n", argv[argi]);

   fclose(ifp);
  }
 }

 return 0;
}

(This code assumes that fopen sets errno when it fails, which is not guaranteed, but usually works, and makes error messages much more useful. See also question20.4.)
There are several canned functions available for doing command line parsing in a standard way; the most popular one is getopt (see also question 18.16). Here is the above example, rewritten to use getopt:

extern char *optarg;
extern int optind;

main(int argc, char *argv[])
{
 int aflag = 0;
 char *bval = NULL;
 int c;

 while((c = getopt(argc, argv, "ab:")) != -1)
  switch(c) {
  case 'a':
   aflag = 1;
   printf("-a seen\n");
   break;

  case 'b':
   bval = optarg;
   printf("-b seen (\"%s\")\n", bval);
   break;
 }

 if(optind >= argc) {
  /* no filename arguments; process stdin */
  printf("processing standard input\n");
 } else {
  /* process filename arguments */

  for(; optind < argc; optind++) {
   FILE *ifp = fopen(argv[optind], "r");
   if(ifp == NULL) {
    fprintf(stderr, "can't open %s: %s\n",
     argv[optind], strerror(errno));
    continue;
   }

   printf("processing %s\n", argv[optind]);

   fclose(ifp);
  }
 }

 return 0;
}

The examples above overlook a number of nuances: a lone ``-'' is often taken to mean ``read standard input''; the marker ``--'' often signifies the end of the options (proper versions of getopt do handle this); it's traditional to print a usage message when a command is invoked with improper or missing arguments.
If you're wondering how argv is laid out in memory, it's actually a ``ragged array''; see the picture in question 20.2. See also questions 8.2, 13.7, and 19.20.
References: K&R1 Sec. 5.11 pp. 110-114
K&R2 Sec. 5.10 pp. 114-118
ISO Sec. 5.1.2.2.1
H&S Sec. 20.1 p. 416
PCS Sec. 5.6 pp. 81-2, Sec. 11 p. 159, pp. 339-40 Appendix F
Schumacher, ed., Software Solutions in C Sec. 4 pp. 75-85

Question 7.30

comp.lang.c FAQ list · Question 7.30

Q: Is it legal to pass a null pointer as the first argument to realloc? Why would you want to?

A: ANSI C sanctions this usage (and the related realloc(..., 0), which frees), although several earlier implementations do not support it, so it may not be fully portable. Passing an initially-null pointer to realloc can make it easier to write a self-starting incremental allocation algorithm.
Here is an example--this function reads an arbitrarily-long line into dynamically-allocated memory, reallocating the input buffer as necessary. (The caller must freethe returned pointer when it is no longer needed.)

#include <stdio.h>
#include <stdlib.h>

/* read a line from fp into malloc'ed memory */
/* returns NULL on EOF or error */
/* (use feof or ferror to distinguish) */

char *agetline(FILE *fp)
{
 char *retbuf = NULL;
 size_t nchmax = 0;
 register int c;
 size_t nchread = 0;
 char *newbuf;

 while((c = getc(fp)) != EOF) {
  if(nchread >= nchmax) {
   nchmax += 20;
   if(nchread >= nchmax) { /* in case nchmax overflowed */
    free(retbuf);
    return NULL;
   }
#ifdef SAFEREALLOC
   newbuf = realloc(retbuf, nchmax + 1);
#else
   if(retbuf == NULL) /* in case pre-ANSI realloc */
    newbuf = malloc(nchmax + 1);
   else newbuf = realloc(retbuf, nchmax + 1);
#endif
      /* +1 for \0 */
   if(newbuf == NULL) {
    free(retbuf);
    return NULL;
   }

   retbuf = newbuf;
  }

  if(c == '\n')
   break;

  retbuf[nchread++] = c;
 }

 if(retbuf != NULL) {
  retbuf[nchread] = '\0';

  newbuf = realloc(retbuf, nchread + 1);
  if(newbuf != NULL)
   retbuf = newbuf;
 }

 return retbuf;
}

(In production code, a line like nchmax += 20 can prove troublesome, as the function may do lots of reallocating. Many programmers favor multiplicative reallocation, e.g. nchmax *= 2, although it obviously isn't quite as self-starting, and can run into problems if it has to allocate a huge array but memory is limited.)
References: ISO Sec. 7.10.3.4
H&S Sec. 16.3 p. 388

Wednesday, November 10, 2010

Question 15.4

comp.lang.c FAQ list · Question 15.4

Q: How can I write a function that takes a variable number of arguments?

A: Use the facilities of the <stdarg.h> header.
Here is a function which concatenates an arbitrary number of strings into malloc'ed memory:

#include <stdlib.h>  /* for malloc, NULL, size_t */
#include <stdarg.h>  /* for va_ stuff */
#include <string.h>  /* for strcat et al. */

char *vstrcat(const char *first, ...)
{
 size_t len;
 char *retbuf;
 va_list argp;
 char *p;

 if(first == NULL)
  return NULL;

 len = strlen(first);

 va_start(argp, first);

 while((p = va_arg(argp, char *)) != NULL)
  len += strlen(p);

 va_end(argp);

 retbuf = malloc(len + 1); /* +1 for trailing \0 */

 if(retbuf == NULL)
  return NULL;  /* error */

 (void)strcpy(retbuf, first);

 va_start(argp, first);  /* restart; 2nd scan */

 while((p = va_arg(argp, char *)) != NULL)
  (void)strcat(retbuf, p);

 va_end(argp);

 return retbuf;
}

(Note that a second call to va_start is needed to re-start the scan when the argument list is processed a second time. Note the calls to va_end: they're important for portability, even if they don't seem to do anything.)A call to vstrcat looks something like

char *str = vstrcat("Hello, ", "world!", (char *)NULL);

Note the cast on the last argument; see questions 5.2 and 15.3. (Also note that the caller must free the returned, malloc'ed storage.)
vstrcat accepts a variable number of arguments, all of type char *. Here is an example which accepts a variable number of arguments of different types; it is a stripped-down version of the familiar printf function. Note that each invocation of va_arg() specifies the type of the argument being retrieved from the argument list.
(The miniprintf function here uses baseconv from question 20.10 to format numbers. It is significantly imperfect in that it will not usually be able to print the smallest integer, INT_MIN, properly.)

#include <stdio.h>
#include <stdarg.h>
#ifdef MAIN

void miniprintf(const char *, ...);

main()
{
 miniprintf("Hello, world!\n");
 miniprintf("%c %d %s\n", '1', 2, "three");
 miniprintf("%o %d %x\n", 10, 10, 10);
 miniprintf("%u\n", 0xffff);
 return 0;
}

#endif

extern char *baseconv(unsigned int, int);

void
miniprintf(const char *fmt, ...)
{
 const char *p;
 int i;
 unsigned u;
 char *s;
 va_list argp;

 va_start(argp, fmt);

 for(p = fmt; *p != '\0'; p++) {
  if(*p != '%') {
   putchar(*p);
   continue;
  }

  switch(*++p) {
  case 'c':
   i = va_arg(argp, int);
   /* *not* va_arg(argp, char); see Q 15.10 */
   putchar(i);
   break;

  case 'd':
   i = va_arg(argp, int);
   if(i < 0) {
    /* XXX won't handle INT_MIN */
    i = -i;
    putchar('-');
   }
   fputs(baseconv(i, 10), stdout);
   break;

  case 'o':
   u = va_arg(argp, unsigned int);
   fputs(baseconv(u, 8), stdout);
   break;

  case 's':
   s = va_arg(argp, char *);
   fputs(s, stdout);
   break;

  case 'u':
   u = va_arg(argp, unsigned int);
   fputs(baseconv(u, 10), stdout);
   break;

  case 'x':
   u = va_arg(argp, unsigned int);
   fputs(baseconv(u, 16), stdout);
   break;

  case '%':
   putchar('%');
   break;
  }
 }

 va_end(argp);
}

See also question 15.7.
References: K&R2 Sec. 7.3 p. 155, Sec. B7 p. 254
ISO Sec. 7.8
Rationale Sec. 4.8
H&S Sec. 11.4 pp. 296-9
CT&P Sec. A.3 pp. 139-141
PCS Sec. 11 pp. 184-5, Sec. 13 p. 242

Question 19.25

comp.lang.c FAQ list · Question 19.25

Q: How can I access memory (a memory-mapped device, or graphics memory) located at a certain address?
How can I do PEEK and POKE in C?

A: Set a pointer, of the appropriate type, to the right number (using an explicit cast to assure the compiler that you really do intend this nonportable conversion):

unsigned int *magicloc = (unsigned int *)0x12345678;

Then, *magicloc refers to the location you want. [footnote] If the location is a memory-mapped I/O register, you will probably also want to use the volatilequalifier: ``volatile unsigned int *magicloc''. (If you want to refer to a byte at a certain address rather than a word, use unsigned char *.)
Under MS-DOS, you may find a macro like MK_FP() handy for working with segments and offsets. As suggested by Gary Blaine, you can also declare tricky array pointers which allow you to access screen memory using array notation. For example, on an MS-DOS machine in an 80x25 text mode, given the declaration

unsigned short (far * videomem)[80] =
  (unsigned short (far *)[80])0xb8000000;

you can access the character and attribute byte at row i, column j with videomem[i][j].
Many operating systems execute user-mode programs in a protected mode where direct access to I/O devices (or to any address outside the running process) is simply not possible. In such cases you will have to ask the operating system to carry out I/O operations for you.
See also questions 4.14 and 5.19.
References: K&R1 Sec. A14.4 p. 210
K&R2 Sec. A6.6 p. 199
ISO Sec. 6.3.4
Rationale Sec. 3.3.4
H&S Sec. 6.2.7 pp. 171-2

Question 19.9

comp.lang.c FAQ list · Question 19.9

Q: How do I send escape sequences to control a terminal or other device?

A: If you can figure out how to send characters to the device at all (see question 19.8), it's easy enough to send escape sequences. In ASCII, the ESC code is 033 (27 decimal), so code like

fprintf(ofd, "\033[J");

sends the sequence ESC [ J .
Some programmers prefer to parameterize the ESC code, like this:

#define ESC 033

 fprintf(ofd, "%c[J", ESC);

Question 19.5

comp.lang.c FAQ list · Question 19.5

Q: How do I read the arrow keys? What about function keys?

A: Terminfo, some versions of termcap, and some versions of curses have support for these non-ASCII keys. Typically, a special key sends a multicharacter sequence (usually beginning with ESC, '\033'); parsing these can be tricky. (curses will do the parsing for you, if you call keypad first.)
Under MS-DOS, if you receive a character with value 0 (not '0'!) while reading the keyboard, it's a flag indicating that the next character read will be a code indicating a special key. See any DOS programming guide for lists of keyboard scan codes. (Very briefly: the up, left, right, and down arrow keys are 72, 75, 77, and 80, and the function keys are 59 through 68.)
References: PCS Sec. 5.1.4 pp. 56-7

Question 19.4

comp.lang.c FAQ list · Question 19.4

Q: How can I clear the screen?
How can I print text in color?
How can I move the cursor to a specific x, y position?

A: Such things depend on the terminal type (or display) you're using. You will have to use a library such as termcap, terminfo, or curses, or some system-specific routines, to perform these operations.
Functions in the curses library to look for are clear, move, standout/standend, and attron/attroff/attrset; the last three work with attribute codes such as A_REVERSE. In MS-DOS libraries, there are typically functions named gotoxy and clrscr or _clearscreen; you can also use the ANSI.SYS driver or low-level interrupts. Under termcap or terminfo, use tgetstr to retrieve strings like cl, so/se, and cm for clear screen, standout mode, and cursor motion respectively, then output the strings; using cm additionally requires calling tgoto. Some baroque terminals require attention to other ``capabilities'' as well; study the documentation carefully. Be aware that some older terminals may not support the desired capabilities at all.
Most modern terminal emulation schemes support the ANSI escape sequences for cursor motion and visual attributes, so if you're willing to sacrifice portability, you can print those sequences directly. Here is a tiny example to whet your appetite:

printf("\033[2J");  /* clear screen */
printf("\033[%d;%dH", 10, 20); /* move cursor (row 10, col 20) */
printf("Hello, ");
printf("\033[7mworld\033[0m!"); /* inverse video */

Here is a link to some more explanation, and brief lists of codes. The portable way of emitting these sequences (if you're not going whole-hog and using curses) is to use termcap or terminfo; here is an example.
For clearing the screen, a halfway portable solution is to print a form-feed character ('\f'), which will cause some displays to clear. Even more portable (albeit even more gunky) might be to print enough newlines to scroll everything away (although of course this leaves the cursor at the bottom of the screen, not the top). As a last resort, you could use system (see question 19.27) to invoke an operating system clear-screen command.
References: PCS Sec. 5.1.4 pp. 54-60, Sec. 5.1.5 pp. 60-62
Strang, Programming with curses
Strang, Mui, and O'Reilly, termcap & terminfo

Question 19.3

comp.lang.c FAQ list · Question 19.3

Q: How can I display a percentage-done indication that updates itself in place, or show one of those ``twirling baton'' progress indicators?

A: These simple things, at least, you can do fairly portably. Printing the character '\r' will usually give you a carriage return without a line feed, so that you can overwrite the current line. The character '\b' is a backspace, and will usually move the cursor one position to the left.
Using these characters, you can print a percentage-done indicator:

for(i = 0; i < lotsa; i++) {
  printf("\r%3d%%", (int)(100L * i / lotsa));
  fflush(stdout);
  do_timeconsuming_work();
 }
 printf("\ndone.\n");

or a baton:

printf("working: ");
 for(i = 0; i < lotsa; i++) {
  printf("%c\b", "|/-\\"[i%4]);
  fflush(stdout);
  do_timeconsuming_work();
 }
 printf("done.\n");

See also question 12.4.
References: ISO Sec. 5.2.2

Question 20.40

comp.lang.c FAQ list · Question 20.40

Q: Where can I get extra copies of this list?

A: An up-to-date copy may be obtained from ftp.eskimo.com in directory u/s/scs/C-faq/. You can also just pull it off the net; it is normally posted to comp.lang.c on the first of each month, with an Expires: line which should keep it around all month. A parallel, abridged version is available (and posted), as is a list of changes accompanying each significantly updated version.
The various versions of this list are also posted to the newsgroups comp.answers and news.answers. Several sites archive news.answers postings and other FAQ lists, including this one; two sites are rtfm.mit.edu (directories pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and ftp.uu.net (directoryusenet/news.answers/C-faq/). If you don't have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ lists: send a message containing the single word ``help'' to mail-server@rtfm.mit.edu for more information. See the meta-FAQ list in news.answers for more information.
An extended version of this FAQ list has been published by Addison-Wesley as C Programming FAQs: Frequently Asked Questions (ISBN 0-201-84519-9). An errata list is at http://www.eskimo.com/~scs/C-faq/book/Errata.html and on ftp.eskimo.com in u/s/scs/ftp/C-faq/book/Errata .

Question 18.15d

comp.lang.c FAQ list · Question 18.15d

Q: I need code for performing multiple precision arithmetic.

A: Some popular packages are the ``quad'' functions within the BSD Unix libc sources (ftp.uu.net, /systems/unix/bsd-sources/.../src/lib/libc/quad/*), the GNU MP library ``libmp'', the MIRACL package (see http://indigo.ie/~mscott/), the ``calc'' program by David Bell and Landon Curt Noll, and the old Unix libmp.a. See also questions 14.12 and 18.16.
References: Schumacher, ed., Software Solutions in C Sec. 17 pp. 343-454

Question 18.15

comp.lang.c FAQ list · Question 18.15

Q: Where can I get a BNF or YACC grammar for C?

A: The definitive grammar is of course the one in the ANSI standard; see question 11.2. Another grammar by Jim Roskind is available at ftp.eskimo.com inu/s/scs/roskind_grammar.Z. A fleshed-out, working instance of the ANSI C90 grammar (due to Jeff Lee) is on ftp.uu.net (see question 18.16) in usenet/net.sources/ansi.c.grammar.Z (including a companion lexer). [footnote] The FSF's GNU C compiler contains a grammar, as does the appendix to K&R2.
The comp.compilers archives contain more information about grammars; see question 18.3.
References: K&R1 Sec. A18 pp. 214-219
K&R2 Sec. A13 pp. 234-239
ISO Sec. B.2
H&S pp. 423-435 Appendix B

Question 18.14

comp.lang.c FAQ list · Question 18.14

Q: I need code to parse and evaluate expressions.

A: Two available packages are ``defunc,'' posted to comp.sources.misc in December, 1993 (V41 i32,33), to alt.sources in January, 1994, and available from sunsite.unc.edu in pub/packages/development/libraries/defunc-1.3.tar.Z, and ``parse,'' at lamont.ldgo.columbia.edu. Other options include the S-Lang interpreter, available via anonymous ftp from amy.tch.harvard.edu in pub/slang, and the shareware Cmm (``C-minus-minus'' or ``C minus the hard stuff''). See also questions 18.16and 20.6.
There is also some parsing/evaluation code in Software Solutions in C (chapter 12, pp. 235-55).

Question 18.2

comp.lang.c FAQ list · Question 18.2

Q: How can I track down these pesky malloc problems?

A: A number of debugging packages exist to help track down malloc problems; one popular one is Conor P. Cahill's ``dbmalloc'', posted to comp.sources.misc in 1992, volume 32. Others are ``leak'', available in volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in the ``Snippets'' collection; MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug ; and Electric Fence. See also question 18.16.
A number of commercial debugging tools exist, and can be invaluable in tracking down malloc-related and other stubborn problems:

CodeCenter (formerly Saber-C) from Centerline Software.
Insight (now Insure?), from ParaSoft Corporation.
Purify, from Rational Software (formerly Pure Software, now part of IBM).
ZeroFault, from The ZeroFault Group.

Question 18.1

comp.lang.c FAQ list · Question 18.1

Q: I need some C development tools.

A: Here is a crude list of some which are available.

a C cross-reference generator: cflow, cxref, calls, cscope, xscope, or ixfw
a C beautifier/pretty-printer: cb, indent, GNU indent, or vgrind
a revision control or configuration management tool: CVS, RCS, or SCCS
a C source obfuscator (shrouder): obfus, shroud, or opqcp
a ``make'' dependency generator: makedepend, or try cc -M or cpp -M
tools to compute code metrics: ccount, Metre, lcount, or csize; there is also a package sold by McCabe and Associates
a C lines-of-source counter: this can be done very crudely with the standard Unix utility wc, and somewhat better with grep -c ";"
a C declaration aid (cdecl): check volume 14 of comp.sources.unix (see question 18.16) and K&R2
a prototype generator: see question 11.31
a tool to track down malloc problems: see question 18.2
a ``selective'' C preprocessor: see question 10.18
language translation tools: see questions 11.31 and 20.26
C verifiers (lint): see question 18.7
a C compiler!: see question 18.3

(This list of tools is by no means complete; if you know of tools not mentioned, you're welcome to contact this list's maintainer.)
Other lists of tools, and discussion about them, can be found in the Usenet newsgroups comp.compilers and comp.software-eng.
See also questions 18.3 and 18.16.

Question 13.6

comp.lang.c FAQ list · Question 13.6

Q: How can I split up a string into whitespace-separated fields?
How can I duplicate the process by which main() is handed argc and argv?

A: The only Standard function available for this kind of ``tokenizing'' is strtok, although it can be tricky to use [footnote] and it may not do everything you want it to. (For instance, it does not handle quoting.) Here is a usage example, which simply prints each field as it's extracted:

#include <stdio.h>
#include <string.h>
char string[] = "this is a test"; /* not char *; see Q 16.6 */
char *p;
for(p = strtok(string, " \t\n"); p != NULL;
   p = strtok(NULL, " \t\n"))
 printf("\"%s\"\n", p);

As an alternative, here is a routine I use for building an argv all at once:

#include <ctype.h>

int makeargv(char *string, char *argv[], int argvsize)
{
 char *p = string;
 int  i;
 int argc = 0;

 for(i = 0; i < argvsize; i++) {
  /* skip leading whitespace */
  while(isspace(*p))
   p++;

  if(*p != '\0')
   argv[argc++] = p;
  else {
   argv[argc] = 0;
   break;
  }

  /* scan over arg */
  while(*p != '\0' && !isspace(*p))
   p++;
  /* terminate arg: */
  if(*p != '\0' && i < argvsize-1)
   *p++ = '\0';
 }

 return argc;
}

Calling makeargv is straightforward:

char *av[10];
 int i, ac = makeargv(string, av, 10);
 for(i = 0; i < ac; i++)
  printf("\"%s\"\n", av[i]);

If you want each separator character to be significant, for instance if you want two tabs in a row to indicate an omitted field, it's probably more straightforward to use strchr:

#include <stdio.h>
#include <string.h>

char string[] = "this\thas\t\tmissing\tfield";
char *p = string;

while(1) {  /* break in middle */
 char *p2 = strchr(p, '\t');
 if(p2 != NULL)
  *p2 = '\0';
 printf("\"%s\"\n", p);
 if(p2 == NULL)
  break;
 p = p2 + 1;
}

All the code fragments presented here modify the input string, by inserting \0's to terminate each field (meaning that the string must be writable; see question 1.32). If you'll need the original string later, make a copy before breaking it up.
References: K&R2 Sec. B3 p. 250
ISO Sec. 7.11.5.8
H&S Sec. 13.7 pp. 333-4
PCS p. 178

Question 13.3

comp.lang.c FAQ list · Question 13.3

Q: Does C have anything like the ``substr'' (extract substring) routine present in other languages?

A: Not as such. (One reason it doesn't is that, as mentioned in question 7.2 and section 8, C has no managed string type.)
To extract a substring of length LEN starting at index POS in a source string, use something like

char dest[LEN+1];
 strncpy(dest, &source[POS], LEN);
 dest[LEN] = '\0'; /* ensure \0 termination */

or, using the trick from question 13.2,

char dest[LEN+1] = "";
 strncat(dest, &source[POS], LEN);

or, making use of pointer instead of array notation,

strncat(dest, source + POS, LEN);

(The expression source + POS is, by definition, identical to &source[POS] --see also section 6.)

Tuesday, November 9, 2010

Question 12.39

comp.lang.c FAQ list · Question 12.39

Q: I'm writing a ``filter'' for binary files, but stdin and stdout are preopened as text streams. How can I change their mode to binary?

A: There is no standard way to do this. On Unix-like systems, there is no text/binary distinction, so there is no need to change the mode. Some MS-DOS compilers supply a setmode call. Otherwise, you're on your own.

Question 12.36b

comp.lang.c FAQ list · Question 12.36b

Q: How can I arrange to have output go two places at once, e.g. to the screen and to a file?

A: You can't do this directly, but you could write your own printf variant which printed everything twice. Here is a sample logprintf function which prints to both stdout and a preopened log file:

#include <stdio.h>
#include <stdarg.h>

extern FILE *logfp;

void logprintf(char *fmt, ...)
{
 va_list argp;
 va_start(argp, fmt);
 vfprintf(stdout, fmt, argp);
 va_end(argp);
 va_start(argp, fmt);
 vfprintf(logfp, fmt, argp);
 va_end(argp);
}

Now, whenever you call logprintf (which you can call with format strings just like printf), it prints both to stdout and to logfp, which you have presumably opened to your desired log file. Another way to arrange this would be

void f2printf(FILE *fp1, FILE *fp2, char *fmt, ...)
{
 va_list argp;
 va_start(argp, fmt); vfprintf(fp1, fmt, argp); va_end(argp);
 va_start(argp, fmt); vfprintf(fp2, fmt, argp); va_end(argp);
}

where f2printf is just like fprintf except that you give it two file pointers (e.g. stdout and logfp) and it prints to both of them.
Both of these techniques obviously require you to use explicit calls to logprintf or f2printf. There is no known way in Standard C to arrange implicitly (i.e. via some call analogous to freopen) that one stream, which you print to once with a normal call like fprintf, print to two places at once. [footnote]
See also question 15.5.

Question 12.35

comp.lang.c FAQ list · Question 12.35

Q: How can I tell if standard input or output is redirected (i.e. whether ``<'' or ``>'' was used on the invocation command line)?

A: You can't tell directly, but you can usually look at a few other things to make whatever decision you need to. If you want your program to take input fromstdin when not given any input files, you can do so if argv doesn't mention any input files (see question 20.3), or perhaps if you're given a placeholder like ``-'' instead of a filename. If you want to suppress prompts if input is not coming from an interactive terminal, on some systems (e.g. Unix, and usually MS-DOS) you can useisatty(0) or isatty(fileno(stdin)) to make the determination.

Question 19.30

comp.lang.c FAQ list · Question 19.30

Q: How can I invoke another program or command and trap its output?

A: Unix and some other systems provide a popen function, which sets up a stdio stream on a pipe connected to the process running a command, so that the calling program can read the output (or alternatively supply the input). Using popen, the last example from question 19.28 would look like

extern FILE *popen();

 sprintf(cmdbuf, "sort < %s", datafile);

 fp = popen(cmdbuf, "r");

 /* ...now read sorted data from fp... */

 pclose(fp);

(Do be sure to call pclose, as shown; leaving it out will seem to work at first but may eventually run you out of processes or file descriptors.)
If you can't use popen, you may be able to use system, with the output going to a file which you then open and read, as the code in question 19.28 was doing already. [footnote]
If you're using Unix and popen isn't sufficient, you can learn about pipe, dup, fork, and exec.
(One thing that probably would not work, by the way, would be to use freopen.)
References: PCS Sec. 11 p. 169

Question 12.34

comp.lang.c FAQ list · Question 12.34

Q: Once I've used freopen, how can I get the original stdout (or stdin) back?

A: There isn't a good way. If you need to switch back, the best solution is not to have used freopen in the first place. Try using your own explicit output (or input) stream variable, which you can reassign at will, while leaving the original stdout (or stdin) undisturbed. For example, declare a global

FILE *ofp;

and replace all calls to printf( ... ) with fprintf(ofp, ... ). (Obviously, you'll have to check for calls to putchar and puts, too.) Then you can set ofp tostdout or to anything else.
You might wonder if you could skip freopen entirely, and do something like

FILE *savestdout = stdout;
 stdout = fopen(file, "w"); /* WRONG */

leaving yourself able to restore stdout later by doing

stdout = savestdout;  /* WRONG */

but code like this is not likely to work, because stdout (and stdin and stderr) are typically constants which cannot be reassigned (which is why freopen exists in the first place).
It may be possible, in a nonportable way, to save away information about a stream before calling freopen to open some file in its place, such that the original stream can later be restored. The most straightforward and reliable way is to manipulate the underlying file descriptors using a system-specific call such as dup or dup2, if available (example). Another is to copy or inspect the contents of the FILE structure, but this is exceedingly nonportable and unreliable.
Under some systems, you might be able to reopen a special device file (such as /dev/fd/1 under modern versions of Unix) which is still attached to (for example) the original standard output. You can, under some systems, explicitly re-open the controlling terminal (see question 12.36), but this isn't necessarily what you want, since the original input or output (i.e. what stdin or stdout had been before you called freopen) could have been redirected from the command line.
All of this pertains to stdio redirection initially performed by freopen, affecting the I/O calls within the same program that called freopen. If what you're trying to do is capture the result of a subprogram execution, freopen probably won't work anyway; see question 19.30 instead.
Additional links: examples