Audience Dialogue

Know Your Audience: chapter 2, part C
Sampling: choosing respondents

12. Sampling inside clusters

When the starting point of a cluster has been chosen, how is the rest of the cluster then decided? The interviewer finds the randomly selected address, then follows a set of rules to work out which addresses will be chosen for interviews. The important thing is to have some rules. Don't let the interviewer choose - because the houses where interviewers prefer to go are not typical. Interviewers tend to prefer rich homes to poor ones, homes where somebody is there, and homes that are easy to reach.

Interviewers also prefer to talk to people who are similar to them. In developing countries, interviewers are usually well educated, and don't like speaking to people they see as ignorant. This causes a bias in many surveys, making it appear that populations are better educated and wealthier than in fact they are. Although cluster sequence rules are arbitrary, they must be followed.

In each cluster, the address selected at random is usually not surveyed. This may seem strange, but the chances are that the population list from which that address was drawn is incomplete. Excluding the address actually selected partially compensates for the under-representation of other addresses on the population list.

A common set of rules for making a cluster is:

(1) Find the address selected at random.

(2) Going around the street block anti-clockwise, ignore the address next door to the one selected at random. Make your first call at the address after that: two to the right of the selected address, when looking from across the street - or two higher in number, if there are a number of dwellings at one address.

(3) Continue to call at every second address, going anti-clockwise around the block. (Turn left at every street corner.)

(4) If you get right around the block without having located enough addresses to make up the cluster, cross the road outside the address originally selected, and start to go around the neighbouring block, again anti-clockwise, again taking every second dwelling.

(5) If you run out of houses, and there’s a section of road where nobody lives (for at least 1 kilometre), cross the road and come back along the other side.

(6) What do you do if you run out of households? This seldom happens, but it needs to be anticipated. A simple solution is to extend the next cluster by the number of households missing. For example, if the plan is for 12 households in a cluster, but one cluster only has 10, the closest unfinished cluster should have another 2 added.
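
Rule (6) amounts to simple bookkeeping. Here is a minimal sketch in Python. It assumes that clusters are worked in list order and that the "closest unfinished cluster" is simply the next one in the list. The function name and that ordering are illustrative assumptions, not part of the rules above.

```python
def carry_shortfall(planned_size, households_available):
    """Carry any shortfall in one cluster forward to the next cluster.

    households_available[i] is the number of households that can be
    found in cluster i. Returns the interviews achieved per cluster and
    any shortfall still outstanding after the last cluster.
    Assumption: the "closest unfinished cluster" is the next in the list.
    """
    achieved = []
    carried = 0  # households still owed from earlier clusters
    for available in households_available:
        target = planned_size + carried
        done = min(target, available)
        carried = target - done
        achieved.append(done)
    return achieved, carried


# Planned size 12; the second cluster only has 10 households,
# so the third cluster is extended by 2.
print(carry_shortfall(12, [12, 10, 40]))
```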

[Figure: cluster map]

In this map, the starting point (marked Start) is just before the dwelling marked 1. Every second dwelling is ignored. Those marked x were selected, but did not result in interviews (due to refusals, etc.) and had to be replaced. Note the route taken when the interviewers went around the block and reached the starting point again: they crossed the road, turned around, and kept going in the opposite direction - still turning left whenever necessary.

Why every second household, and not every household? (This is called a skip interval of 2 households.) Mainly because neighbouring households tend to be more similar to each other. So using a skip interval brings more variety into the cluster, while still keeping it reasonably compact. The fewer households in a cluster, the more addresses should be skipped. But when a cluster includes more than about 50 dwellings, including skipped ones, it becomes too large (especially in rural areas), and some of the cost savings disappear.
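
The skip interval can be sketched in a few lines of Python. This assumes the interviewer's route has already been written down as an ordered list of addresses, starting with the randomly selected address and following the anti-clockwise walk. The function name and the list representation are assumptions made for illustration.

```python
def cluster_addresses(route, cluster_size, skip_interval=2):
    """Pick addresses along a route using a fixed skip interval.

    route[0] is the randomly selected starting address, which is not
    itself surveyed. The remaining entries are in walking order.
    With skip_interval=2, positions 2, 4, 6, ... are chosen.
    """
    chosen = route[skip_interval::skip_interval]
    return chosen[:cluster_size]


route = ["start", "A", "B", "C", "D", "E", "F"]
print(cluster_addresses(route, 3))  # ['B', 'D', 'F']
```

A larger skip interval spreads the same number of interviews over a longer route, which is the trade-off between variety and compactness described above.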

Why go around the block, and not continue in a straight line? Because this would favour households living on main roads - which are often richer than those living in side streets. Where everybody lives on a long road (as they do in some parts of the world) there are no street blocks: observing the above rules, the route will simply follow the road.

Why keep turning left, and not right? This is completely arbitrary; it's just a convention. Change it, if you like - but don't give interviewers a choice in each cluster.

Instructions to interviewers should make it clear exactly what you mean by cluster size. What happens when a household is visited and nobody is home, or the occupants refuse to take part in the survey? Are these households counted in the cluster, or not?

The simplest solution is to keep going, adding more households to the end of the route, until interviews have been done at the required number of households. However, substitutes are not usually taken until a dwelling has been visited at least three times, in an attempt to find somebody home.
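
As a rough sketch, the substitution rule can be written as a small check on the record of calls made at each dwelling. The outcome labels, and the decision to replace a refusal straight away, are assumptions made for this example. Only the three-visit rule comes from the text above.

```python
MAX_VISITS = 3  # calls made before a not-at-home dwelling is replaced


def needs_substitute(visits):
    """Decide whether a dwelling should be replaced further along the route.

    visits lists the outcome of each call so far: "interview",
    "refused" or "nobody_home" (labels invented for this sketch).
    """
    if "interview" in visits:
        return False
    if "refused" in visits:
        return True  # assumption: a refusal is replaced straight away
    return visits.count("nobody_home") >= MAX_VISITS


def extra_dwellings_needed(dwellings):
    """Count how many dwellings must be added to the end of the route."""
    return sum(1 for visits in dwellings if needs_substitute(visits))
```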

13. How many respondents in each household?

Another factor to take into account when designing a sample for a door-to-door survey is the number of respondents in each participating household.

For a personal-interview survey, when each respondent is questioned directly by the interviewer, it’s easiest to interview only one person per household. If, as is common, others are present during an interview and are then interviewed themselves, they will already have heard all the questions and may give different answers from the initial respondent. If most of the questions relate to facts which would be known by anybody in the household (e.g. "how many television sets are at this address?") having extra people present may produce more accurate results. But for questions asking about personal attitudes, it is best not to have anybody else present, so that the selected respondent will feel free to give his or her true opinion.

An exception to interviewing only one person occurs when the focus of the survey is on something that is not particularly common. A survey of computer users, for example, may begin with the interviewer asking "Does anybody in this household use a computer?" and interviewing all computer users, if the household has more than one.

An estimation problem occurs when only one person in a household is interviewed: people in small households will be over-represented. Among all households contacted for a survey, people living alone will have a 100% probability of being interviewed. But in a household with four eligible persons, each of these people will have only a 25% chance.
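
The size of this effect can be worked out from the distribution of household sizes. The sketch below uses a made-up distribution, purely for illustration. It compares each household size's share of all adults with its share of respondents when one adult per household is interviewed.

```python
def shares_by_household_size(household_shares):
    """Compare share of adults with share of respondents.

    household_shares maps the number of eligible adults in a household
    to the proportion of households of that size. One adult is
    interviewed per household, chosen with equal probability.
    """
    mean_adults = sum(size * share for size, share in household_shares.items())
    result = {}
    for size, share in household_shares.items():
        result[size] = {
            "selection_probability": 1 / size,
            "share_of_adults": size * share / mean_adults,
            "share_of_respondents": share,
        }
    return result


# Hypothetical distribution of households by number of adults
example = {1: 0.20, 2: 0.65, 3: 0.10, 4: 0.05}
for size, figures in shares_by_household_size(example).items():
    print(size, figures)
```

With these invented figures, people living alone are about 10% of adults but 20% of respondents, so they are over-represented by a factor of two.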

In Australia, about 10% of adults live alone, but these make up 20% of all households, and their media use habits are quite different from those of larger households. In developing countries, which generally have more people per household, single-person households are rarer, so it will not distort results so much to interview one person in each household. The easiest way to compensate for an excess of small households in the survey is for the interviewer to find out how many adults live in each household visited. Then multiple interviews can be made at larger households.

By "larger" households, I mean 3 or more people in developed countries, and 4 or more in developing countries (where households tend to be larger).

If you interview more people in larger households, this can slightly increase the accuracy of the survey, but you will be unable to determine the exact sample size in advance. The simplest solution is to base your calculations on one person per household. Not many households will have two interviews, so the final sample size will be perhaps 5% to 10% larger than you planned.
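
The arithmetic is simple enough to check in advance. In this sketch the share of households giving a second interview is a guess you would supply yourself. The function name is illustrative.

```python
def expected_sample_size(households, share_with_second_interview):
    """Estimate total interviews when some households give two."""
    return round(households * (1 + share_with_second_interview))


# 300 households, with 5% to 10% of them giving a second interview
for share in (0.05, 0.10):
    print(share, expected_sample_size(300, share))
```

For a plan based on 300 households, this gives between 315 and 330 interviews.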

If you survey all people in the household (except perhaps children), this solves one problem, but creates several others:

Another approach is to interview all household members at the same time, using a single questionnaire. We used this in a survey of Aboriginal people in central Australia. In the evenings, they usually sat outdoors in small groups, listening to portable radios. The interviewers would approach one of these groups, play brief taped extracts from radio programs, and ask the respondents’ opinions of each program. But the questionnaire was different from a normal one: instead of ticking boxes for "like it", "dislike it", and "not sure", the interviewer would write in the number of people giving each possible answer.

In a survey where respondents fill in their own questionnaires, and these are collected later by the interviewer, it’s normal to give a questionnaire to each person in the household. This boosts the sample size at little extra cost, and also helps prevent people filling in questionnaires intended for another member of the household.

14. Choosing respondents within households

A common mistake in survey research is to interview the first person met in each household. This will produce a badly skewed sample, nullifying any care that has been taken in producing a representative sample of households. This is important for any survey, but particularly for surveys measuring radio and TV audiences.

What is the problem with interviewing the first person the interviewer meets? With this method of choosing respondents, the more time somebody spends at home, the more chance they have of being interviewed. People who spend a lot of time at home have different habits from people who are out a lot. For example, most radio listening and TV viewing is done at home, so if the first person found in each household is interviewed, the survey will overestimate the amount of listening and viewing.

For the same reason, surveys carried out in streets and public places will usually underestimate radio and TV audiences.

In Australia, some types of people (e.g. women and younger people) are much more likely than others to answer the door (or the telephone) when an interviewer visits. In other countries, such as Western Samoa, it is normal for the oldest man in the household to greet any strangers.

The best approach is for the interviewer to speak to the first person met, work out who should be interviewed, then interview the appropriate person. There are three main methods for choosing a respondent: the birthday method, the Kish Grid, and quotas.

The birthday method

Most market research books recommend asking for the person who last had a birthday (or who next will have one). In theory, everybody in a household has an equal chance of being selected by this last birthday or next birthday method, but my research has found this does not produce the correct balance of sexes and age groups. Also, it only works in households where everybody knows everybody else's birthday. In countries where birthdays are not celebrated, many people don't know their family's birthdays.
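
For readers who want to see the rule spelled out, here is a minimal sketch of the last-birthday selection in Python. It assumes the interviewer has recorded each eligible person's birth month and day. The tuple layout and function name are invented for this example.

```python
from datetime import date


def last_birthday_member(members, today):
    """Return the name of the person who most recently had a birthday.

    members is a list of (name, birth_month, birth_day) tuples.
    A birthday falling on today's date counts as already had.
    """
    today_md = (today.month, today.day)
    # People whose birthday has already occurred this calendar year
    already_had = [m for m in members if (m[1], m[2]) <= today_md]
    # If nobody has had a birthday yet this year, the most recent one
    # is the latest birthday of the previous year.
    pool = already_had or members
    return max(pool, key=lambda m: (m[1], m[2]))[0]


household = [("Ana", 3, 14), ("Ben", 11, 2), ("Cy", 7, 30)]
print(last_birthday_member(household, date(2024, 8, 1)))  # Cy
print(last_birthday_member(household, date(2024, 2, 1)))  # Ben
```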

The Kish Grid

This is a table of numbers, named after the statistician who invented it. The number of people in the household is discovered, and a random number is chosen to select a particular person.

My research in Australia found that the Kish method can cause a high refusal rate: elderly women, in particular, are often suspicious when the first question in a survey is "How many people live in your household?" — particularly if they live alone. In developing countries, where few old people live alone, this may not be a problem. Here’s an example of a Kish grid, with instructions. This is based on 8 households per cluster, interviewing 1 person per household.

Instructions for using Kish Grid

  1. Find out how many people living in the household are eligible to be interviewed. Include people who sleep there, but are not there when you visit. Ignore children aged under 15.
  2. The youngest (excluding children under 15) is number 1, the second youngest is number 2, and so on.
  3. The first household where you do an interview is household 1, the second is household 2, and so on, up to household 8 - the last in the cluster.
  4. Look up the column for the household number, and the row for the number of eligible people. The number in the cell where the column and row meet is the person to interview. For example, if household 2 has 3 adults, interview the 2nd youngest (marked * in the grid). If that person is not there when you call, arrange to come back later.

Eligible      Household
people        1    2    3    4    5    6    7    8

1             1    1    1    1    1    1    1    1
2             1    2    1    2    1    2    1    2
3             1    2*   3    1    2    3    1    2
4             1    2    3    4    1    2    3    4
5             1    2    3    4    5    3    4    5
6             1    2    3    4    5    6    3    6
7             1    2    3    4    5    6    7    4
8             1    2    3    4    5    6    7    8
9             1    2    3    4    5    6    7    8
10 or more    1    2    3    4    5    6    7    8

The reason for numbering the household members from the youngest upwards (instead of the seemingly more obvious oldest downwards) is that younger people are more difficult to find at home, so the above grid gives young people a slightly higher chance of being interviewed.
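
The grid lookup can also be done by program. This is handy when checking interviewers' work. The sketch below copies the grid above into a Python dictionary. The function name and the idea of passing in a list of ages are assumptions made for illustration.

```python
# Rows: number of eligible people. Columns: household 1 to 8 in the cluster.
KISH_GRID = {
    1: [1, 1, 1, 1, 1, 1, 1, 1],
    2: [1, 2, 1, 2, 1, 2, 1, 2],
    3: [1, 2, 3, 1, 2, 3, 1, 2],
    4: [1, 2, 3, 4, 1, 2, 3, 4],
    5: [1, 2, 3, 4, 5, 3, 4, 5],
    6: [1, 2, 3, 4, 5, 6, 3, 6],
    7: [1, 2, 3, 4, 5, 6, 7, 4],
    8: [1, 2, 3, 4, 5, 6, 7, 8],
    9: [1, 2, 3, 4, 5, 6, 7, 8],
    10: [1, 2, 3, 4, 5, 6, 7, 8],  # row for "10 or more"
}


def kish_select(household_number, ages, min_age=15):
    """Return (person number, age) of the person to interview.

    household_number is 1 to 8, the order of the household within the
    cluster. ages lists everybody living in the household. People are
    numbered from the youngest eligible person upwards.
    """
    eligible = sorted(age for age in ages if age >= min_age)
    if not eligible:
        raise ValueError("no eligible people in this household")
    row = KISH_GRID[min(len(eligible), 10)]
    person_number = row[household_number - 1]
    return person_number, eligible[person_number - 1]


# Household 2 with three people aged 15 or over: interview the 2nd youngest
print(kish_select(2, [40, 10, 38, 17]))  # (2, 38)
```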

Quota selection within a household

When selecting a respondent within a household, the most practical method is often a type of quota sampling. Though quota sampling was criticized earlier in this chapter, most of its problems do not apply when selecting a member of a household.

In most parts of the world, where the sex balance is close to 50/50, a common approach is to interview a woman in half of all households, and a man in the other half.

To ensure a good balance of old and young people, age-based quotas can also be applied. One of the simplest quota systems is to ask to interview the oldest person in the household (in half the households visited) and the youngest eligible person (in the other half of the households).

Household quotas can be based on other factors apart from sex or age group. It can be useful in radio and TV surveys to have separate quotas for people who stay home most of the time (housewives, retired and unemployed people), and those who spend less time at home: workers and students. Of course, such quotas must be based on known figures, usually on Census data. If 60% of the eligible population are workers and students, and 40% stay home, and the quotas reflect these percentages, 60% of respondents will be workers and students.
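
Turning population percentages into whole numbers of interviews needs a little care with rounding. The sketch below uses the largest-remainder method, which is one reasonable choice rather than a rule from this chapter. The group labels are illustrative.

```python
def quota_targets(sample_size, percentages):
    """Split a sample into quota targets that add up exactly.

    percentages maps each group to its whole-number percentage of the
    eligible population (for example from Census data). Any interviews
    left over after rounding down go to the groups with the largest
    remainders.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    targets, remainders = {}, {}
    for group, pct in percentages.items():
        targets[group], remainders[group] = divmod(sample_size * pct, 100)
    leftover = sample_size - sum(targets.values())
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        targets[group] += 1
    return targets


# A cluster of 8 households, with 60% workers and students, 40% at home
print(quota_targets(8, {"workers_students": 60, "stay_home": 40}))
```

For a cluster of 8 this gives 5 workers and students and 3 people who stay home. Over a full sample of 300 it gives 180 and 120.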

15. Sampling people in public places

If survey results are to be projected to the general population, a bad way of selecting a sample is to interview people in the street or at a shopping centre, particularly on a weekday. Workers and students are under-represented in such surveys, as are people who are too busy to be interviewed, and those who seldom walk around streets or shopping centres. About the only valid use of shopping-centre surveys is when the population of interest is shoppers. Market research companies do many surveys in shopping centres, usually about products bought in shops. Because their target audience is shoppers, these surveys are reasonably representative.

Another problem with surveys in public places is that they often greatly under-estimate broadcast audiences - because people who spend a lot of time in public places (and therefore less at home, where they might watch TV or listen to radio) are more likely to be interviewed.

A partial solution to this problem is to control for how much time a respondent is likely to spend away from home, by setting quotas based on employment status and age, as shown in Census data. For example, if 15% of the whole population are students aged under 25, then 15% of respondents should be in the same category. This method is far from perfect, but it produces more accurate results than not using quotas.

Occasionally, there’s no alternative to doing surveys in public places. In cities in Papua New Guinea, for example, the crime rate is horrendous. Houses are surrounded by high fences with locked gates, and guarded by fierce dogs. Interviewers cannot get access during the day, and it is dangerous to go to unknown places at night. It’s not possible to do a survey by telephone, as less than 1% of households have a phone. Nor is it possible to do a mail survey, because the literacy rate is less than 50%. So surveys in public places are the only feasible alternative - despite their problems.

16. Checklist of sampling decisions to be made

This checklist applies to a door-to-door survey, which uses the most complex sampling. For other types of survey, which do not use clusters, items 5, 6, and 7 do not apply.

1. Decide on the exact area to be surveyed. If possible, get a map of this area. Also, try to get census data for the area.

2. Will there be one questionnaire per person, or one per household? If one per person, what will be the minimum age? (Usually between 10 and 18.)

3. Decide on the sample size - always a compromise between the funding available and the need for accuracy. If you’re doing the survey yourself, and it’s your first one, I suggest 100. If this later turns out to be too small, you’ll then be able to do a second survey, with a larger sample - with your newly gained experience, you’ll do it better than the first one. Otherwise, I recommend a sample size of about 300. This is on the small side, but will usually provide detailed enough information.

4. Decide how the sampling will be done. If a population list is available, use it. Otherwise, find the method which best gives everybody in the surveyed population the same chance of being interviewed.

5. Decide on the cluster size. Suggestion: between 4 and 20. A size of 8 to 10 usually works well. At the same time, decide the number of clusters. If you are interviewing one person per household, the sample size is the cluster size times the number of clusters.

6. Can you sample respondents directly, or will you have to use another sampling method within each district? If the latter, each district will have to be visited before interviewing, to draw a local sample.

7. Decide on the route interviewers will take from the starting address - e.g. "always turn left, and skip two households after each interview".

8. Decide how many people per household to interview: 1 per household, or 2 in larger households, or every adult.

9. Decide which method you will use to choose the respondents within households: last-birthday, Kish grid, quota, or everybody.

10. Decide on your substitution policy: if some people refuse to be surveyed, will they be replaced? By somebody in the same household, or by adding another household to the end of the cluster, or what?
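
For items 3 and 5, the number of clusters follows directly from the sample size and cluster size. Here is a minimal sketch, assuming one interview per household. The function name is illustrative.

```python
import math


def clusters_needed(sample_size, cluster_size):
    """Return the number of clusters and the sample they produce.

    Assumes one person is interviewed per household, so the sample
    size is the cluster size times the number of clusters. The number
    of clusters is rounded up so the sample is never below the target.
    """
    clusters = math.ceil(sample_size / cluster_size)
    return clusters, clusters * cluster_size


print(clusters_needed(300, 10))  # (30, 300)
print(clusters_needed(300, 8))   # (38, 304)
```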

Conclusion: is sample design really necessary?

"Is it really worthwhile to go to all this trouble, just to get a sample?" you may wonder. "Why not just interview anybody?" Occasionally, an informal method of sampling will give reasonably accurate answers. The problem is that if you do a survey that takes such shortcuts, you will never know how inaccurate your findings are.

Market research companies, by repeated testing and comparison of results from various surveys, may have found they can get away with statistically imperfect sampling, but it’s harder for inexperienced researchers to justify such shortcuts.

If you are doing a survey whose results are likely to encounter some opposition, people who do not like the results may challenge the survey’s validity. If you can demonstrate that the sample was drawn by correct probability methods, the survey’s results are more likely to withstand scrutiny.

Even if you intend to use the results only for your own purposes, there is little point in doing a survey unless the results are as accurate as possible.