Differences between open coding and a priori coding
Published on November 4th, 2021, Revised on November 16, 2021
In research, a code is a ‘label.’ It is generally a word (but can also be a short phrase) that symbolically assigns an important, essence-capturing, and/or evocative attribute to a piece of data. That piece of data can be a text, a line from a text, a word, or even a visual item such as an image or video.
Codes are assigned to data such as interview transcripts, participant observation field notes, journals, documents, literature, artefacts, photographs, videos, websites, and e-mail correspondence.
Coding in research
Coding refers to the process in research by which codes are assigned to data. The codes are labels, and the pieces of data they are assigned to are the responses gathered from respondents or participants.
Note: In research, ‘participant’ and ‘respondent’ are not the same, even though in some places they might be used interchangeably. ‘Respondent’ is generally used when the research tool is a survey, questionnaire, or the like, whereas ‘participant’ is used when a data collection instrument such as an interview has been used.
Example from everyday life
The product catalogues used by grocery stores and other retailers are mostly organised as a list of categories. Those ‘categories’ are themselves labels. To keep it simple, one can say those labels or categories are codes, and each code represents only the items that are categorised under that label.
For instance, the category of ‘toiletries’ will contain only those products that are related to personal care. Similarly, the category of ‘disposables’ will only contain plastic, disposable items within it and not, let’s say, fruits or vegetables.
Now that the basic principle behind coding is clear, it can be redefined as a transitional process between data collection and a more extensive data analysis. Once the responses have been gathered, they are then analysed and coded.
Encoding and decoding
Coding itself consists of two main sub-processes: encoding and decoding.
- Encoding: coming up with the code itself and assigning it to some piece of information; in other words, ‘labelling’ a text, a line of text, or even a word.
- Decoding: looking at, analysing, and deciphering the meaning of a text/line of text/word from the text.
Characteristics of coding
Before coding any piece of data, it is important to keep the following two characteristics in mind:
- Coding is heuristic (from the Greek, meaning ‘to discover’), implying that it is an exploratory, problem-solving technique without specific formulae to follow.
- Coding is only the initial step toward an even more rigorous and evocative analysis and interpretation of a report. As such, it is not just labelling; it is linking: “It leads you from the data to the idea, and from the idea to all the data pertaining to that idea” (Richards & Morse, 2007, p. 137).
Coding for patterns
In larger, complete data sets, some or even many of the same codes are used repeatedly throughout. This is both natural and deliberate. It is natural because there are many repetitive patterns of action and consistencies in human affairs. It is deliberate because one of the coder’s primary goals is to find those same patterns and consistencies as they are documented in the data.
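One simple way to see which codes recur is to count how often each code has been applied. The following is a minimal sketch in Python, not a prescribed procedure; the excerpts, the code names, and the `code_frequencies` helper are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical coded segments: (excerpt, code) pairs invented for illustration.
coded_segments = [
    ("I stopped buying bottled water", "purchasing decisions"),
    ("We talked about what we consume", "talk on consumer patterns"),
    ("I only buy what I need now", "purchasing decisions"),
    ("The discussions made me aware of my habits", "awareness"),
    ("I think twice before I shop", "purchasing decisions"),
]


def code_frequencies(segments):
    """Count how many segments each code was applied to."""
    return Counter(code for _excerpt, code in segments)


frequencies = code_frequencies(coded_segments)

# List the codes from most to least frequently applied.
for code, count in frequencies.most_common():
    print(f"{code}: {count}")
```

A count like this only shows that a code repeats; deciding whether the repetition is a meaningful pattern remains the researcher's call.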
Stages of coding
Coding follows three steps:
- Identifying the codes: Look at the data. Analyse it. Identify which kind of text has to be coded. Is it a transcript of an interview? Is it an image? Develop codes and assign them to the text accordingly. For instance, in the example mentioned above, products from a grocery store catalogue (‘cleaner,’ ‘package material,’ etc.) might constitute labels or codes.
- Grouping codes to make categories: Codes that are similar to one another can be grouped under the same family, or ‘category,’ as in the grocery store catalogue example mentioned above. Every product is grouped into a certain category, such as the household category of products.
- Grouping categories to develop themes: If codes can be similar, so can the categories they are grouped under. Naturally, similar categories are grouped under a single theme in coding. In the grocery store catalogue example, for instance, categories of ‘household’ and ‘cleaning/packaging’ can be grouped under a single theme, ‘domestic’ or ‘uneatables.’
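The three stages above can be pictured as two nested groupings: codes into categories, and categories into themes. The sketch below, in Python, loosely follows the grocery store example; the items ‘dish rack’ and ‘light bulb’, the exact assignments, and the helper names `group_by` and `theme_for` are assumptions made purely for illustration.

```python
# Hypothetical assignments, loosely based on the grocery store example.
code_to_category = {
    "cleaner": "cleaning/packaging",
    "package material": "cleaning/packaging",
    "dish rack": "household",   # invented item
    "light bulb": "household",  # invented item
}
category_to_theme = {
    "cleaning/packaging": "domestic",
    "household": "domestic",
}


def group_by(mapping):
    """Invert a child -> parent mapping into parent -> sorted list of children."""
    groups = {}
    for child, parent in mapping.items():
        groups.setdefault(parent, []).append(child)
    return {parent: sorted(children) for parent, children in groups.items()}


def theme_for(code):
    """Follow a code up through its category to its theme."""
    return category_to_theme[code_to_category[code]]


categories = group_by(code_to_category)  # stage 2: codes grouped into categories
themes = group_by(category_to_theme)     # stage 3: categories grouped into themes

print(categories)
print(themes)
print(theme_for("cleaner"))
```

In practice these groupings are decided by the researcher while reading the data; the dictionaries only record the outcome of those decisions.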
From codes to categories
Codes are grouped to develop categories. When researchers search for patterns in coded data in order to categorise them, they might group things not just because they are exactly or very much alike, but because they have something in common. Paradoxically, that commonality might even consist of differences.
For example, every individual from a specific region might have a strong opinion about who should be leading that region. The fact that everyone has an individual opinion about the issue is a commonality. But as for whom each person believes should be leading the region, that is where differences might occur.
Therefore, while assigning codes to categories, the following kinds of pattern should be kept at the forefront (Hatch, 2002, p. 155):
- Similarity (things happen the same way)
- Difference (they happen in predictably different ways)
- Frequency (they happen often or seldom)
- Sequence (they happen in a certain order)
- Correspondence (they happen with other activities or events)
- Causation (one appears to cause another)
Coding practices – When to code
The majority of qualitative researchers will code their data both during and after data collection. This is an analytic tactic, for coding is analysis (Miles & Huberman, 1994, p. 56).
Ways of coding data
There are two very popular ways to go about the process of coding. They are:
- A priori coding
- Open coding (also known as emergent coding)
A priori coding
In a priori coding, codes are developed beforehand; they are predetermined. Prior knowledge, existing literature, and theories all help in the development of a priori codes, categories, and themes. In open coding, by contrast, the researcher comes up with codes at the time of data analysis, as the name suggests; they are not predetermined.
Furthermore, a priori coding:
- Keeps the researchers more focused
- Might lead the researchers to miss important information, because only the data for which an appropriate code has been predetermined gets coded. All remaining data, for which no code exists, is not considered.
Steps to a priori coding
Step #1: Read and analyse previous literature.
Step #2: Develop codes accordingly.
Step #3: Look for chunks within the data that fit a given code. Apply the code. Leave aside the rest of the data, which cannot be assigned a code.
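The steps above can be sketched in Python as follows. This is only an illustration under loud assumptions: the codebook, its keywords, and the `apply_a_priori_codes` helper are hypothetical, and simple keyword matching stands in for the researcher's judgement about whether a chunk fits a code. The chunks are adapted from the student responses used later in this article.

```python
# Hypothetical codebook fixed before looking at the data (Steps #1 and #2).
# Keyword matching is an assumed stand-in for the researcher's judgement.
CODEBOOK = {
    "understanding": ["understand", "relationship"],
    "awareness": ["awareness", "aware"],
}


def apply_a_priori_codes(chunks, codebook):
    """Step #3: apply predetermined codes; set aside chunks that fit none."""
    coded, uncoded = [], []
    for chunk in chunks:
        text = chunk.lower()
        matches = [
            code
            for code, keywords in codebook.items()
            if any(keyword in text for keyword in keywords)
        ]
        if matches:
            coded.append((chunk, matches))
        else:
            uncoded.append(chunk)  # no predetermined code fits this chunk
    return coded, uncoded


chunks = [
    "Talking about our consumer patterns made me understand the relationship "
    "between consumption and environmental impact.",
    "The discussions have really raised our awareness about our own actions.",
    "We had never experienced such discussions in our schools or colleges.",
]

coded, uncoded = apply_a_priori_codes(chunks, CODEBOOK)
print(f"{len(coded)} chunk(s) coded, {len(uncoded)} left uncoded")
```

The `uncoded` list makes the main drawback of a priori coding visible: whatever the predetermined codes do not anticipate is set aside.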
Open coding
Open coding, on the other hand, is very extensive and time-consuming. It involves assigning codes, revisiting them to identify categories (axial coding), and then developing themes from those categories. Prior assumptions about the topic are left behind; the data in front of the researcher guides the formation of the codes.
It is also called emergent coding because, as the name suggests, codes ‘emerge’ at the time of data analysis. No piece of information within the data is left out. This way, every kind of response (especially in the case of interviews, reports, etc.) gets acknowledged and reported later during data interpretation.
Open coding is analytical and inductive. It proceeds through two main tasks: making comparisons and asking questions. This is why grounded theory is often termed the ‘constant comparative method of analysis’ (Glaser & Strauss, 1967).
Steps to open coding
Step #1: Convert the data into small, discrete ‘chunks.’
Step #2: Assign an appropriate label for each chunk.
The same code can be used again for similar chunks.
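As a rough illustration of these two steps, the Python sketch below splits a response into sentence-sized chunks and records the label chosen for each one. It rests on assumptions: splitting on sentence boundaries is just one possible way to chunk, the labels are hypothetical stand-ins for what a researcher would decide while reading, and the helper names `split_into_chunks` and `open_code` are invented. The response text is adapted from the student excerpt used later in this article.

```python
import re


def split_into_chunks(text):
    """Step #1: break a response into small, discrete chunks (here, sentences)."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def open_code(chunks, labels):
    """Step #2: record the label chosen for each chunk as the codebook emerges."""
    if len(chunks) != len(labels):
        raise ValueError("each chunk needs exactly one label")
    codebook = {}
    for chunk, label in zip(chunks, labels):
        # An existing label is simply reused when a similar chunk appears.
        codebook.setdefault(label, []).append(chunk)
    return codebook


response = (
    "Talking about our consumer patterns made me understand the relationship "
    "between consumption and environmental impact. "
    "I have decided to buy only those things which are really necessary. "
    "I will save energy and plant trees."
)

chunks = split_into_chunks(response)

# Hypothetical labels a researcher might assign after reading each chunk.
labels = [
    "understanding the environmental impact",
    "environmental decisions",
    "environmental decisions",
]

codebook = open_code(chunks, labels)
for label, coded_chunks in codebook.items():
    print(f"{label}: {len(coded_chunks)} chunk(s)")
```

Unlike the a priori approach, nothing is set aside here: every chunk receives a label, and the codebook exists only after the data has been read.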
Example from a research study
Suppose a study aims at answering the following question:
“What happens to undergraduate students’ environmental practices when they are engaged in repeated discussions on lifestyles and environmental problems?”
To answer the question, data is gathered from undergraduate students who had participated in repeated discussions on lifestyles and environmental problems. Then coding begins. It can either be done via open coding or via a priori coding. Both are discussed below:
Open coding
In this case, a researcher might want to code certain words, phrases, or sentences from the respondent’s answers based on the concept they closely reflect. The bracketed numbers in each response mark the coded segments, and the matching codes are listed beside it:
|Response|Codes|
|---|---|
|Thinking of [1] and talking about [2] our consumer patterns [3] made me understand [4] the relationship between consumption and environmental impact. I have decided to buy [5] only those things which are necessary [6].|2, 3: Talk on consumer patterns; 4: Understanding the environmental impact; 5, 6: Decisions about purchasing necessary items|
|Questioning and discussions [7] on our lifestyles [8] has really raised our awareness about our own actions [9] which impact the environment. We had never experienced such discussions in our schools or colleges.|7: Discussion on lifestyles; 8, 9: Awareness about personal environmental actions|
A priori coding
In this case, the researcher might decide to code the data about undergraduate students’ engagement in repeated discussions on environmental issues in terms of the following themes:
|Response|Theme|
|---|---|
|Thinking of and talking about our consumer patterns made me understand the relationship between consumption and environmental impact. I have decided to buy only those things which are really necessary. I will save energy and plant trees.|Understanding|
|Questioning and discussions on our lifestyles have really raised our awareness about our own actions and globalization. We had never experienced such discussions in our schools or colleges.|Awareness about globalisation|
1. Are open coding and inductive coding the same?
When a researcher derives codes directly from the data, the resulting codes are called inductive codes. Open coding is therefore an inductive process in itself. Essentially, open coding is inductive, whereas a priori coding is deductive.
2. How to decide which coding method to use?
The answer to that question is multifaceted. It depends on:
- Research question(s)
- Method(s) of data collection
- Instrument(s) used to gather data (most important factor)
- Aim(s) of the research/study, and the like
3. What are the main pros and cons of open and a priori coding?
Even though both are effective methods of coding data, both have their advantages and disadvantages, like everything else. Open coding is very extensive and time-consuming, but it accounts for all kinds of details mentioned by respondents. On the other hand, even though a priori coding is easier, it leaves some chunks of data it cannot code.
In other words, open coding sacrifices time and effort to produce the most thoroughly coded data. A priori coding, by contrast, sacrifices some of that thoroughness to produce codes, categories, and themes quickly.
4. Is thematic analysis the same as data coding?
No. Coding is a process that takes place within thematic analysis. In simple words, thematic analysis is an analysis of data based on themes, unlike content analysis, where words are simply counted based on a predetermined goal or criterion. Themes cannot be developed without first coding the data and then categorising the codes. Only when these two stages have been completed can the data be themed.