Blog Post: A data what? How I learned to compile a data dictionary
During the first week of my internship with the Independent Evaluation Unit (IEU), the Head of the Unit asked me to create a data dictionary. Now, this came as a surprise! I had no idea what a data dictionary was, let alone how to create one! And so, with a nervous nod of the head, I decided to take this as a challenge, as an opportunity to learn and grow.
After some research, I discovered that a data dictionary is a store of definitions about objects in a database, such as the nature of the data, the names of all the database tables, details about who owns the data, the age of the data, whether the data is subject to intellectual property rights, and so on.
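For readers who think in code, here is a minimal Python sketch of what one record in such a conventional data dictionary might look like. The field names, table names, and values are my own illustrative assumptions, not the layout of any real database.

```python
from dataclasses import dataclass, asdict


@dataclass
class TableEntry:
    """One data dictionary record describing a database table (hypothetical fields)."""
    table_name: str
    description: str     # the nature of the data
    owner: str           # who owns the data
    last_updated: str    # the age of the data, as an ISO date string
    ip_restricted: bool  # whether intellectual property rights apply


# Illustrative values only; not taken from any real database.
data_dictionary = [
    TableEntry("projects", "One row per evaluated project",
               "Evaluation team", "2019-01-15", False),
    TableEntry("documents", "Source documents linked to projects",
               "Records office", "2018-11-02", True),
]


def find_table(name):
    """Look up an entry by table name; return None if the table is not documented."""
    for entry in data_dictionary:
        if entry.table_name == name:
            return entry
    return None


print(asdict(find_table("documents")))
```

A spreadsheet with one row per table and one column per field would hold exactly the same information; the code form simply makes the structure explicit.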
The irony of creating a data dictionary is that the people who see your hard work are not the people you are doing it for: the researchers who will use the database. Only the database manager sees the data dictionary. She or he maintains it so that researchers can find and manage the data they need more efficiently.
But no sooner had I begun to understand data dictionaries than I faced another surprise. Like most things in the IEU, our data dictionary is not the stock-standard data dictionary described above. That would be too easy for the IEU.
IEU’s data dictionary is different. How different? Instead of holding just metadata that describes the primary attributes, the IEU data dictionary aims to tell the story of how variables are identified. This is especially useful when analyzing lengthy documents that could be subject to different interpretations. Clearly defining the standardized meaning of the variables is critical. Why? It ensures that the data is understood in the same way by everyone who carries out the extraction and analysis.
Some of the key lessons that I discovered are listed below.
1) Know your data source. That is, do enough research on the document from which you wish to extract variables. Meet with stakeholders to discuss key ideas and research hypotheses.
2) Get more information. Investigative skills are critical. Talk with universities and experts for further understanding.
3) Create a zero draft. This is the initial framework of the data dictionary and is subject to change as the process continues. Once you have read the source document and identified all the variables that are essential to answering your research questions, it is time to create your data dictionary. These variables are entered into a spreadsheet and binary coded.
4) Pilot your data dictionary. This stage involves checking whether the chosen variables actually appear in the source documents. Typically, you use five percent of the source documents for this phase.
5) Modify the data dictionary. Revisit the dictionary to make revisions depending on the results from the pilot phase. Ask your colleagues if they can understand the dictionary.
6) Voila! You have a complete data dictionary. You are now ready to proceed to the extraction phase and begin mining the data.
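To make steps 3 to 5 a little more concrete, here is a rough Python sketch of a zero draft and a pilot. It rests on several assumptions of mine: that "binary coded" means 1 when a variable is found in a document and 0 when it is not, that the pilot documents are drawn at random, and that the variable names, coding rules, and pilot codes shown are invented purely for illustration.

```python
import math
import random

# Zero draft: each variable paired with the rule for coding it.
# The variable names and rules are hypothetical examples.
zero_draft = {
    "has_gender_analysis": "1 if the document contains a dedicated gender analysis, else 0",
    "mentions_cofinancing": "1 if any co-financing source is named, else 0",
}


def pilot_sample(documents, percent=5, seed=0):
    """Pick a reproducible random subset of the documents for the pilot."""
    size = max(1, math.ceil(len(documents) * percent / 100))
    return random.Random(seed).sample(documents, size)


def variables_never_found(coded_rows, variables):
    """Return variables coded 0 in every piloted document (candidates for revision)."""
    return [v for v in variables if all(row.get(v, 0) == 0 for row in coded_rows)]


documents = [f"doc_{i:03d}" for i in range(100)]
pilot = pilot_sample(documents)

# Pilot codes would be entered by hand after reading each document;
# these values are made up for the example.
coded_rows = [{"has_gender_analysis": 1, "mentions_cofinancing": 0} for _ in pilot]

print(len(pilot), variables_never_found(coded_rows, zero_draft))
```

A variable that never shows up in the pilot is not necessarily wrong, but it is a good prompt to revisit its definition in step 5 before moving on to extraction.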
No doubt, more lessons are headed my way as I continue working on the IEU’s unique data dictionary. But that’s the thing about learning: it never ends. It’s a lifelong process. I am in it for the long haul!