katiebuntic/intro-to-research-data-management-carpentries

Introduction to Research Data Management Lesson

Lesson Description

This lesson aims to teach those just starting to undertake research how to manage their data and files.

Target Audience

  • Masters/PhD/Postdoc researchers at the beginning of their projects.
  • Basic digital skills required (e.g., file management, Excel, some version control exposure).
  • No programming experience necessary.

Prerequisites

  • Basic Excel use (open/save tables)
  • File/folder management on a computer
  • A research project or dataset in progress

Learning Objectives

After completing this course, learners should be able to:

  • Define research data and distinguish between different data types.
  • Structure research materials using clear file naming conventions and a logical folder hierarchy.
  • Describe methods of data collection that make data cleaner and easier to analyse.
  • Detect inconsistencies and errors in a tabular dataset ("dirty data").
  • Use a set of basic techniques to remove or correct errors and inconsistencies in tabular data ("cleaning data").
  • Use version control to track different versions of files and switch between them.

Authors

Dataset & Narrative

Dataset:

  • Size:
  • Types:
  • Requires noise/messiness injection for teaching
  • Licensing:

Narrative:

A fictional researcher, Alex, inherits disorganised MET data. Learners help clean and structure it.

Challenges include:

  • Unclear file naming (e.g., final_final_v3.csv)
  • Scattered/misplaced files
  • Dirty data: duplicates, missing values, format errors, inconsistent naming

Episodes

1. What is Research Data?

  • Data types
  • Sources of data
  • What is research data management (collection, storage, organisation, sharing, etc.)?

Need to write objectives

2. Structuring research materials

  • Naming conventions
  • Folder structures
  • Version control
  • Introduction to version control software (Git/GitHub)

Objectives

After following this episode, learners will be able to:

  • Organise their research data into a standard folder structure.
  • Name files with a consistent naming convention.
  • Explain why version control is important and how to incorporate it into their naming conventions.
  • Explain why version control software such as Git/GitHub can be useful for certain types of data.
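
As an illustration of what a consistent naming convention can look like, here is a minimal Python sketch. The lesson does not prescribe a convention; the pattern used here (date, project, description, two-digit version) and the helper names build_name and follows_convention are assumptions made for this example only.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not the lesson's own rule):
#   YYYY-MM-DD_project_description_vNN.ext
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$"
)


def slug(text):
    """Lower-case the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(day, project, description, version, ext):
    """Assemble a file name that follows the assumed convention."""
    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


def follows_convention(filename):
    """Return True if the file name matches the assumed convention."""
    return NAME_PATTERN.match(filename) is not None


# Illustrative values only.
name = build_name(date(2024, 3, 1), "MET", "station readings", 3, "csv")
print(name)                                      # 2024-03-01_met_station-readings_v03.csv
print(follows_convention(name))                  # True
print(follows_convention("final_final_v3.csv"))  # False
```

Putting the date first in ISO 8601 order means that sorting files alphabetically also sorts them chronologically, and a zero-padded version number keeps v10 after v09.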

3. Tabular data collection

  • Have a look at a 'dirty' data set
  • Is there a standard set of responses?
  • Is it free text?
  • How do you control what data is being collected?
  • Asking the right questions
  • Data dictionaries

Objectives

After following this episode, learners will be able to:

  • List variable types and formats
  • Identify inconsistencies in data that can cause problems during analysis
  • Describe methods that can be used during data collection and data entry that can prevent inconsistencies
  • Write guidance for how to collect and enter data
  • Create a data dictionary describing a dataset
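
To make the idea of a data dictionary concrete, the sketch below describes a small table and checks values against it. This is one possible approach, not the lesson's own material: the column names, allowed responses, temperature range, and the check_value helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical data dictionary: one entry per column of an imagined dataset.
DATA_DICTIONARY = {
    "station_id": {"type": "text", "description": "Station code"},
    "reading_date": {"type": "date", "format": "%Y-%m-%d",
                     "description": "Date of the reading"},
    "temperature_c": {"type": "number", "min": -90.0, "max": 60.0,
                      "description": "Air temperature in degrees Celsius"},
    "sky": {"type": "category", "allowed": {"clear", "cloudy", "rain"},
            "description": "Observed sky condition"},
}


def check_value(column, value):
    """Return a description of the problem, or None if the value is acceptable."""
    rule = DATA_DICTIONARY[column]
    if value == "":
        return "missing value"
    if rule["type"] == "date":
        try:
            datetime.strptime(value, rule["format"])
        except ValueError:
            return f"not in {rule['format']} format"
    elif rule["type"] == "number":
        try:
            number = float(value)
        except ValueError:
            return "not a number"
        if not rule["min"] <= number <= rule["max"]:
            return "outside expected range"
    elif rule["type"] == "category":
        if value not in rule["allowed"]:
            return "not an allowed response"
    return None


# Illustrative rows, as they might arrive from a spreadsheet export.
rows = [
    {"station_id": "A1", "reading_date": "2024-03-01", "temperature_c": "7.5", "sky": "clear"},
    {"station_id": "A1", "reading_date": "01/03/2024", "temperature_c": "", "sky": "Cloudy"},
]

problems = [
    (index, column, problem)
    for index, row in enumerate(rows)
    for column, value in row.items()
    if (problem := check_value(column, value)) is not None
]

for index, column, problem in problems:
    print(f"row {index}, {column}: {problem}")
```

The second row triggers three problems: a date in a different format, a missing temperature, and a free-text response ("Cloudy") that differs from the allowed set only by capitalisation.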

4. How to clean a tabular dataset (using Excel)

  • Finding inconsistencies
  • Missing data
  • Capitalisation
  • Spelling mistakes
  • Pros and cons of Excel

Objectives

After following this episode, learners will be able to:

  • Describe what data cleaning is and why it is important
  • Find and resolve inconsistencies within a tabular dataset programmatically (e.g., datetime formats, numeric precision)
  • Identify missing values within a tabular dataset using filters
  • Correct spelling mistakes using spell check tools and find + replace
  • Standardise text formats using spreadsheet functions
  • Describe the pros and cons of using spreadsheets for data collection and cleaning
  • [Note: update for using R?]
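
The episode itself uses Excel. For readers curious how the same cleaning steps look in code, here is a rough Python sketch that standardises text, marks missing values, and drops exact duplicates. The list of missing-value tokens, the column names, and the clean_rows helper are assumptions made for this example.

```python
# Tokens treated as missing; an assumption for this example.
MISSING_TOKENS = {"", "na", "n/a", "null", "-"}


def clean_text(value):
    """Trim whitespace, collapse repeated spaces, and lower-case a text value."""
    return " ".join(value.split()).lower()


def clean_rows(rows):
    """Standardise text, mark missing values as None, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            text = clean_text(value)
            new_row[column] = None if text in MISSING_TOKENS else text
        key = tuple(new_row.items())  # hashable fingerprint of the cleaned row
        if key not in seen:
            seen.add(key)
            cleaned.append(new_row)
    return cleaned


# Illustrative rows with inconsistent capitalisation, stray spaces, and a duplicate.
raw = [
    {"station": " Leeds ", "sky": "Cloudy"},
    {"station": "leeds", "sky": "cloudy"},
    {"station": "York", "sky": "N/A"},
]

cleaned = clean_rows(raw)
print(cleaned)
# [{'station': 'leeds', 'sky': 'cloudy'}, {'station': 'york', 'sky': None}]
```

Duplicates are detected only after standardising, which is why the first two rows collapse into one. Lower-casing everything is a simplification; a real dataset may need a different target format for names.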

5. Introduction to R

Need to write objectives
