Introduction to Research Data Management Lesson

Lesson Description

This lesson teaches researchers who are just starting out how to manage their data and files.

Target Audience

Master's, PhD, and postdoctoral researchers at the beginning of their projects.
Basic digital skills required (e.g., file management, Excel, some version control exposure).
No programming experience necessary.

Prerequisites

Basic Excel use (open/save tables)
File/folder management on a computer
A research project or dataset in progress

Learning Objectives

After completing this lesson, learners should be able to:

Define research data and distinguish between different data types.
Structure research materials using clear file naming conventions and a logical folder hierarchy
Describe methods of data collection that make data cleaner and easier to analyse
Detect inconsistencies and errors in a tabular dataset ("dirty data")
Use a set of basic techniques to remove/correct errors and inconsistencies in tabular data ("cleaning data")
Use version control to track different versions of files, and switch between them.

Authors

Victoria Yorke-Edwards (@vyorkeedwards)
Kimberly Meechan (@K-Meech)
Katie Buntic (@katiebuntic)

Dataset & Narrative

Dataset:

Size:
Types:
Requires noise/messiness injection for teaching
Licensing:

Narrative:

A fictional researcher, Alex, inherits disorganised MET data. Learners help clean and structure it.

Challenges include:

Unclear file naming (final_final_v3.csv)
Scattered/misplaced files
Dirty data: duplicates, missing values, format errors, inconsistent naming

Episodes

1. What is Research Data?

Data types
Sources of data
What is research data management (collection, storage, organisation, sharing, etc.)?

Need to write objectives

2. Structuring research materials

Naming conventions
Folder structures
Version Control
Introduction to version control software, Git/GitHub

Objectives

After following this episode, learners will be able to:

Organise their research data into a standard folder structure
Name files with a consistent naming convention
Explain why version control is important, and how to incorporate it into their naming conventions
Explain why version control software such as Git/GitHub can be useful for certain types of data.
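
As a rough illustration of the naming and folder objectives above, the sketch below checks file names against one possible convention and lists one possible folder layout. The pattern (date, project, description, version) and the folder names are assumptions made for this example; they are not a convention this lesson prescribes.

```python
import re

# Hypothetical convention: YYYY-MM-DD_project_description_vNN.ext
# Lowercase only, hyphens inside a field, underscores between fields.
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$"
)


def follows_convention(filename):
    """Return True if the file name matches the illustrative pattern above."""
    return NAME_PATTERN.match(filename) is not None


def suggest_folders(root="project"):
    """Return an example folder hierarchy (folder names are illustrative only)."""
    subfolders = ("data/raw", "data/processed", "docs", "results")
    return [f"{root}/{sub}" for sub in subfolders]


if __name__ == "__main__":
    for name in ["final_final_v3.csv", "2024-03-01_met_station-readings_v01.csv"]:
        print(name, follows_convention(name))
    for folder in suggest_folders():
        print(folder)
```

Putting the date first in ISO 8601 order means files sort chronologically by name, and a zero-padded version number keeps v02 ahead of v10.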

3. Tabular data collection

Have a look at a 'dirty' dataset
Is there a standard set of responses?
Is it free text?
How do you control what data is being collected?
Asking the right questions
Data dictionaries

Objectives

After following this episode, learners will be able to:

List variable types and formats
Identify inconsistencies in data that can cause problems during analysis
Describe methods that can be used during data collection and data entry that can prevent inconsistencies
Write guidance for how to collect and enter data
Create a data dictionary describing a dataset
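
To make the idea of a data dictionary more concrete, here is a minimal sketch that records each column's type, description, and allowed values, then checks a row against it. The column names and allowed values are invented for illustration and do not describe the lesson's dataset.

```python
# Hypothetical data dictionary for an illustrative weather-style table.
# Every column name, type, and allowed value here is an assumption.
DATA_DICTIONARY = {
    "station_id": {"type": str, "description": "Station identifier", "required": True},
    "date": {"type": str, "description": "Observation date, YYYY-MM-DD", "required": True},
    "temperature_c": {"type": float, "description": "Air temperature in degrees Celsius", "required": False},
    "sky": {
        "type": str,
        "description": "Sky condition",
        "required": False,
        "allowed": {"clear", "cloudy", "rain"},
    },
}


def check_row(row):
    """Return a list of human-readable problems found in one row (a dict)."""
    problems = []
    for column, spec in DATA_DICTIONARY.items():
        value = row.get(column)
        if value is None or value == "":
            if spec["required"]:
                problems.append(f"{column}: missing required value")
            continue
        if not isinstance(value, spec["type"]):
            problems.append(f"{column}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{column}: '{value}' is not an allowed value")
    # Columns that appear in the data but are not documented are also flagged.
    for column in row:
        if column not in DATA_DICTIONARY:
            problems.append(f"{column}: not described in the data dictionary")
    return problems


if __name__ == "__main__":
    print(check_row({"station_id": "", "date": "2024-03-01", "sky": "Sunny"}))
```

Restricting a column to a fixed set of responses, rather than free text, is what lets a check like this catch "Sunny" at entry time instead of during analysis.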

4. How to clean a tabular dataset (using Excel)

Finding inconsistencies
Missing data
Capitalisation
Spelling mistakes
Pros and cons of Excel

Objectives

After following this episode, learners will be able to:

Describe what data cleaning is and why it is important
Find and resolve inconsistencies within a tabular dataset programmatically (e.g., datetime formats, numeric precision)
Identify missing values within a tabular dataset using filters
Correct spelling mistakes using spell check tools and find and replace
Standardise text formats using spreadsheet functions
Describe the pros and cons of using spreadsheets for data collection and cleaning
[Note: update for using R?]
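
The episode itself works in Excel, but the same checks can be sketched in code. The example below is a minimal illustration using only the Python standard library: it standardises capitalisation and whitespace, drops rows that become exact duplicates, and records where values are missing. The sample table and the set of missing-value markers are assumptions made for this example.

```python
import csv
import io

# Hypothetical markers treated as "missing" (compared after lower-casing).
MISSING_MARKERS = {"", "na", "n/a", "null"}


def clean_rows(rows):
    """Standardise text, drop exact duplicates, and record missing values.

    rows: list of dicts, for example from csv.DictReader.
    Returns (cleaned, missing), where missing holds (row_index, column)
    pairs that index into the cleaned list.
    """
    cleaned, seen, missing = [], set(), []
    for row in rows:
        # Trim whitespace and lower-case so "Cloudy " and "cloudy" compare equal.
        # Lower-casing every column is a simplification; identifiers may need care.
        tidy = {key: (value or "").strip().lower() for key, value in row.items()}
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint in seen:
            continue  # exact duplicate once text is standardised
        seen.add(fingerprint)
        for key, value in tidy.items():
            if value in MISSING_MARKERS:
                missing.append((len(cleaned), key))
        cleaned.append(tidy)
    return cleaned, missing


# Invented sample data, not the lesson's dataset.
SAMPLE = "station,sky,temp_c\nA1,Cloudy ,7.5\na1,cloudy,7.5\nB2,N/A,\n"

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    cleaned, missing = clean_rows(rows)
    print(cleaned)
    print(missing)
```

Unlike manual edits in a spreadsheet, a script like this leaves the raw file untouched and records every cleaning step, which makes the process repeatable.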

5. Introduction to R

Need to write objectives


katiebuntic/intro-to-research-data-management-carpentries
