olihawkins

An R package for simple data wrangling

7 Jul 2018 11:28 GMT

I recently started a new role at work where one of my tasks is helping statisticians to develop data science skills. I've noticed that one of the most challenging obstacles people encounter when first learning to program is how much you need to learn in order to become productive.

It takes time to become a good programmer, and it's a learning experience that never really ends. But there is an inflection point when you become more fluent and the time you spent learning how to do each thing for the first time starts to pay off.

The question I keep coming up against is how to motivate people who are learning to program in a professional setting to persevere through the initial learning period, when doing something in a new way is less efficient than doing it the old way.

Part of the answer is to show people the remarkable things that can only be done in the new way. But perhaps even more important is lowering the barrier to entry: reducing the time it takes beginners to learn simple and useful things.

With that in mind, I wrote an R package called cltools, which is designed to make common data wrangling tasks easier to perform. These are all things that an experienced R user could do with base R or tidyverse functions, but the point is to reduce the level of skill people need in order to do useful work with R.

The package is primarily designed to help statistical researchers and data journalists covering public policy fields. It focuses on the simple data wrangling tasks that these researchers do most often, such as calculating row and column percentages, creating indices, and deflating prices to real terms.
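To give a sense of what these tasks involve, here is a minimal sketch of each one in base R. It does not use cltools functions and says nothing about the package's actual interface; the data, the variable names and the choice of base years are all made up for illustration.

```r
# Hypothetical survey counts: regions in rows, responses in columns
counts <- matrix(
  c(20, 30, 50,
    10, 40, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(region = c("North", "South"),
                  response = c("Yes", "No", "Unsure")))

# Row percentages: each row sums to 100
row_pct <- prop.table(counts, margin = 1) * 100

# Column percentages: each column sums to 100
col_pct <- prop.table(counts, margin = 2) * 100

# Hypothetical spending series with a made-up price deflator (2017 = 100)
spend <- data.frame(
  year     = 2015:2018,
  nominal  = c(200, 210, 220, 230),
  deflator = c(95, 97, 100, 102))

# Index: express each value relative to a base year set to 100
base_value  <- spend$nominal[spend$year == 2015]
spend$index <- spend$nominal / base_value * 100

# Real terms: restate nominal values in the prices of a chosen year
base_deflator <- spend$deflator[spend$year == 2017]
spend$real    <- spend$nominal * base_deflator / spend$deflator
```

None of this is difficult for an experienced R user, but a beginner has to know about prop.table and its margin argument, logical subsetting, and the arithmetic of rebasing before they can do any of it, which is the kind of overhead a helper package can remove.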

Let me know if you have any suggestions for ways to make it better.