staypuft: object validation and serialization

& should this even be a package?

Scott Chamberlain (@sckottie)


pain point: serialization

converting data in one format to another format

especially painful when complex

other languages have good ideas

marshmallow - a Python library


A lightweight library for converting complex objects to and from simple Python datatypes.
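To make that description concrete, here is a minimal sketch of the idea in plain Python. It uses only the standard library, not marshmallow itself, and the `Artist`/`Album` classes and the `dump`/`load` helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Artist:
    name: str


@dataclass
class Album:
    title: str
    release_date: date
    artist: Artist


def dump(album):
    """Serialize: complex object -> plain dict of simple, JSON-friendly values."""
    return {
        "title": album.title,
        "release_date": album.release_date.isoformat(),
        "artist": {"name": album.artist.name},
    }


def load(data):
    """Deserialize: plain dict -> complex object, rebuilding nested types."""
    return Album(
        title=data["title"],
        release_date=date.fromisoformat(data["release_date"]),
        artist=Artist(name=data["artist"]["name"]),
    )


album = Album("Hunky Dory", date(1971, 12, 17), Artist("David Bowie"))
simple = dump(album)
assert load(simple) == album  # the round trip preserves the object
```

A schema library's job is roughly to generate `dump` and `load` from a single declaration of the fields, rather than having them written by hand as above.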

back to R

similar art in R

  • assertr (assertions for analysis pipeline)

  • validate (very similar to assertr AFAICT)

  • errorlocate (find errors in datasets)

  • any others?


why?/use cases

  • data validation: lots of potential users

  • remote data sources can change: schemas help validate and catch changes

  • use in scripts (most researchers): help raise issues with scripts as time goes on and data inputs change

  • using R with plumber or similar: convert data to serve from an API, or consume data from API request bodies
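The "remote data sources can change" case can be sketched as follows. This is an illustration in plain Python, not staypuft's interface: `SCHEMA`, `validate`, and the example records are all hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str, "score": float}


def validate(record, schema):
    """Return a list of readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


ok = {"id": 1, "name": "a", "score": 0.5}
# Simulated upstream change: "id" became a string, "score" was renamed to "rating".
changed = {"id": "1", "name": "a", "rating": 0.5}

print(validate(ok, SCHEMA))
print(validate(changed, SCHEMA))
```

Run at the top of a script, a check like this turns a silent upstream change into an explicit list of what moved.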

To do

  • Nested data works - but needs more testing

  • Add more 'field' types: url, email, (domain specific types)

  • Add support for user-defined fields

  • Probably add an easier to use interface, less R6'y
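One possible shape for the extra field types and user-defined fields, sketched in plain Python under assumptions: every name here (`url_field`, `email_field`, `doi_field`, `FIELDS`, `check`) is hypothetical, the email pattern is deliberately loose, and the DOI check is only a stand-in for a domain-specific type.

```python
import re
from urllib.parse import urlsplit


def url_field(value):
    """Accept strings that parse as http(s) URLs with a host part."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def email_field(value):
    """Loose check: something@something.tld with no whitespace."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


def doi_field(value):
    """A user-defined field is just another callable returning True or False."""
    return value.startswith("10.") and "/" in value


# Hypothetical schema: field name -> validator callable.
FIELDS = {"homepage": url_field, "contact": email_field, "doi": doi_field}


def check(record):
    """Return the names of fields whose values fail their validator."""
    return [name for name, is_valid in FIELDS.items() if not is_valid(record[name])]
```

Treating a field as a plain callable keeps built-in and user-defined types on equal footing, which is one way to get an interface that feels less class-heavy.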

wait ...
should this even be a package though?

When should I not make a pkg?

  • the pkg doesn't solve actual use cases

  • there's significant overlap with existing solutions

    • and maintainers are responsive

  • there's higher priority/lower hanging fruit

Use cases

For staypuft, likely many users

Everyone deals with objects in R

& I'm not against silliness

elephant in the room ...

aren't you just re-making S4?

higher priority/lower hanging fruit

  • I've got many other packages

  • Many of which have many users

  • What if a new package has a huge impact though?

    • How would I know?


staypuft's future is unclear

if you're interested: