staypuft: object validation and serialization

& should this even be a package?



Scott Chamberlain (@sckottie)

rOpenSci

pain point: serialization


converting data in one format to another format


especially painful when complex

other languages have good ideas


marshmallow - a Python library

marshmallow


A lightweight library for converting complex objects to and from simple Python datatypes.
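A minimal sketch of what "to and from simple datatypes" means. It uses only the Python standard library, not marshmallow's own API. The `Album` class and the `dump`/`load` helpers are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Album:
    # A "complex" object: it holds a date, which is not JSON-friendly.
    title: str
    release_date: date


def dump(album: Album) -> dict:
    """Serialize: complex object -> dict of simple datatypes (strings here)."""
    return {
        "title": album.title,
        "release_date": album.release_date.isoformat(),
    }


def load(data: dict) -> Album:
    """Deserialize: dict of simple datatypes -> complex object, with a basic check."""
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    return Album(
        title=data["title"],
        release_date=date.fromisoformat(data["release_date"]),
    )


album = Album("Example", date(2020, 1, 2))
simple = dump(album)      # plain dict of strings, safe to write out as JSON
restored = load(simple)   # back to an Album holding a real date
```

The round trip is the point: `dump` followed by `load` should give back an equal object, and `load` is where bad input gets rejected.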

back to R

similar art in R

  • assertr (assertions for analysis pipeline)

  • validate (very similar to assertr AFAICT)

  • errorlocate (find errors in datasets)


  • any others?

 ropensci/staypuft


why?/use cases

  • data validation: lots of potential users

  • remote data sources can change: schemas help validate and catch changes

  • use in scripts (most researchers): help raise issues with scripts as time goes on and data inputs change

  • using R with plumber or similar: convert data to serve from an API or consume from API request bodies
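A rough sketch of the "remote data sources can change" case, again in stdlib-only Python rather than staypuft's or marshmallow's API. `SCHEMA`, `validate`, and the sample records are assumptions made up for illustration.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str, "score": float}


def validate(record: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record is valid."""
    errors = []
    # Check every field the schema expects.
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            got = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    # Flag fields the schema does not know about.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


# What the remote source used to return.
old_record = {"id": 1, "name": "a", "score": 0.5}
# What it returns after an unannounced change: id became a string, score was renamed.
new_record = {"id": "1", "name": "a", "rating": 0.5}

old_errors = validate(old_record, SCHEMA)
new_errors = validate(new_record, SCHEMA)
```

Run at the top of a script, a check like this turns a silent upstream change into an explicit list of problems instead of a confusing failure further down.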

To do

  • Nested data works - but needs more testing

  • Add more 'field' types: url, email, (domain specific types)

  • Add support for user-defined fields

  • Probably add an easier to use interface, less R6'y

wait ...
should this even be a package though?

When should I not make a pkg?

  • the pkg doesn't solve actual use cases

  • there's significant overlap with existing solutions

    • and maintainers are responsive

  • there's higher priority/lower hanging fruit

Use cases


For staypuft, likely many users


Everyone deals with objects in R

& I'm not against silliness

elephant in the room ...

aren't you just re-making S4?

higher priority/lower hanging fruit


  • I've got many other packages

  • Many of which have many users

  • What if new package has a huge impact though?

    • How would I know?

So...


staypuft's future is unclear

if you're interested:

 ropensci/staypuft

 scotttalks.info/staypuft