Machine-readable distributions database?

Does one exist, or should we create one?

Richard D. Morey
Oct 28, 2016
A basic definition of a distribution in the stat-distributions-js project.

A few years ago, I had an idea: a machine-readable database of statistical distributions that contained all the information needed to algorithmically generate reference information for statistical distributions. One could select a distribution in the database, get information about its properties, see and manipulate graphical representations of that distribution, or load the database into a programming environment (e.g. R) and immediately have access to densities, CDFs, moments, sample generation, etc.

Excerpt from the reference list of distributions generated by the stat-distributions-js project.

Currently, as far as I’m aware, information about statistical distributions is published mostly in human-readable form, which makes it difficult to algorithmically generate documentation, demonstrations, and visualisations of the distributions and of the relationships between them. I’m imagining an authoritative reference on statistical distributions that is machine-readable.

A visualization generated from the stat-distributions-js database.

I started to build such a database in JavaScript. The result can be found on GitHub: richarddmorey/stat-distributions-js. The demo shows a list of distributions that is generated on the fly, as well as interactive visualizations. This is a very basic implementation of the core idea.

However, the project fell short for a variety of reasons, not the least of which was that I was just learning JavaScript at the time. The definitions of the distributions are locked in JavaScript objects, which are limited. The format is not generic enough to be useful everywhere, and it contains, for instance, no information about relationships between distributions.
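To illustrate what a more generic definition could look like, here is a minimal sketch in Python, assuming a hypothetical JSON schema. The field names and the `related` helper are made up for illustration and are not taken from stat-distributions-js. The point is only that an entry stored as plain data can be read from any language and can carry relationships alongside the formulas.

```python
import json

# Hypothetical schema: every field name here is illustrative only.
ENTRY_JSON = """
{
  "id": "exponential",
  "name": "Exponential",
  "support": {"lower": 0, "upper": null},
  "parameters": [{"name": "rate", "lower": 0, "upper": null}],
  "density": "rate * exp(-rate * x)",
  "relationships": [
    {"type": "special_case_of", "target": "gamma", "when": {"shape": 1}}
  ]
}
"""

entry = json.loads(ENTRY_JSON)


def related(entry, relation_type):
    """Return the ids of distributions linked to this entry by the given relation."""
    return [r["target"] for r in entry["relationships"] if r["type"] == relation_type]


# The exponential distribution is a gamma distribution with shape 1.
print(related(entry, "special_case_of"))
```

Because the entry is data rather than code, the same file could in principle feed a JavaScript demo, an R package, or a documentation generator.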

I would like to begin working on this core idea again. I think an XML database of distributions (e.g., like this example) would be an extremely useful project, for teaching and for reference. Imagine being able to do the following:

  • Add a new distribution to the central distributions database, and immediately have basic sampling routines, density/cumulative distribution/quantile functions, and documentation available in your favorite statistical program.
  • Programmatically generate interactive visualizations for any parameterization of any distribution in the database.
Graphics like this could be programmatically generated from the project (from Leemis and McQuiston, 2008).
  • Programmatically, and interactively, generate visualizations of relationships between distributions, selectively focusing on specific relationships (e.g., distributions that are simple functions of normal variates, special cases, or Bayesian conjugacy).
  • Search the database for distributions with particular properties (e.g., support on the positive integers, or distributions that arise as the minimum of variates from another distribution).
  • When creating a reference section for a teaching article or book, simply export excerpts from the database in human-readable tables.
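As a rough sketch of how a few of these items (using formulas straight from the database, searching by property, and exporting a reference table) might work, here is a small Python example. It assumes a hypothetical XML schema: the element names, attribute names, and the helpers `load`, `evaluate`, `search`, and `reference_table` are all invented for illustration and do not come from any existing project. Formulas are evaluated with Python's `eval` under a restricted namespace, which merely stands in for a proper expression language and is not safe for untrusted input.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical schema: element and attribute names are illustrative only.
# <density> holds the probability mass function for discrete distributions.
DB_XML = """
<distributions>
  <distribution id="exponential" name="Exponential">
    <support kind="continuous" lower="0" upper="inf"/>
    <parameter name="rate" lower="0" upper="inf"/>
    <density>rate * exp(-rate * x)</density>
    <cdf>1 - exp(-rate * x)</cdf>
    <mean>1 / rate</mean>
  </distribution>
  <distribution id="poisson" name="Poisson">
    <support kind="discrete" lower="0" upper="inf"/>
    <parameter name="rate" lower="0" upper="inf"/>
    <density>exp(-rate) * rate ** x / factorial(x)</density>
    <mean>rate</mean>
  </distribution>
</distributions>
"""

# Names that formula strings are allowed to use.
ALLOWED = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt,
           "factorial": math.factorial, "pi": math.pi}


def load(xml_text):
    """Parse the XML text into a dict mapping distribution id to its element."""
    root = ET.fromstring(xml_text.strip())
    return {d.get("id"): d for d in root.findall("distribution")}


def evaluate(db, dist_id, field, **values):
    """Evaluate a formula field (e.g. 'density') at the given x and parameters."""
    formula = db[dist_id].findtext(field)
    if formula is None:
        raise KeyError(f"{dist_id} has no <{field}> entry")
    # eval with builtins disabled is a stand-in for a real expression parser.
    return eval(formula, {"__builtins__": {}}, {**ALLOWED, **values})


def search(db, **support_attrs):
    """Return ids whose <support> attributes match all the given values."""
    hits = []
    for dist_id, dist in db.items():
        support = dist.find("support")
        if all(support.get(k) == v for k, v in support_attrs.items()):
            hits.append(dist_id)
    return hits


def reference_table(db):
    """Render a plain-text reference table: name, support kind, mean formula."""
    rows = [("Name", "Support", "Mean")]
    for dist in db.values():
        rows.append((dist.get("name"), dist.find("support").get("kind"),
                     dist.findtext("mean", default="")))
    return "\n".join(f"{n:<12}{s:<12}{m}" for n, s, m in rows)


db = load(DB_XML)
print(evaluate(db, "exponential", "density", x=1.0, rate=2.0))
print(search(db, kind="discrete"))
print(reference_table(db))
```

Adding a third `<distribution>` element to the XML would make it available to all three helpers without touching the code, which is the property the list above is after. Sampling routines, quantile functions, and relationship graphs would need a richer schema than this sketch attempts.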

If you know of a similar project, I’d like to know about it. If no such project exists, I would like to create it. This would be a substantial undertaking to do well, so if you’d like to help, let me know.
