On a bit of a whim, I just added support for SparkML pipelines to Vizier's Scala cells (an easy 1-2 hours of effort). It ended up highlighting a lot of the value of building research software to the existing standards for data science software. A few years back we completely rewrote Mimir ("todo notes" dataset annotations) for Spark Logical Plans. Now, without any extra effort or intent, Mimir annotations propagate through SparkML pipelines as well!


Recent data exploration for work has also highlighted Vizier's main value proposition: Scala was pretty much the obvious choice for building an ML pipeline, but I didn't think twice about tossing in a SQL cell for some ETL, or a Python cell for data vis in Bokeh. It all just works together seamlessly.
