March 19, 2023. Last updated: May 12, 2023

AMA with Stefan Krawczyk: from building ML platforms at Stitch Fix to an open-source startup on top of the Hamilton framework


We invite ML practitioners to share their experiences with the Evidently Community during the Ask-Me-Anything series.

This time, we talked with Stefan Krawczyk. Stefan is the CEO and Co-founder at DAGWorks, an open-core platform for ML dataflows. Before starting his own company, Stefan led the Model Lifecycle team at Stitch Fix, where he focused on building a self-service ML platform for over 100 data scientists. He is also the co-creator of Hamilton, a general-purpose micro-framework for creating dataflows from Python functions.

We chatted about how to build an ML platform, what data science teams do wrong about ML dataflows, and what it takes to start an open-source company. 

Sounds interesting? Read on for the recap of the AMA with Stefan.

Building internal ML platforms

How do you approach building an ML platform, and what factors should be considered during development?

I have a post on my learnings from my time at Stitch Fix that gets at this question well.

To summarize, there are five key learnings:

  • Build for a particular vertical/use case first and deliver incremental value, where either you inherit something that works or target a specific team that will adopt your work as soon as it's ready.
  • Don't build for every user equally. Let sophisticated users fend for themselves until it's proven that you should invest your time in them.
  • Don't leak underlying vendor/implementation details if you can. Provide your own thin wrapper around underlying APIs to ensure you have more options you can control when you have to make platform changes.
  • Live your users' lifecycles. Remember that you provide for and shape the experience of users using your platform, so don't forget the macro context and the implications of your UX. Drink your champagne/eat your dog food to ensure you foresee/understand resonating impacts of what you provide. 
  • Think about providing two layers of APIs to keep your platform development nimble:
    (a) Think about a bounded foundational API layer. What are base-level primitives/capabilities you want your platform to provide, and thus what is a good base to build on top of for yourself? 
    (b) Think about an opinionated higher-level API layer. This layer should be much simpler for the average user than your lower foundational API layer. To handle more complex cases, it should still be possible for more advanced users to drop down to your bounded lower-level foundational API (see the sketch after the figure below).
"Building up vertically" vs. "all at once". Source: Stefan's blog "What I learned building platforms at Stitch Fix"

Looking back at your experience building the ML platform at Stitch Fix, is there anything you would do differently?

If I were to start over today, I would pull in many more open-source tools, because they've reached a level of maturity where maintaining something yourself isn't necessary. That wasn't the case six years ago, so we built many things in-house.

The other side of that is that I would have started more projects with the intent to open source :) We had some cool things, but they were so coupled with Stitch Fix that it would have been a lot of effort to open-source them.

There's also more maturity in the vendor market, so there is also the potential to pay someone else for it.


What is your take on the buy vs. build dilemma for data and ML platform teams? How would you suggest approaching this decision for different platform components or as a general guideline?

Generally, I would say: 

  1. You need to think along the lines of what will be core to your business. 
  2. You need to understand the maturity of the solution you need and how long it should last. 

Then from the above two, you can do some math to figure out whether it's valuable to invest your own money or bring someone else in.
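
For example, a rough back-of-envelope comparison (all numbers hypothetical) might look like this:

```python
# Hypothetical buy vs. build math over the expected lifetime of the solution.
engineer_cost_per_year = 200_000   # fully loaded cost of one engineer
build_effort_years = 1.5           # engineer-years to build a first version
maintain_fraction = 0.5            # ongoing engineer-years per year to maintain
vendor_cost_per_year = 60_000      # quoted license / usage fee
horizon_years = 3                  # how long the solution needs to last

build_cost = engineer_cost_per_year * (
    build_effort_years + maintain_fraction * horizon_years
)
buy_cost = vendor_cost_per_year * horizon_years

print(f"Build: ${build_cost:,.0f} vs. buy: ${buy_cost:,.0f} over {horizon_years} years")
# Build: $600,000 vs. buy: $180,000 over 3 years
```

The "core to your business" question then decides whether the extra cost of building buys you something strategic.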

Hamilton and DAGWorks

For those of us who are not familiar with Hamilton, could you please explain what it does?

From a computer science perspective, it is a micro-framework for describing dataflows. "Dataflow" is the term for compute + data movement.

It came from helping a data science team at Stitch Fix manage their feature engineering process. For example, rather than writing a procedural Python script with pandas, you can use Hamilton to describe the equivalent computation, with the added benefit that software engineering practices come naturally: the code is always unit-testable and documentation-friendly, it forces you to decouple logic from context, and you can quickly visualize what is happening.

You get all that for free because you describe everything in declarative functions. For more, I recommend trying Hamilton in the browser: you don't need to install anything to run the hello-world example! Note it may take a while to load, depending on where you are in the world.
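
To give a feel for the declarative style, here is a minimal sketch based on Hamilton's documented hello-world pattern (the column names are illustrative). Each function name defines an output, and its parameter names declare which inputs or upstream functions it depends on; the driver assembles the DAG and executes it.

```python
# feature_logic.py -- each function declares one node of the dataflow.
import pandas as pd

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend normalized by signups."""
    return spend / signups

def spend_mean(spend: pd.Series) -> float:
    """Average spend; small, pure, and unit-testable on its own."""
    return spend.mean()

def spend_zero_mean(spend: pd.Series, spend_mean: float) -> pd.Series:
    """Spend centered around zero; depends on the spend_mean node above."""
    return spend - spend_mean
```

```python
# run.py -- wire the functions into a DAG and execute it (pip install sf-hamilton).
import pandas as pd
from hamilton import driver

import feature_logic

dr = driver.Driver({}, feature_logic)
result = dr.execute(
    ["spend_per_signup", "spend_zero_mean"],
    inputs={
        "spend": pd.Series([10.0, 20.0, 30.0]),
        "signups": pd.Series([1, 2, 3]),
    },
)
print(result)
```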


Recently you did your public YC Launch with DAGWorks. Could you please share your vision of what you want the product to be and what is the ultimate challenge you solve?

We want to: 

  • stop the pain and suffering of ML pipeline (aka ETL/Workflow) maintenance/hand-off
  • and enable teams to iterate on features and models faster by helping provide the "platform abstractions" that simplify what they need to engineer to build and own everything. We enable more self-service and help people move from development to production/maintain production faster.

Our angle is that we want to connect to your existing infrastructure to do it. E.g., MLflow, Airflow, and Snowflake are just implementation details, so data scientists and ML engineers shouldn't have to know how best to connect with them, because we can provide those connectors.
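
This echoes the "don't leak vendor details" advice above. A hedged sketch of the idea, not DAGWorks' actual API: user code depends on a small interface, and the vendor-specific connector lives behind it.

```python
from typing import Protocol

class ExperimentTracker(Protocol):
    """The interface the platform promises, independent of the vendor behind it."""
    def log_metric(self, name: str, value: float) -> None: ...

class MlflowTracker:
    """One possible backend; swapping it out does not change user code."""
    def log_metric(self, name: str, value: float) -> None:
        import mlflow  # only needed if this backend is selected
        mlflow.log_metric(name, value)

class StdoutTracker:
    """A trivial backend, handy for local development and tests."""
    def log_metric(self, name: str, value: float) -> None:
        print(f"{name}={value}")

def train(tracker: ExperimentTracker) -> None:
    # User code talks to the interface, not to MLflow/Airflow/Snowflake directly.
    tracker.log_metric("accuracy", 0.93)

train(StdoutTracker())
```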

What led you to start a startup, and what are the most important factors for its growth, especially in the ML field?

I got the thought implanted when I went to Stanford over a decade ago. Throughout my career, I've been trying to find opportunities to learn how companies are built, what makes a good product, etc. I also learned I am pretty opinionated, so I would have a chip on my shoulder if I didn't start a company. 

The reception Hamilton got when we open-sourced it was energizing! We are passionate about helping data science and ML practitioners do more without having to be awesome software engineers, and I have the right co-founder to execute with. 

The most important factors for growth are getting people to use what you're building and to give you valuable feedback. The best measure is the readiness to pay for the product. With open source, the best metrics differ a little from company to company. But in any case, curating a community, much like Evidently has done here, is one important factor, as creating awareness about ML/DS tooling is a big challenge. Otherwise, GitHub stars are a reasonable proxy to measure against for an open-source project. If you can put telemetry in, track that growth.


How did your past experiences at Stitch Fix, Idibon, Nextdoor, and LinkedIn influence your decision to start DAGWorks?

In terms of past experiences: 

Understanding the breadth of problems is one key insight: 

  • types of modeling,
  • types of organizations (some that are good with data, others that need to learn),
  • types of end users and the challenges at various scales.

Seeing many things puts me in a good position to not think everyone has Google-like problems, for example. It also helps me drive to a particular area of focus, e.g., no one helps DS/MLEs structure their code bases to enable junior folks to not make a mess of things.

I have been on platform teams, but most companies don't have them. Knowing the leverage you can get by building abstractions to simplify common workflows gives me hope that we can show value and have someone pay for our solution.


What's your experience building an open-source startup? What challenges and benefits come with this model?

It's hard. In our case, what is open source is only part of the product, so it's like managing two things! There are a lot of opinions on what to call what, how you separate or synergize the open-source and closed-source parts, etc.

But having something open-source allows you to connect more easily with users than just cold-calling someone. So from that perspective, it's great!


Who is the buyer for DAGWorks? I imagine it is meant for large organizations.

We are building a flexible product. Hamilton is a library, and you can get started easily. The DAGWorks Platform can offer something to an individual open-source user too.

But yes, we will target larger organizations since they have more complex integration needs (e.g., lineage, integrations with various tools, etc.). Building on top of Hamilton, we can do some cool and interesting stuff more easily than one can do now.


How many data engineering pieces do you tackle in your day-to-day work?

Right now, not much. It's on my TODO to write a little Hamilton dataflow to process metrics for what we're working on.


I saw you were on the front page of Hacker News. Any tips on how to achieve such a feat?

Yep, apply to YC! YC = ycombinator.com

MLOps best practices 

What do data science teams often do wrong about ML dataflows?

One thing that comes to mind is code being "over-coupled" to the context. 

It's very easy to write code that is tightly coupled to the context for which it was initially developed. E.g., you assume some structure about the problem, like model shape, input types, etc. This helps you get your model out, but it becomes a challenge to maintain over time as your business and context change. Let's say you want to build a model for the US and the UK; how do you augment your pipeline or reuse its parts so you're not duplicating logic or creating one monolithic process that needs to run for everything?

Thinking ahead to what comes next doesn't come naturally, and you don't want to over-engineer, so it takes a bit of experience to feel out the right amount of coupling for the project stage.
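
As a hedged illustration of what "over-coupled" looks like (the columns, tax rates, and countries are invented for the example), compare a pipeline step that bakes in US assumptions with one that takes its context as parameters:

```python
import pandas as pd

# Over-coupled: US assumptions (currency, tax rate, filter) are hard-coded.
def build_features_us(df: pd.DataFrame) -> pd.DataFrame:
    out = df[df["country"] == "US"].copy()
    out["price_usd"] = out["price"]              # assumes prices are already in USD
    out["price_with_tax"] = out["price"] * 1.08  # hard-coded US sales tax
    return out

# Decoupled: the same logic, parameterized by the context it needs.
def build_features(df: pd.DataFrame, country: str, tax_rate: float,
                   fx_rate_to_usd: float) -> pd.DataFrame:
    out = df[df["country"] == country].copy()
    out["price_usd"] = out["price"] * fx_rate_to_usd
    out["price_with_tax"] = out["price"] * (1 + tax_rate)
    return out

raw = pd.DataFrame({"country": ["US", "UK"], "price": [100.0, 80.0]})
us = build_features(raw, country="US", tax_rate=0.08, fx_rate_to_usd=1.0)
uk = build_features(raw, country="UK", tax_rate=0.20, fx_rate_to_usd=1.25)
```

The second version lets the US and UK pipelines share one implementation instead of duplicating logic or forcing everything through a single monolithic run.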


What are some best practices for creating efficient and scalable ML pipelines, and how can they be maintained over time?

If you want "efficient" computing, you should go all in on one framework. E.g., PyTorch + ecosystem or TensorFlow + ecosystem. 

If you want efficiency in terms of human capital — which I think is the higher cost — then you need something like Hamilton to structure your code so that it is modular, testable, and maintainable, or you need to bring someone with SWE skills to help refactor and give pointers as to how to structure your code. 

Individuals build models, but teams maintain them. So you need practices that work for the team. E.g., everyone should be happy to take over someone else's work to ensure maintainability over time. Which generally means better SWE practices for the code in the pipelines.
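
As a small, hedged example of what "modular and testable" buys the team (the function and data are illustrative): a pipeline step written as a plain function can be unit-tested in isolation, so whoever inherits the pipeline can change it with confidence.

```python
import pandas as pd

def fill_missing_spend(spend: pd.Series) -> pd.Series:
    """One pipeline step: small, pure, and testable without running the pipeline."""
    return spend.fillna(spend.median())

# test_features.py -- runs under pytest
def test_fill_missing_spend():
    out = fill_missing_spend(pd.Series([1.0, None, 3.0]))
    assert out.isna().sum() == 0
    assert out.iloc[1] == 2.0  # median of [1.0, 3.0]
```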


Do you believe that data and ML stacks will eventually converge, or is there enough difference to justify the existence of ML-specific stacks?

I see ML workflows being a superset of data analytics workflows. But the target users are quite different. I think they will likely converge a bit technology-wise, but top-level product-wise, probably not.


If a model is decaying, what standard practice should one follow apart from retraining the ML model?

Apart from retraining the ML model? That is my go-to here :)

I would try to understand the reason for the decay. E.g., is the data skewed in some way? Or are the inputs to inference broken? And then, if you can't retrain, restrict the model to making inferences only on what you think it can handle best.
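
One hedged way to implement "only make inference on what the model handles best" is a simple gate that falls back when inputs drift outside the range seen in training (the tolerance and the fallback are illustrative):

```python
import numpy as np

def within_training_range(x: np.ndarray, train_min: np.ndarray,
                          train_max: np.ndarray, tolerance: float = 0.1) -> bool:
    """Crude check: is this input inside a slightly padded training range?"""
    pad = tolerance * (train_max - train_min)
    return bool(np.all(x >= train_min - pad) and np.all(x <= train_max + pad))

def predict_with_fallback(model, x: np.ndarray, train_min: np.ndarray,
                          train_max: np.ndarray, fallback_value):
    """Serve the model only where we trust it; otherwise return a safe default
    (a rules-based answer, a previous model, or a human-review queue)."""
    if within_training_range(x, train_min, train_max):
        return model.predict(x.reshape(1, -1))[0]
    return fallback_value
```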

Writing technical content

Stitch Fix has historically had some of the best content marketing for data science: the article on Bayesian time series is still highly relevant, and it is a seven-year-old article! Do you have any hints on how you balance creating such high-quality content while not neglecting the day-to-day business?

We had a group at Stitch Fix called "Tech branding" that met bi-weekly to discuss this. I was part of it.

So it's about coming up with a set of values to uphold and then getting the right reviewers to review draft posts. At Stitch Fix, there were so many great communicators that it wasn't that hard to manage. But we had to push people to think about good posts, etc. So people did this as a side task, with the outcome being a win for branding for Stitch Fix and a win for personal branding for them by writing a good post.

* The discussion was lightly edited for better readability.

Want to join the next AMA session?
Join our Discord community! Connect with maintainers, ask questions, and join AMAs with ML experts.

Join community ⟶

Dasha Maliugina, Community Manager at Evidently AI (https://www.linkedin.com/in/dmaliugina/)
