Launch HN: Baselit (YC W23) – Automatically Reduce Snowflake Costs

48 points
11 days ago
by sahil_singla

Comments


ukd1

How does this differ from https://espresso.ai ?

11 days ago

karamazov

Chiming in, I'm one of the founders of Espresso AI - we do both query optimization and warehouse optimization, both of which are hands-off. In particular we're beta-testing a fully-automated solution for query optimization (it's taken a lot of engineering!).

Based on the responses here I think we're a superset of where baselit is today, but I could be wrong.

11 days ago

sahil_singla

Would love to see how you’re doing warehouse optimization. Is there a demo video I can look at?

11 days ago

sahil_singla

They're more focused on query optimization, whereas we do warehouse optimization. We lean towards warehouse optimization because it's completely hands-off.

11 days ago

ukd1

cool - they're kinda complementary?

11 days ago

altdataseller

Espresso does warehouse too so they’re competitors

11 days ago

sahil_singla

Yeah, kind of.

I'm not exactly sure how their product works, but from the landing page it looks broadly focused on query optimization.

We've done a lot of experimentation with query optimizations, both with and without LLMs, and we don't think it's possible to build a fully automated solution. However, a workflow solution might be feasible.

11 days ago

mustansirm

Not a Snowflake user, but I'm curious about your business model. What barriers are there to prevent Snowflake from reverse engineering your work and including it as part of their native experience? Is the play here an eventual acquisition?

11 days ago

jaggederest

In my experience working on similar projects to cut down e.g. AWS spend, the primary billers often have a really hard time accepting or incorporating bill-reducing features. All their incentives are geared toward increased spend, regardless of the preferences of any individual at the company, so that inertia is really hard to overcome.

11 days ago

sahil_singla

That resonates with what we have heard from our customers.

11 days ago

sahil_singla

Our belief is that building a good optimization tool is not aligned with Snowflake's interests. Instead they seem to be more focused on enabling new use cases and workloads for their customers (their AI push, for example, with Cortex). On the other hand, helping Snowflake users cut down costs is our singular focus.

11 days ago

fock

or to phrase it differently: what kind of market is this, where big companies are herded into SaaS tarpits that apparently have exactly the same problems as running things the old way (namely, inefficient use of resources)? Only now you pay some symbiotic start-up instead of hiring some generic performance person.

11 days ago

bluelightning2k

It's not really in their interests?

11 days ago

candiddevmike

What happened to your other idea?

11 days ago

mritchie712

not OP, but for us, LLMs just aren't good enough yet to write analytical SQL queries (and they may never be good enough using pure SQL). Some more context here: https://news.ycombinator.com/item?id=40300171

11 days ago

sahil_singla

+1. We came to a similar conclusion when we were working on this idea.

11 days ago

datadrivenangel

Productizing cost optimization experience! Great to see more options in this space, as so many companies are surprised by the costs of cloud.

For the warehouse size experimentation, how do you value processing time?

11 days ago

sahil_singla

We optimize warehouse sizes for a dbt project as a whole. Users can set a maximum project runtime as one of the parameters for experimentation. The optimization honors this max runtime while tuning warehouse sizes for individual models.
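
Roughly, the constraint works like this (an illustrative sketch only, not our actual algorithm - it treats project runtime as the simple sum of model runtimes and ignores parallelism between models):

    # Illustrative sketch only: greedily downsize each model's warehouse as long
    # as the projected total runtime stays under the user-set maximum.
    SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE"]

    def pick_warehouse_sizes(runtimes, current, max_project_runtime):
        """runtimes[model][size] -> observed runtime in seconds on that size."""
        chosen = dict(current)
        total = sum(runtimes[m][s] for m, s in chosen.items())
        for model in chosen:
            idx = SIZES.index(chosen[model])
            while idx > 0:
                smaller = SIZES[idx - 1]
                delta = runtimes[model][smaller] - runtimes[model][chosen[model]]
                if total + delta > max_project_runtime:
                    break
                chosen[model] = smaller
                total += delta
                idx -= 1
        return chosen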

11 days ago

michaelmior

How does this differ from Keebo?

https://keebo.ai/

11 days ago

sahil_singla

We are different from Keebo in the way we approach warehouse optimization. Keebo seems to dynamically change the size of a warehouse - we have found that to be somewhat risky, especially when it's downsizing. Performance can take a big hit in this case. So we've approached this problem in two ways:

1. Route queries to the right-sized warehouse instead of changing the size of a particular warehouse itself. This is part of our dbt optimizer module. This ensures that performance stays within acceptable limits while optimizing for costs.

2. Baselit's Autoscaler optimally manages the scaling out of a multi-cluster warehouse depending on the load, which is more cost effective than upsizing the warehouse.
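
(For 1, dbt's per-model snowflake_warehouse config is the hook that lets a given model run on a specific warehouse.) As a rough sketch of the scaling idea in 2 - illustrative only, not our actual implementation; the warehouse name, thresholds, and cluster counts are placeholders:

    # Rough sketch of load-based scaling on a multi-cluster warehouse. The
    # warehouse name, thresholds, and cluster counts below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...", role="SYSADMIN"
    )

    def adjust_max_clusters(warehouse, low=0.1, high=1.0):
        """Widen or narrow the cluster range based on recent queueing."""
        cur = conn.cursor()
        # Average queued load over the last hour, from the
        # INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY table function.
        cur.execute(f"""
            select avg(avg_queued_load)
            from table(information_schema.warehouse_load_history(
                date_range_start => dateadd('hour', -1, current_timestamp()),
                warehouse_name   => '{warehouse}'))
        """)
        queued = cur.fetchone()[0] or 0.0
        if queued > high:
            cur.execute(f"alter warehouse {warehouse} set max_cluster_count = 4")
        elif queued < low:
            cur.execute(f"alter warehouse {warehouse} set max_cluster_count = 1")

    adjust_max_clusters("TRANSFORMING_WH")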

11 days ago

gregw2

Does anyone support this sort of optimization for AWS Redshift?

I built some Lambdas that looked at queue length and turned off Redshift Concurrency Scaling for WLM queues to mitigate costs for less critical afternoon workloads, but it was always cruder than I wanted.
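
Roughly, each Lambda was along these lines (a simplified sketch of the approach; the cluster name, parameter group, and thresholds are placeholders):

    # Simplified sketch: check WLM queue depth, then cap concurrency scaling.
    # Cluster name, parameter group, and thresholds are placeholders.
    import time
    import boto3

    redshift = boto3.client("redshift")
    data = boto3.client("redshift-data")

    def queued_query_count():
        stmt = data.execute_statement(
            ClusterIdentifier="analytics-cluster",
            Database="dev",
            DbUser="admin",
            Sql="select count(*) from stv_wlm_query_state where state = 'QueuedWaiting'",
        )
        while data.describe_statement(Id=stmt["Id"])["Status"] not in (
            "FINISHED", "FAILED", "ABORTED"
        ):
            time.sleep(1)
        result = data.get_statement_result(Id=stmt["Id"])
        return result["Records"][0][0]["longValue"]

    def handler(event, context):
        # If the queues are quiet, stop paying for concurrency scaling clusters.
        max_clusters = "0" if queued_query_count() < 5 else "2"
        redshift.modify_cluster_parameter_group(
            ParameterGroupName="analytics-wlm",
            Parameters=[{
                "ParameterName": "max_concurrency_scaling_clusters",
                "ParameterValue": max_clusters,
                "ApplyType": "dynamic",
            }],
        )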

11 days ago

iknownthing

Does it use AI?

11 days ago

sahil_singla

No AI yet - all the algorithms under the hood are deterministic. Though we are considering tinkering with LLMs for query optimization as part of our roadmap.

11 days ago

mritchie712

We (https://www.definite.app/) were also working on AI for SQL generation. I can see why you pivoted - it doesn't really work! Or at least not well enough to displace existing BI solutions.

edit: context below is mostly irrelevant to snowflake cost optimization, but relevant if you're interested in the AI for SQL idea...

I'm pretty hard-headed though, so we kept going with it, and the solution we've found is to run the entire data stack for our customers. We do ETL, spin up a warehouse (duckdb), a semantic layer (cube.dev) and BI (dashboards / reports).

Since we run the ETL, we know exactly what all the data means (e.g. we know what each column coming from Stripe really means). All this metadata flows into our semantic layer.

LLMs aren't great at writing SQL, but they're really good at writing semantic layer queries. This is for a few reasons:

1. better defined problem space (you're not feeding the LLM irrelevant context from a sea of tables)

2. the query format is JSON, so we can better control the LLM's output

3. the context is richer (e.g. instead of table and column names, we can provide rich, structured metadata)

This also solves the Snowflake cost issue from a different angle... we don't use it. DuckDB has the performance of Snowflake for a fraction of the cost. It may not scale as well, but 99% of companies don't need the sort of scale Snowflake pitches.
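
To make the DuckDB point concrete, a toy example (the file path is made up):

    # Toy example: analytical SQL straight over local Parquet files, no
    # warehouse to size or pay for. The file path is made up.
    import duckdb

    con = duckdb.connect("analytics.duckdb")
    con.sql("""
        create or replace table orders as
        select * from read_parquet('exports/stripe/orders/*.parquet')
    """)
    print(con.sql("""
        select date_trunc('month', created_at) as month,
               count(*) as orders,
               sum(amount) / 100.0 as revenue_usd
        from orders
        group by 1
        order by 1
    """).df())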

11 days ago

brunoa-ca

It may be worth having a look at Raia (https://raia.live).

I'm actively developing this product.

One of the things I added is something called "Assistant Profiles". Since you know the DB structure, you can create a custom Assistant Profile and adjust it to fit the underlying DB better, which improves the results quite a lot.

You can then expand the connection to other external systems and automate a lot of the analysis processes your users may have.

I'm happy to work with you to make it work for your use case.

5 days ago

iknownthing

Kind of surprised to hear that, given the number of companies I've seen pitching natural-language-to-SQL.

11 days ago

ericzakariasson

was also doing that last year with outfinder.co, but like the others said, it's really hard

10 days ago

redwood

Can you clarify what you mean by "they're really good at writing semantic layer queries"?

Re JSON query format: you mean that's what you're using?

11 days ago

mritchie712

Yes, the queries for the semantic layer we're using are in JSON. Here's an example query:

    {
      "measures": ["stories.count"],
      "dimensions": ["stories.category"],
      "filters": [
        {
          "member": "stories.isDraft",
          "operator": "equals",
          "values": ["No"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "stories.time",
          "dateRange": ["2015-01-01", "2015-12-31"],
          "granularity": "month"
        }
      ],
      "limit": 100,
      "offset": 50,
      "order": {
        "stories.time": "asc",
        "stories.count": "desc"
      },
      "timezone": "America/Los_Angeles"
    }

10 days ago

kwillets

I was thinking about an AI to feed you the proper Snowflake sales pitch each time a query runs expensive or fails a benchmark. At my previous org it could replace several headcount.

11 days ago