Devin is now generally available

155 points

16 days ago

by neural_thing

Comments

winkle

First place I usually go is the terms of service and what they are granting themselves rights to. Not excited about how broad this is "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."

16 days ago

CaptainFever

"as may be necessary for Cognition to provide the Services to you" kind of makes sense IMO. Does that mean they'll only use the license (note: they only get a license, not ownership) to provide services to you? Is it a restriction?

16 days ago

hazmazlaz

Yes, that clause/phrase restricts the company's rights with respect to their license to your data. Essentially, a clause like that is necessary for users to interact with the service. Makes sense when you think about it, how can they provide service if they can't use the data you provide them?

It's a pretty typical clause you'll see in most SaaS policies.

Source: I work for a SaaS, but I am not a lawyer, caveat emptor.

15 days ago

winkle

I want to pay for their product, but not enough that I have to ask my lawyer about the language. I did see that one of the features of the enterprise plan is custom terms, but that's not the plan I'm interested in.

15 days ago

portaouflop

How do you use other SaaS products, or is this the first one you've considered using?

12 days ago

bigs

I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?

16 days ago

CaptainFever

I did some Googling on this.

https://web.archive.org/web/20111103081406/http://consumeris...

Original article that caused the outrage. In particular, the TOS did not say they owned your pictures, but it did give them a license that was quite broad, which included using your likeness in advertisements. However, the change that caused the outrage was that the license no longer expired on account deletion nor content removal.

https://www.npr.org/2009/02/17/100783689/facebook-users-angr...

News article about the outrage.

https://www.nytimes.com/2009/02/19/technology/internet/19fac...

News article about the walkback.

I could not find anything about it being challenged in court.

16 days ago

Topfi

No public testing, no benchmarks, no clear information on context window size or restrictions for extensive use, no comparison with the newest Claude Sonnet 3.5 or O1, nothing.

What we do get is a price of $500 per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.

Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction of the price.

If this were e.g. Anthropic launching a new beyond Opus size model that was still performant and came with "chain-of-thought" capabilities, a far more extensive context window that still fully passes needle in haystack and is absolutely solid in sourcing from provided files, keeps on track even when provided with large documents, has few or no restrictions on usage and comes with extensive, verifiable benchmarks that showcase this offering being a significant upgrade over other models, maybe such a price could be justified.

You know why, Cognition? Because they haven’t actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant way back when had certain use cases that made them have their own niche and showed they could execute before expanding with 2 and the larger context, then 3 with more applications. You never did any of that, you never gave anyone reason to believe what you claim, you didn’t even release benchmarks. See the difference?

Seems more like a simple cash grab, attempting to ride the O1 wave. OpenAI has a hard time justifying their Pro pricing; you doubling that makes this an out-of-season April Fools' joke. Waiting for the inevitable reporting that this is just another API wrapper for Claude or ChatGPT with our old faithful RAG.

[0] https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...

15 days ago

preommr

From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".

But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that knowledge if you're just delegating it to a black box that can't operate at a near-human level?

I've used AI-based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. To do the integration part requires an understanding of what abstraction is appropriate where. I don't think these tools are good at that.

16 days ago

cowsup

Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.

16 days ago

a-arbabian

Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.

16 days ago

jlund-molfese

I wish API integrations never took that long! But it's dependent on who you're integrating with and what your product looks like. I'm the engineering manager of the payroll integrations team at a company that does workplace savings plans.

Sometimes even when you're making calls to dozens of different endpoints they're easy, but other times, you end up guessing at how to access undocumented functionality within a GraphQL API that has introspection turned off, or working around entity modeling that's completely different from your system and requires a lot of translation. Or you work with an API whose indexes variably start from 1, 0, -1, and -2 in different endpoints. These generally aren't hard technical challenges to solve, and something like Devin that could take care of most surface-level problems you see while integrating with some XML API from 2007 would be welcome.

There are companies like https://www.tryfinch.com and https://www.merge.dev that try to solve these issues, but their abstractions also reduce flexibility and aren't a perfect fit for all HRIS integration use cases right now.
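To make the "indexes start from 1, 0, -1, and -2" problem concrete, here is a minimal sketch of the kind of translation layer it tends to require. The endpoint names and the `INDEX_BASE` table are hypothetical, invented purely for illustration; a real integration would have to discover each base from documentation or by trial and error.

```python
# Hypothetical mapping from endpoint name to the index of its first item.
# These endpoint names are illustrative only, not taken from any real API.
INDEX_BASE = {
    "employees": 1,
    "payrolls": 0,
    "deductions": -1,
    "benefits": -2,
}


def to_zero_based(endpoint: str, remote_index: int) -> int:
    """Convert an index as reported by a remote endpoint to a 0-based index."""
    return remote_index - INDEX_BASE[endpoint]


def to_remote(endpoint: str, local_index: int) -> int:
    """Convert a local 0-based index to the convention the endpoint expects."""
    return local_index + INDEX_BASE[endpoint]


if __name__ == "__main__":
    # The first item of every endpoint maps to local index 0.
    for name, base in INDEX_BASE.items():
        print(name, base, "->", to_zero_based(name, base))
```

Keeping the offsets in one table means the rest of the integration can work purely in 0-based indexes and only convert at the boundary.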

16 days ago

rguldener

I agree. Integrations can be incredibly cumbersome if you have to learn each API from scratch.

There are also more flexible solutions like https://www.nango.dev

It handles the API-specific complexities (auth, retries, webhooks, per-customer config, pre-built templates) but allows you to implement the exact use case + data model you need.

It's open source/source available.

(disclaimer: I am a founder)

16 days ago

babyent

Hey that’s really cool. I read the FAQ on connection.

So if there are 10 users, the free tier lets me give them the ability to add up to 3 integrations? Is it 3 per user?

Thanks

16 days ago

neom

Been using https://www.laminar.run/ here and there and found it a good mix of abstract and being able to get in there.

16 days ago

mike_yu

clearly nobody else has spent all the time i have integrating really old mortgage software :(

16 days ago

jonny_eh

I doubt Devin could write an integration for an underspecified legacy API like that. Whenever I have had to, I've needed to talk to support/engineering on the other side.

16 days ago

mike_yu

that's definitely still the case. devin drafts my emails for issues it runs into (which i tell it to do) and i send them off.

this is definitely slower than if i were doing it full time, but i run a company. i go from customer meeting to customer meeting and spend 5-10 min a day taking whatever is blocking devin and pasting it into an email to the partner to get a response for devin.

16 days ago

mike_yu

although i do agree with you w.r.t integrating like, modern software with well documented/good apis

16 days ago

Taylor_OD

Debugging is a pretty vague word. I know a LOT of api endpoints with shit documentation. Could Devin generate documentation for a vast number of api endpoints that could have theoretically taken a hundred hours to write?

16 days ago

paradite

The trend with AI tools is to make a bold claim at launch, then have lots of caveats, caveats, caveats when actually releasing to the public.

16 days ago

Yusefmosiah

Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.

In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.

I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.

My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?

16 days ago

cbhl

This predates the o1 release, but the folks behind Devin did do some early evaluation of o1 vs 4o vs Devin back in September:

https://x.com/cognition_labs/status/1834292718174077014

I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.

16 days ago

Yusefmosiah

Thanks, but that comparison is for old models, a different, non-shipped version of Devin called “Devin-base”, and doesn’t include Claude.

Slack integration, automatically pushing to CI, etc., are relatively low-value compared to the questions of “does it write better code than alternatives?”, “can I depend on it to solve hard problems?”, “will I still need a Cursor and/or ChatGPT Pro subscription to debug Devin’s mistakes?”

16 days ago

gexla

Should have come with a prominent warning at the app site that you're heading towards a $500 sub. I'm sure it's mentioned in places I didn't see it. Ideally, you would agree to the sub before you even create an account. This could save LOADS of signups from people who aren't your intended users.

16 days ago

anticensor

They have a $50 tier too, but that one is not currently open to new members.

13 days ago

mfdupuis

I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that actually always seems complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.

That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.

16 days ago

waldenyan20

hey guys - Walden here, one of the founders. Excited to have you try out Devin. Reach out here if you have any questions!

16 days ago

Buttons840

Hi Walden,

my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?

There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin or something.

I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?

16 days ago

mrieck

I'd try it out if you allowed paying $50 for some credits instead of requiring subscription.

Even if that version is limited to only editing public Github repos. $500 to see how well it works is too much.

16 days ago

anticensor

The $50/month subscription price of the personal tier (currently not accepting new users) includes 50 credits per month, and the $500/month teams tier includes 250 credits per month. This is what I see with my current user and when I try to sign up as a new user, respectively.
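As a quick sketch of what those figures imply per included credit (the `TIERS` table and function name are hypothetical, and the numbers are only the ones quoted above, not official pricing):

```python
# Tier figures as quoted above: (monthly price in USD, included credits).
# Illustrative only; actual pricing may differ or change.
TIERS = {
    "personal": (50, 50),
    "teams": (500, 250),
}


def price_per_included_credit(tier: str) -> float:
    """Monthly price divided by the number of credits bundled with the tier."""
    price, credits = TIERS[tier]
    return price / credits


if __name__ == "__main__":
    for tier in TIERS:
        print(f"{tier}: ${price_per_included_credit(tier):.2f} per included credit")
```

This ignores anything else the tiers bundle, so it is only a lower-bound way of comparing them.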

I'd like to see that $50/month tier reopened to subscribers, and a $0/month+credits tier added (1 concurrent active session only, constrained to small VM spec with immutable rootfs (regular devin VMs have writable rootfs), no automatic knowledge generation, no snapshots, though playbooks allowed).

> Even if that version is limited to only editing public Github repos

Not possible to constrain like that with the current Devin architecture.

16 days ago

badFEengineer

The price seems reasonable, but my main hesitation is on data storage + third party providers- there doesn't seem to be much available information on:

* will you store my code + train on workflows that Devin does for me?

* are you piping data to other third party providers (i.e. anthropic, openAI)?

16 days ago

JTyQZSnP3cQGa8B

Why don't any LLM tools show examples of C++ applications? I have yet to see a tool like that which I would be happy to use at work.

16 days ago

elashri

Or CUDA code, which would be somewhat ironic given that LLM inference engines and training are CUDA code in some way.

16 days ago

anticensor

It can do that too, I tried that too.

16 days ago

anticensor

I tried it with C and C++ code, it can do them but not very well.

16 days ago

menaerus

How large are the repositories that it can "reason" about?

16 days ago

anticensor

It's smart enough to figure out the relevant part to change once it scans the codebase. It could do the Linux kernel if not for Linus's policy.

16 days ago

menaerus

Thanks. For example, if I feed it with the 10 MLoC repository, how long does it take before it can start working through the problem?

Would it really work well with the mixture of at least C and assembly which you implied it would with Linux kernel example?

16 days ago

anticensor

> For example, if I feed it with the 10 MLoC repository, how long does it take before it can start working through the problem?

The initial scan may take about an hour with a repository that size, and the knowledge base buildup will take about a week (but that one happens during the coding process). It will not continuously scan the entire codebase once it builds up the knowledge of the repository.

16 days ago

menaerus

Fascinating and scary at the same time.

Is this the beginning of intelligence, domain expertise, and the ability to research becoming a commodity?

16 days ago

cloudking

When crafting projects from scratch, does your system actually fix its own errors?

That seems to be the challenge with Cursor Agent in its current form: it generates a bunch of code that has bugs and requires a lot of iteration.

16 days ago

swyx

as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met

16 days ago

waldenyan20

latest update is around 3-4x faster than it was back in Apr but we are working on making it much faster still!

16 days ago

kordlessagain

How is that done?

16 days ago

neural_thing

Devin got a lot faster for me recently, made it a lot more enjoyable to use

16 days ago

anticensor

You should really add an option to spawn a VM with an immutable rootfs. Current VMs all have a writable rootfs, which costs a lot to run; immutable VMs could be much cheaper to operate (possibly even enabling free tiers).

Also to mention, "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).

Another issue, sleep&snapshot system is still prone to race conditions in certain cases.

16 days ago

k2xl

What model does it use under the hood?

How much context window does it load when it is solving tasks?

How does it determine which files to load into context?

16 days ago

anticensor

It's a fine-tuned version of an o1-preview-sized distillation of o1-pro, if I remember correctly, with an Azure Ubuntu VM with a writable filesystem and internet access.

16 days ago

thekevan

Can you only use it with a $500 / month subscription?

The word "try" is VERY different than the actual case, which is "pay for use".

If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.

16 days ago

anticensor

It's monthly subscription plus prepaid compute credits (called ACU in the UI).

16 days ago

adamgordonbell

I'm excited to try it. I use aider quite a bit and tried opendevin at some point.

What is the pricing story?

Can I use it as side project dev or is the target enterprise customers only / mainly?

16 days ago

waldenyan20

we have plenty of small, early stage teams that use Devin but it's designed to fit into a team's workflow. you can of course give it a try and see if it's a good fit for your projects!

16 days ago

anticensor

Any estimates regarding when the personal tier ($50/month+credits) will resume accepting signups?

16 days ago

tsak

Is it just me that finds it ironic that you're looking for software developers?

16 days ago

anticensor

Hey, can you fix the issue where the editor times out and Devin gets stuck?

16 days ago

thekhatribharat

How does one estimate the number of ACUs required to finish a task?

16 days ago

anticensor

It spends about 2 to 10 ACU per hour in the small VM, and ten times as much on the large one. No credits spent during sleep and "waiting for response" time as far as I observed.
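A rough sketch of turning those observed rates into an estimate. `estimate_acus` is a hypothetical helper, the rates are just the figures observed above rather than official pricing, and it assumes sleep and "waiting for response" time is excluded from `active_hours`:

```python
def estimate_acus(active_hours: float, large_vm: bool = False) -> tuple[float, float]:
    """Return a (low, high) ACU estimate for a given number of active hours.

    Rates are the anecdotal figures from the comment above: roughly
    2 to 10 ACU per hour on the small VM, and ten times that on the large one.
    """
    low_rate, high_rate = 2.0, 10.0  # ACU per active hour, small VM (observed, not official)
    if large_vm:
        low_rate, high_rate = low_rate * 10, high_rate * 10
    return active_hours * low_rate, active_hours * high_rate


if __name__ == "__main__":
    low, high = estimate_acus(3)
    print(f"3 active hours on the small VM: {low:.0f} to {high:.0f} ACUs")
```

The range is wide, so this is more useful for budgeting an upper bound than for predicting the cost of a specific task.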

16 days ago

waldenyan20

a helpful benchmark is that a typical frontend task is about 1-2 ACUs, but really depends on the complexity of the task

16 days ago

yuppiemephisto

Does it work with more obscure languages like Lean 4?

16 days ago

anticensor

It can work with any language, as it interacts with the VM and can read compiler messages.

16 days ago

marcusverus

How should I go about arranging a demo?

15 days ago

throw83288

Not really product related: given the current trajectory of LLMs/agents, what is your career advice to someone in school for Computer Science right now?

16 days ago

papichulo2023

Are you a human founder?

16 days ago

waldenyan20

very much so!

16 days ago

xena

Can you upload a picture on the company domain with your face, and holding a piece of paper containing your name, the date, the time, and the current bitcoin block number? That would make us more likely to believe you are properly human.

16 days ago

adamgordonbell

It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.

( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )

16 days ago

k2xl

It says at the top $500 per month

16 days ago

steve_adams_86

Starting at. If you want, they will take more of your money too.

16 days ago

binarynate

Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.

https://www.washingtonpost.com/technology/interactive/2021/p...

https://archive.is/w8r58

16 days ago

slickdork

As someone named Devin who works in tech, I greatly hope this project fails. :)

16 days ago

Buttons840

No, I'm Devin.

At least our names got attached to an upstanding product, and one that is likely to languish and fail. We're not the next "Alexa", I hope.

16 days ago

arockwell

100% agree. It is shitty and rude. Not to mention it does not even make sense.

16 days ago

stuckkeys

Not sure about the “rude” part. It really depends on the person. But yes, it can get annoying really fast. Therefore “shitty” indeed. But yeah, I do think it is very cheesy and lazy when companies do this. When I talked to someone that worked there, I guess it was because of the hard consonant “X”. It would make a better Hollywood movie if they said Artificial. Language. Expanded. Xenomorphic. Amplified. A. L. E. X. A.

16 days ago

wpm

Devin comes from "dev in chat", a common phrase in livestream chat rooms to signal that the developer of the game or product being showcased was present.

16 days ago

solarpunk

why not just call it dev

16 days ago

hiatus

Dev is also a human name.

16 days ago

alexjplant

The short version of my name is one letter away from "Alexa". You can imagine how many comments and jokes about Amazon's AI assistant I've been party to for the past decade. Although it may be hard for you to believe I actually don't really care, much as you probably don't care about the hot dogs bearing your name that you see when you walk down the cold aisle in the grocery store. Should they instead call the anthropomorphized AI assistant something like "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

16 days ago

psygn89

My Japanese mom always thought it was weird to put people's names to destructive forces like hurricanes. I think she said in Japan they use some numbering system (might be as simple as incrementing, I don't remember).

16 days ago

daveguy

The US did this for a long time -- only numbering storms. In 1953 they switched to a list of names, female only. Then 25 years later to male and female names. It is kinda weird, and if they're destructive enough the name is retired. I think the idea is that people would pay more attention to human names in the warning process as the hurricanes approach land.

16 days ago

BrawnyBadger53

I could definitely see the support response to a major storm being better when it's easier to communicate and identify a specific storm.

16 days ago

laptopdev

When I was 7, my family's Japanese foreign exchange student was being introduced to me. She burst out laughing, saying my nickname Dev Dev sounded like "fart fart" or "fat fart".

Had the nickname fart fart until my sister moved out of the house.

Maybe you could confirm, but ChatGPT tells me in Japanese Debu colloquially and offensively means "fat" or "chubby", and Bu is an onomatopoeia for a fart noise, like "prrt" in English.

16 days ago

ben_w

> "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

Bold move, but imagine the patch notes:

• Fixed bug where assistant attempted to unmake the fabric of reality

• Resolved issue where “Set alarm for 7 AM” triggered a rampancy cascade

• Improved pronunciation of “Lh’owon” for calendar appointments

Probably still a better bet than Durandal, definitely an improvement over Tycho.

And then there was Leela…

16 days ago

binarynate

It appears your name is Alex, so I'm not surprised that the Alexa product name doesn't bother you. I suspect you would feel different if your name was Alexa. If the product was named Nate, it would bother me. There are a plethora of other options for product names that companies can use besides common first names.

16 days ago

nprateem

But I bet you're never late for your train

16 days ago

zamadatix

I think it's different when the product is a tool you call by name to use vs just the name of the tool. E.g. the article is about "Alexa" and I'm not sure most people even realize there are ways to use it without saying "Hey Alexa" every time. Without that type of callback association it's not a very serious concern.

16 days ago

mewpmewp2

I don't care about it potentially being a real name, because I doubt it would be a household item, but somehow the name itself for this particular product seems offputting.

If it had to be a name for a product, it seems to give me some sort of cheap male grooming or AXE body spray product vibes.

16 days ago

decGetAc

Probably. I share a name with a product and I couldn't care less. It's wild that some would feel bad, much less consider it lacking empathy.

I don't like first name product names for other reasons but not because they share a name with humans named the same

16 days ago

bravetraveler

They gotta be Joshing us! How's Dic-I mean, Richard?

Just having fun. I see what you mean and vaguely support it... I just won't lose anything over it

16 days ago

Cataleya

[flagged]

16 days ago

hiatus

How is it lacking empathy? Devin is not something invoked by voice, so I fail to see the comparison to Alexa.

edit:

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

16 days ago

debacle

I couldn't find anywhere a list of languages that this tool supports. What makes this tool better than e.g. cursor?

16 days ago

didip

Aren't you guys afraid that Copilot will simply crush you? They have all the training data after all.

16 days ago

apwell23

There is third-party CI software like circleCI that didn't get crushed by github because it's a high-touch business that they don't want to get into.

There are many niches to be captured.

16 days ago

paradite

I thought circleCI wasn't doing too well?

16 days ago

anticensor

No, Devin is an autopilot.

16 days ago

anticensor

Can you also add Discord, Telegram, Gitlab, Forgejo integrations for those who use them for their software development discussions?

14 days ago

Oras

> Small frontend bugs and edge cases - tag Devin in Slack threads

And other points where it should shine. How does it compare to using Cursor? Is it the slack integration?

16 days ago

waldenyan20

the workflow is quite different from Cursor or Copilot - Devin is an asynchronous tool. A common way to use Devin is to kick off a few sessions in the morning, while you work on other higher priority tasks. It feels a lot more like working with a colleague that you can tag in Slack or go back and forth with on PR comments

16 days ago

anticensor

Devin has 2-hour, 6-hour and 24-hour inactivity limits before it pauses the work and temporarily deprovisions (sleeps) the VM, so you have to supervise it every so often.

16 days ago

jonny_eh

Does it open PRs?

16 days ago

anticensor

Yes, it does if you request it to do so. You can also tell it not to.

16 days ago

allusernamesare

How does Devin compare to lovable.dev? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.

16 days ago

daft_pink

Is there any evidence this works better than Claude 3.5?

16 days ago

projectileboy

I work with a team at Nubank that has been using Devin. I would say that it doesn't quite make sense to compare it to Claude 3.5, because Devin isn't really like Copilot; it's more like an assistant to which you can assign a project. We're using it only for particular use cases, but for those particular use cases it's like having a superpower.

16 days ago

amkkma

Based on this, what is the outlook for software dev generally, and junior and mid level devs?

16 days ago

throw83288

More specifically: What kind of advice does GP have for Computer Science students in school right now?

I've been frankly terrified of the pace of LLM development since 2022.

16 days ago

servercobra

Do you have any examples of the kinds of projects you would assign it to?

16 days ago

Yusefmosiah

The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.

16 days ago

jonny_eh

> We're using it only for particular use cases

Can you share concrete examples?

16 days ago

projectileboy

I can’t really be too specific. But I can say that at least one pattern of problem it tackles very effectively is: “we’re migrating from X to Y, and it’s going to touch a ton of files, and the nature of that migration is much more involved than what we can reasonably hope to accomplish with sed and a bash script.”

16 days ago

anticensor

I tasked Devin with writing a project proposal (in a topic I am not going to disclose here) with multiple documents including feasibility analysis, grant applications, legal analysis and post-implementation training materials and it was almost perfect at it.

16 days ago

jonny_eh

Amazing claims, if only it could be publicly shared and scrutinized.

16 days ago

mike_yu

i use this every day and a lot of the magic is in the workflow and agent layer -- claude 3.5 can generate a snippet of code for you but it isn't going to open a browser, read api docs, actually make calls to the api, debug, run the code and make sure it builds and works, etc

16 days ago

bfeynman

Anthropic and OpenAI have certainly been working on this behind the scenes, while they try to see how much better they can get models, they will let others pay for the current state until they find it valuable. The shift we are seeing now is already happening, and they are taking an even larger macroscopic approach by creating computer/tool use, along with the context protocol, so that when it's released it will work with almost any IDE and system...

16 days ago

xpasky

Why wouldn't it? Just give it a shell tool. (Something like claude.vim, perhaps.)

16 days ago

kordlessagain

Like this: https://github.com/Mittaai/webwright

16 days ago

kgilpin

People are saying it’s apples and oranges, but with Computer Use taken into account, this seems like a fair question.

https://docs.anthropic.com/en/docs/build-with-claude/compute...

16 days ago

daft_pink

I wish they offered a computer use reference implementation on Windows instead of a Linux Docker container.

16 days ago

WesleyJohnson

Any plans or capabilities for something local? Not a locally hosted Devin, mind you, but a way to interact with on-prem source control repos?

16 days ago

nextworddev

Devin really wasted a lot of time going GA because they lost a lot of their initial buzz

16 days ago

DidYaWipe

Might be an interesting headline if it said what "Devin" is.

15 days ago

adastra22

You never say what Devin is.

16 days ago

ar_ca

[dead]

16 days ago

Topfi

I was about to state that there is nothing here that 4o or Sonnet couldn’t do with very limited prompting, then I noticed that the hamburger menu on mobile doesn’t even work and had to retract that statement. Neither would have made such a mistake.

Thanks, this only cements where Devin lies in comparison and explains the lack of benchmarks and independent testing…

15 days ago

yuppiemephisto

Ok, describe your experience.

16 days ago

ar_ca

I'm impressed with Devin's capabilities. It's good at building standard web applications and implementing common patterns. It is particularly effective for enterprises needing basic web pages or solutions that follow established development patterns.

While Devin handles routine development tasks well, it still requires oversight and guidance when dealing with complex integrations or custom business logic. It was helpful in reducing the time spent on boilerplate code and basic setup tasks.

16 days ago

brilee

Why does this response sound llm generated to me? Maybe it's the phrase "however it's worth noting"

16 days ago

samatman

And starting with "based on".

Definitely LLM slop. Shameless.

16 days ago

bn-l

Account created 17 hours ago.

So dang clumsy. I mean come on.

15 days ago

behnamoh

[flagged]

16 days ago

projectileboy

I understand why you would have that impression, but for what it's worth, my team has been helping other teams internally to use Devin, and we also did due diligence and experimented with OpenHands, as well as tried rolling our own solution. While OpenHands is cool for what it is, neither it nor our homegrown solution came within a mile of doing what Devin could do.

16 days ago

kittikitti

Are they actually just using 3rd party calls or are they hosting the GPT themselves?

16 days ago

KMnO4

Well yeah, and GPT is just a bunch of matrix multiplications configured in a specific way.

The bells and whistles are what turn it into something useful.

16 days ago

spookie

How useful and at what computing cost? Genuine questions. Because from what I gather there are quite a number of "loops" to check for correctness of output, etc... that make it expensive fast. Not talking about money, but compute.

16 days ago