I tested Devin.

Which AI service is truly worth it?

Before diving into my experience, let me break down the onboarding process. Devin requires a 500+ USD monthly subscription and needs some setup work. You start by connecting your GitHub (they support multiple users in an org). Then, for each repo you work with, Devin needs to be set up, similar to onboarding a new dev team member. The process includes auto-cloning, installing dependencies, and setting up continuous package updates. There’s also a knowledge setup phase where you basically teach Devin about your codebase. Oh, and there’s an optional Slack integration that works as a reminder system for tasks and updates.

For the best results with Devin, you need to tell it how to check its progress, break down big tasks, and share detailed requirements upfront. You can even run multiple sessions in parallel. After setup, you just select your repo context and start giving it tasks.

You have to go to the Devin web app UI, which is basically a chat UI, but a bit more intricate. I will not lie, this was cool. On the left you have the typical chatbot UI: the chat history (collapsed) and the chat itself. On the right half you have Devin’s workspace, where you can “follow Devin” and see the shell, browser, editor, and planner, each in a different tab. It’s pretty cool, and they beat almost everyone in this UI category.

UI aside, I have only tried giving Devin menial frontend tasks, around 4 or so. It was just to edit some Next.js and shadcn styling. It took approximately 3-4 hours to do 5/6 fairly small components I told it to edit, and it gave errors in particular when editing the header section. For some reason, it spent about an hour (55 minutes, to be exact) messing around with Three.js and creating some component. I eventually told it to stop using Three.js, or it would’ve been an infinite loop. After the entire process was done, it didn’t even open a PR, even though I explicitly told it to automatically create a PR once its task was deemed complete.

The total cost of this experiment, disregarding Devin sleep costs (as sleep is 0 ACU), was approximately 40 ACU over a span of approximately 3 hours. An ACU is an agent compute unit, and it’s quite costly: you get 750 ACU for 500 USD a month, so each ACU is worth approx 2/3 of a USD. In the end, Devin did yield a static page in Next.js and shadcn, but there were a ton of errors, and it simply didn’t look right. I had to manually go over its PR and fix UI components.

Like, don’t get me wrong, the UI for the Devin stuff is nice and all, but I expect much more functionality and usability for 500 USD a month, especially when it is supposedly the entry tier. Maybe I was in the wrong, because Devin clearly states it does better with shorter commands, but who knows. Shouldn’t a dedicated LLM be able to queue up multiple tasks given by a user and just run a loop on a queue to ensure all desired functionality works?
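To make the "loop on a queue" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `run_queue`, `execute`, `verify`, and `max_attempts` are hypothetical names, and this is not how Devin (or any other tool) actually works internally. The one design point it shows is a bounded retry count, so a stuck task gets reported instead of looping for an hour.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    description: str
    attempts: int = 0


def run_queue(
    tasks: list[str],
    execute: Callable[[str], str],       # hypothetical: hands one task to the agent/model
    verify: Callable[[str, str], bool],  # hypothetical: build, lint, or test the result
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Work through tasks in order, retrying each failed task a bounded number of times."""
    queue = deque(Task(t) for t in tasks)
    done: list[str] = []
    failed: list[str] = []
    while queue:
        task = queue.popleft()
        task.attempts += 1
        result = execute(task.description)
        if verify(task.description, result):
            done.append(task.description)
        elif task.attempts < max_attempts:
            queue.append(task)               # send it to the back of the queue for another try
        else:
            failed.append(task.description)  # give up and report instead of looping forever
    return done, failed
```

With stub functions standing in for the agent, a task that never passes verification ends up in the `failed` list after `max_attempts` tries, while the rest of the queue still completes.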

Honestly, there is something similar to what I described. It is called Cline, a VS Code extension. It basically loops over a queue of the user’s instructions and the functionality they want, and it works best with an OpenRouter API key, so you can use any model on a whim and pay as you go. I personally recommend Claude 3.5 Sonnet, as it’s leaps and bounds above all the other LLMs when it comes to coding and sheer understanding of code, readability, scalability, and adaptability. Cline uses Claude well: its prompts let Claude use its computer use capability (which is in beta) to help debug, test, and find new solutions, whereas other current solutions keep going in circles and get lost. On average, you can probably use Cline for approximately 4-7 days on 10 dollars in credits. I think that’s way more cost-effective, and it actually works incredibly fast and well. Obviously, this solution also has its downside: it’s still more expensive than the Claude / GPT subscriptions, which run 20 dollars a month. But it is the most time-efficient and convenient option, especially if used with Wispr Flow. You can just tell Cline what to do via voice, and Cline will take care of the rest.
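To put the numbers quoted so far side by side, here is a quick back-of-the-envelope calculation. The inputs are the figures from this post; the 30-day month and the assumption that Cline spend stays steady all month are mine, so treat the monthly Cline range as a rough estimate, not a price list.

```python
# Figures quoted in this post.
DEVIN_MONTHLY_USD = 500
DEVIN_MONTHLY_ACU = 750
SESSION_ACU = 40              # what the frontend experiment consumed

usd_per_acu = DEVIN_MONTHLY_USD / DEVIN_MONTHLY_ACU
session_cost = SESSION_ACU * usd_per_acu
session_share = SESSION_ACU / DEVIN_MONTHLY_ACU


def monthly_cost(spend_usd: float, days: float, days_in_month: int = 30) -> float:
    """Scale 'spend_usd every `days` days' up to a month (30 days is an assumption)."""
    return spend_usd / days * days_in_month


cline_low = monthly_cost(10, 7)   # 10 USD lasts a full week
cline_high = monthly_cost(10, 4)  # 10 USD lasts only four days

print(f"Devin: {usd_per_acu:.2f} USD per ACU")
print(f"Devin session: {session_cost:.2f} USD ({session_share:.1%} of the monthly ACU)")
print(f"Cline: roughly {cline_low:.2f} to {cline_high:.2f} USD per month")
```

So the single 3-hour Devin session works out to about 26.67 USD, while a whole month of Cline at the quoted pace lands somewhere around 43 to 75 USD: above a 20-dollar chat subscription, far below 500.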

Last but not least, we have Claude Pro itself and the beta of RepoPrompt. Basically, RepoPrompt lets you open a repo but feed the LLM only specific context, so you don’t have to hand it an enormous context to work with. The good thing? It copies the entire file tree as an XML tree, and you get to add your desired command / functionality in high-level language. Then you copy the entire thing, paste it into Claude, and it’ll print out XML-formatted instructions for you to apply to your repo. Obviously, this is prob the most tedious workflow. HOWEVER, RepoPrompt also has an apply tab, so you can just apply the diff from the XML instructions given by Claude (or your LLM of choice). And did I say that RepoPrompt is FREE? (I know, I know, Claude Pro ain’t free, but I definitely think it’s one of those things people should splurge on.)
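For a feel of what "file tree plus only the selected files plus your instruction" looks like, here is a rough Python sketch of assembling such a prompt. To be clear, this is not RepoPrompt's actual format or code: the function name `build_prompt` and the tag names `file_tree`, `file`, and `instruction` are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prompt(files: dict[str, str], selected: list[str], instruction: str) -> str:
    """Build an XML-style prompt: the full file tree, selected file contents, one instruction.

    `files` maps repo-relative paths to file contents. Only paths listed in
    `selected` have their contents included, which keeps the context small.
    """
    parts = ["<file_tree>"]
    parts.extend(escape(path) for path in sorted(files))  # every path, but no contents
    parts.append("</file_tree>")
    for path in selected:
        parts.append(f"<file path={quoteattr(path)}>")
        parts.append(escape(files[path]))                 # escape so code can't break the XML
        parts.append("</file>")
    parts.append("<instruction>")
    parts.append(escape(instruction))
    parts.append("</instruction>")
    return "\n".join(parts)


# Hypothetical three-file repo; only the header component is sent in full.
repo = {
    "app/page.tsx": "export default function Page() {}",
    "components/header.tsx": "<header />",
    "package.json": "{}",
}
prompt = build_prompt(repo, ["components/header.tsx"], "Make the header sticky")
print(prompt)
```

The model still sees that `app/page.tsx` exists (it is in the tree), but only the header file's contents take up context.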

(Basically ferrying between GPTs atp lmao. Claude, Cline, OpenRouter, v0, Devin)

TL;DR

Between spending 500 USD a month on Devin vs. spending 500 USD a month on the Twitter Pro API, I think I’ll just burn my money as a source of heat. Like seriously, your competitors are much more resourceful for a fraction of the cost. Well actually, just Devin’s competitors, since the unofficial Twitter API providers are much more expensive due to them being nearly 10 times faster.

** An unofficial Twitter API is usually just the result of reverse engineering and decompiling the mobile Twitter app and using its endpoints in an unconventional way, yielding approx 400 ms response times. Combining this with close proximity to Twitter data centers in Dallas can theoretically drop response times to sub-300 ms.

But yeah, this is a glimpse into what I’ve learned and tested so far during the dev locked-in sesh in SF. Obviously, LLMs aren’t perfect, so there is still human intervention on my part to fix their silly errors. However, most of the time, Cline, Claude, and RepoPrompt make me more time- and task-efficient, and aren’t too prone to error. So, I use them sparingly. That’s all for now! Stay tuned, as I prob need a break soon as well. LOL. ttyl