Two Weeks With GPT-5: The Improvements Are Real, But One Issue Bugs Me

I’ve been running GPT-5 through its paces for about two weeks now. I’ve written probably over a hundred prompts across different use cases—work tasks, personal projects, creative experiments, technical debugging. It’s time to share an honest assessment.

First, the good news: the improvements over GPT-4 are noticeable and meaningful. This isn’t an incremental update. OpenAI made real progress in several areas that matter for daily use.

But there’s one issue that keeps annoying me, and I think it’s worth discussing honestly.

## What’s Genuinely Better

**Contextual understanding has improved**

I tested this with a specific scenario: I asked GPT-5 to analyze a customer complaint, then provide handling recommendations while considering the company’s brand voice and long-term customer relationship implications.

Previous versions would typically give me something either too corporate (“We sincerely apologize for any inconvenience…”) or too casual (“Just give them a refund”). The response felt generic—like it was following a template rather than reasoning about the situation.

GPT-5’s response surprised me. It actually balanced the competing priorities. The language felt more natural for a real customer service context, acknowledging the issue while proposing concrete solutions that aligned with what a premium brand might want.

I’m not saying this is revolutionary, but it showed me the model could handle multi-layered constraints in a way that felt thoughtful rather than mechanical.

**Long documents work better**

I’ve been working with longer documents lately—technical specifications, lengthy reports, comprehensive documentation. The context handling in GPT-5 feels more stable.

With GPT-4, long documents sometimes caused inconsistencies. References to earlier parts of the document would get lost, or the model would contradict itself halfway through a response.

GPT-5 handles this better. I uploaded a 50-page industry report and asked for an analysis of its key arguments. The response stayed coherent throughout, and the conclusions aligned with the evidence presented in the earlier sections. I haven’t pushed it to really extreme lengths (I don’t have many 200K-token documents lying around), but for documents up to 100 pages, it performs solidly.

**Code quality has taken another step forward**

I write code for work—nothing too complex, mostly automation scripts, data processing pipelines, and the occasional web application. GPT-4 was already helpful, but debugging was often a multi-turn process. I’d describe an error, get a suggestion, try it, describe the next error, and so on.

With GPT-5, my first-time success rate has improved noticeably. Not every piece of code works perfectly on the first attempt, but the percentage of code that runs correctly without modification has gone up significantly.

More than that, GPT-5 sometimes anticipates edge cases I didn’t think to mention. It might include error handling or warnings that show it’s thinking ahead rather than just reacting to what I explicitly asked for.

**The responses feel more natural**

This one’s hard to quantify, but the conversational flow feels smoother. Responses don’t feel as obviously “generated.” They read more like something a knowledgeable person might actually say, rather than something optimized to look like AI output.

This matters more than I expected. When I’m writing something that will be seen by others, I want it to sound human. GPT-5 moves closer to that goal.

## The Problem That Bugs Me

Now, the issue I mentioned.

**The restrictions feel overzealous at times.**

I understand why OpenAI implements content policies. AI safety is real, and some restrictions exist for good reasons. But GPT-5’s guardrails sometimes get in the way of completely legitimate use cases.

Example: I wanted to analyze the rhetoric in a political article and have GPT-5 evaluate whether the arguments were presented objectively. Not to agree or disagree with the conclusions—just to assess the presentation style and whether the author was making claims or hedging appropriately.

GPT-5 declined. “I can’t evaluate the objectivity of content” or some variation of that. It wouldn’t even engage with the question.

I understand the concern about AI making judgments about truth or objectivity. But “analyze the rhetorical style” is a completely normal analytical task that humans do constantly. A journalism student might be asked to do exactly this as homework.

Another example: I was doing competitive analysis for a business project and wanted GPT-5 to help me think through what strategies a competitor might realistically pursue based on their recent moves. Standard competitive analysis work. But GPT-5 flagged it as potentially supporting “anti-competitive behavior.”

I get it. The model is trying to err on the side of caution. But this kind of over-correction makes the tool less useful for perfectly normal professional tasks.

Is there a middle ground? I hope so. Perhaps OpenAI could offer different tiers of access where professional users can opt into more flexibility in exchange for accepting more responsibility.

## Comparing to the Competition

Since I use multiple AI tools, here’s how GPT-5 stacks up in my experience:

| Aspect | GPT-5 | Claude 4 | Gemini |
| --- | --- | --- | --- |
| Creative writing | Excellent | Very Good | Good |
| Code quality | Excellent | Excellent | Good |
| Analytical reasoning | Strong | Very Strong | Strong |
| Real-time information | Good | Limited | Good |
| Conversational flow | Excellent | Very Good | Good |
| Restrictions | Significant | Moderate | Moderate |
| Integration ecosystem | Strong | Good | Excellent |

My take: GPT-5 is excellent for most tasks, and the restrictions are a real but manageable drawback. If the restrictions don’t affect your use cases, you’ll likely find GPT-5 worth the upgrade.

## Who Should Upgrade?

**Worth it if:**
– You use AI extensively for professional work
– You need the best available language understanding
– The creative writing quality matters for your work
– You’re currently on GPT-4 and finding its limitations frustrating

**Stick with GPT-4 (or use the free tier) if:**
– Your current AI tool meets your needs
– You’re a casual user
– You find the existing models “good enough”
– The subscription cost matters to you

## Final Thoughts

GPT-5 is a genuine step forward. The improvements in reasoning, context handling, and output quality are real and noticeable in daily use. I’ve found myself relying on it more than I expected, even for tasks I used to do manually.

The restrictions are a legitimate frustration, though. They reflect OpenAI being cautious, which I respect. But caution becomes annoying when it starts blocking reasonable requests.

Overall, I’d say the upgrade is worthwhile for power users. For casual users, the free tier or GPT-4 might be sufficient for now.

That’s my two weeks of honest experience. Questions welcome in the comments.

Neil

Neil Shum is a 10-year internet industry veteran with experience spanning product management, startup founding, and AI-native product development.

Starting his career at a Fortune 500 tech company, Neil spent his early years deep in product strategy and user research. In 2018, he co-founded an H5 game startup that scaled to 500,000 users before being acquired in 2022.

These days, Neil focuses on exploring how AI is reshaping product design, user experience, and business models. He’s particularly interested in the practical side of AI adoption—what works, what doesn’t, and what founders and product teams should actually pay attention to.

When not analyzing AI tools or writing about emerging trends, you’ll find him testing new AI products, mentoring early-stage founders, or reading way too many newsletters about LLMs.
