Anthropic's latest model can take 'The Great Gatsby' as input
">
Historically and even today, poor memory has been an impediment to the usefulness of text-generating AI. As a recent piece in The Atlantic aptly puts it, even sophisticated generative text AI like ChatGPT has the memory of a goldfish. Each time the model generates a response, it takes into account only a very limited amount of text -- preventing it from, say, summarizing a book or reviewing a major coding project.
But Anthropic's trying to change that.
Today, the AI research startup announced that it's expanded the context window for Claude -- its flagship text-generating AI model, still in preview -- from 9,000 tokens to 100,000 tokens. Context window refers to the text the model considers before generating additional text, while tokens represent raw text (e.g., the word "fantastic" would be split into the tokens "fan," "tas" and "tic").
So what's the significance, exactly? Well, as alluded to earlier, models with small context windows tend to "forget" the content of even very recent conversations -- leading them to veer off topic. After a few thousand words or so, they also forget their initial instructions, instead extrapolating their behavior from the last information within their context window rather than from the original request.
Given the benefits of large context windows, it's not surprising that figuring out ways to expand them has become a major focus of AI labs like OpenAI, which devoted an entire team to the issue. OpenAI's GPT-4 held the previous crown in terms of context window sizes, weighing in at 32,000 tokens on the high end -- but the improved Claude API blows past that.
With a bigger "memory," Claude should be able to converse relatively coherently for hours -- several days, even -- as opposed to minutes. And perhaps more importantly, it should be less likely to go off the rails.
In a blog post, Anthropic touts the other benefits of Claude's increased context window, including the ability for the model to digest and analyze hundreds of pages of materials. Beyond reading long texts, the upgraded Claude can help retrieve information from multiple documents or even a book, Anthropic says, answering questions that require "synthesis of knowledge" across many parts of the text.
Anthropic lists a few possible use cases:
Digesting, summarizing, and explaining documents such as financial statements or research papersAnalyzing risks and opportunities for a company based on its annual reportsAssessing the pros and cons of a piece of legislationIdentifying risks, themes, and different forms of argument across legal documents.Reading through hundreds of pages of developer documentation and surfacing answers to technical questionsRapidly prototyping by dropping an entire codebase into the context and intelligently building on or modifying it"The average person can read 100,000 tokens of text in around five hours, and then they might need substantially longer to digest, remember, and analyze that information," Anthropic continues. "Claude can now do this in less than a minute. For example, we loaded the entire text of The Great Gatsby into Claude ... and modified one line to say Mr. Carraway was 'a software engineer that works on machine learning tooling at Anthropic.' When we asked the model to spot what was different, it responded with the correct answer in 22 seconds."
Now, longer context windows don't solve the other memory-related challenges around large language models. Claude, like most models in its class, can't retain information from one session to the next. And unlike the human brain, it treats every piece of information as equally important, making it a not particularly reliable narrator. Some experts believe that solving these problems will require entirely new model architectures.
For now, though, Anthropic appears to be at the forefront.