Summary: an early review of xAI’s Grok 4, a multi-agent, multi-modal artificial intelligence/large language model, based on the AI Research Podcast, with some limitations mentioned.
On 29 July, I came across the following assessment of Elon Musk’s new
large language model (LLM), Grok 4. As indicated by its title,‘xAI
built an AI team, not a chatbot inside Grok 4 heavy’s insane power’,
the podcast is highly laudatory (I initially described it as ‘a quasi
panegyric’), although the two AI Research Podcast reviewers
address some of Grok 4’s weak points towards the end of their
assessment.
As I have not yet used Grok 4, I cannot offer a first-hand opinion of
the system, but other YouTube reviews also suggest that Grok 4
constitutes a significant improvement over the earlier version of Elon
Musk’s LLM (Grok 3).
Below is a summary of the podcast, generated by Grok 3 from a transcript
of the clip produced by another AI system:
Grok 4 Heavy, developed by xAI, is a ground-breaking
artificial intelligence model distinguished by its massive 1.7 trillion
parameter architecture and a hybrid modular design. Unlike monolithic
models, it incorporates multiple specialised neural subsystems, each
optimised for specific tasks such as coding, mathematical reasoning or
linguistic comprehension. Its multi-agent architecture enables several
AI agents to collaborate in parallel, sharing and verifying insights to
tackle complex problems, making it exceptionally effective for tasks
requiring deep reasoning.
Trained on a vast and diverse dataset, including the real-time Twitter
(X) stream and high-quality texts (books, academic
papers, code repositories), Grok 4 Heavy employs intensive reinforcement
learning to prioritise logical and accurate reasoning. With a context
window of up to 256,000 tokens (roughly 200,000 words), it excels at
analysing lengthy texts, such as books or legal documents, while
integrating real-time data through internet searches or external APIs.
In benchmark tests, Grok 4 Heavy performs impressively: it achieved a
perfect 100% score on the American Invitational Mathematics
Examination, far surpassing Grok 3’s 52%, and scored nearly 50%
on the Humanity’s Last Exam, almost doubling competitors like
Gemini (27%). It also excels in coding (75% on SWE-bench) and physics
(87% on GPQA), demonstrating exceptional capability in technical and
scientific domains. Its multi-modal capabilities, though limited,
support image analysis and generation, with planned enhancements for
audio and video understanding.
Grok 4 Heavy is integrated into X, Telegram (through
a £230 million partnership), and is accessible via grok.com
and mobile apps for iOS and Android. An API
compatible with OpenAI’s SDKs facilitates developer adoption.
It is transforming fields like coding (game development, code
optimisation), education (personalised tutoring), research (literature
synthesis) and productivity (document summarisation, real-time data
management).
However, limitations persist: responses can be overly verbose, the
multi-agent mode is slow (sometimes taking minutes) and its visual
capabilities lag behind specialised models. Safety and ethical
alignment, particularly given the raw Twitter data, remain contentious,
though xAI has strengthened filters following early incidents. The high
cost (£230/month for Heavy mode) and lack of transparency (no public
technical documentation) are also barriers.
xAI plans rapid improvements, particularly in multi-modal
capabilities (video, audio), efficiency (reducing latency), and is
considering open-source variants. Long-term, the goal is to advance
towards artificial general intelligence (AGI), with potential
integrations into Tesla and SpaceX systems. Grok 4
Heavy marks a significant leap towards an AI intellectual partner, but
its rapid development raises critical questions about responsible use
and societal impact.
Other links on Grok 4:
The presentation with Elon Musk on the day of the launch of Grok 4: https://x.com/i/broadcasts/1lDGLzplWnyxm
https://x.ai/news/grok-4Lausanne, the above was published on the third day of the eighth month of the year two thousand and twenty-five.