My notes on OpenAI's new GPT-4o (the "o" is for "omni") model release.
Short version: it's not a huge leap in "intelligence" over GPT-4, but the multimodal audio support, the ability to output images (which appears a whole lot more sophisticated than DALL-E), and the drop in price are all very interesting new capabilities. https://lnkd.in/gh56yMTZ
OpenAI Unveils GPT-4o With Real-Time Capabilities - https://lnkd.in/gr93e5j8
OpenAI took the wraps off its latest AI model, GPT-4o, designed to "reason across audio, vision, and text in real time."
OpenAI has unveiled GPT-4o, a powerful, free-for-all AI model with vision, text, and voice. The new model brings the intelligence and capabilities of GPT-4 to all users.
According to reports, GPT-4o offers GPT-4-level intelligence while being much faster, with improved capabilities across text, vision, and audio.
OpenAI's new model also makes human-to-machine interaction much more natural and far easier.
GPT-4o has vision as well, allowing users to upload photos and documents and start conversations about them. Users can also make use of the Memory feature and browse for real-time information during conversations.
It's amazing how good the new AI image tools (e.g., GPT-4V) are at some things, yet how bad they still are in many other ways.
Computer vision is used actively and successfully across many industries, but GPT-4V fails at even some of the lighter use cases because it's trained to be a generalist, not a specialist.
Still, very impressive as a sandbox.
https://lnkd.in/euw3mU3i
1/ Logan Kilpatrick, OpenAI's developer relations manager, considers prompt engineering "a bug, not a feature," and expects the effort required to get good results to drop by a factor of 10 in the future.
2/ Nevertheless, major AI companies such as Google and Microsoft currently rely on complex prompts to achieve the best benchmark results when promoting their AI models, such as Google's Gemini Ultra and OpenAI's GPT-4.
3/ Kilpatrick predicts that future AI systems will be inherently more capable, making complex prompts less relevant.
𝗬𝗔𝗚𝗣𝗧𝗥 - 𝗬𝗲𝘁 𝗮𝗻𝗼𝘁𝗵𝗲𝗿 𝗚𝗣𝗧-𝟰𝗼 𝗥𝗲𝘃𝗶𝗲𝘄 😉
Yesterday, OpenAI released their new… Model? Product? Version? I'm not sure what to call it, but I can say I was deeply impressed. While no technical details were disclosed, it’s clear that OpenAI engineers had to solve an enormous number of challenges to achieve this level of functionality. Here are a few things I want to highlight:
𝗟𝗮𝘁𝗲𝗻𝗰𝘆
The challenge of real-time processing for audio and video is immense. There was no noticeable delay when interacting with the app, suggesting that OpenAI has reached a significant milestone by mapping audio to audio directly as a primary modality.
💡 Explanation: Previous approaches involved translating audio to text with one AI model, processing the text with a second model (LLM), and then synthesizing the text back to voice with a third model. The low latency seen in the demo indicates they’ve managed to go directly from voice to voice.
This necessitates a novel approach to tokenization, in which the human voice input is divided into discrete pieces (tokens) that the model can process, as sketched below.
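To make the idea concrete, here is a toy sketch of vector-quantized audio tokenization (my own illustration, not OpenAI's actual method): raw audio is cut into fixed-size frames, each frame is embedded, and each embedding is snapped to the nearest entry in a codebook, yielding a sequence of discrete token IDs the model can consume much like text tokens.

```python
import numpy as np

# Illustrative only: a toy vector-quantizing "audio tokenizer".
# Real neural codecs learn the encoder and codebook end to end;
# here both are random, purely to show the shape of the computation.

FRAME_SIZE = 320        # 20 ms of 16 kHz audio per frame
EMBED_DIM = 64          # dimensionality of each frame embedding
CODEBOOK_SIZE = 1024    # number of distinct audio tokens

rng = np.random.default_rng(0)
encoder = rng.normal(size=(FRAME_SIZE, EMBED_DIM))      # stand-in for a learned encoder
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))  # stand-in for a learned codebook

def tokenize(audio: np.ndarray) -> np.ndarray:
    """Map raw samples to a sequence of discrete token IDs."""
    n_frames = len(audio) // FRAME_SIZE
    frames = audio[: n_frames * FRAME_SIZE].reshape(n_frames, FRAME_SIZE)
    embeddings = frames @ encoder                        # (n_frames, EMBED_DIM)
    # Each frame's token ID is the index of its nearest codebook entry.
    dists = np.linalg.norm(embeddings[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

one_second = rng.normal(size=16000)   # fake 1 s of 16 kHz audio
print(tokenize(one_second)[:10])      # 50 tokens per second at this frame size
```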
𝗗𝗲𝗰𝗲𝗻𝘁𝗿𝗮𝗹𝗶𝘇𝗲𝗱 𝗗𝗮𝘁𝗮 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴
To leverage the computing power of user devices, it is likely that OpenAI preprocesses voice and video data directly on the device. For instance, OpenAI might have developed a neural-first, streaming codec that uses a small, low-energy neural network on the end-user device for tokenization. This approach could mitigate the delays caused by compressing and decompressing audio and video data.
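Purely as a sketch of that streaming idea (an assumption on my part, not a documented OpenAI design): the device could emit token IDs frame by frame as audio arrives, rather than waiting for the full utterance, so the server-side model can begin responding almost immediately.

```python
from typing import Iterator
import numpy as np

FRAME_SIZE = 320                                 # 20 ms of 16 kHz audio
rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, FRAME_SIZE))   # toy stand-in for a learned codec

def stream_tokens(mic: Iterator[np.ndarray]) -> Iterator[int]:
    """Emit one discrete token per frame as soon as enough audio has
    arrived, instead of waiting for the whole utterance."""
    buffer = np.empty(0)
    for chunk in mic:
        buffer = np.concatenate([buffer, chunk])
        while len(buffer) >= FRAME_SIZE:
            frame, buffer = buffer[:FRAME_SIZE], buffer[FRAME_SIZE:]
            # Nearest codebook row = this frame's token ID.
            yield int(np.linalg.norm(codebook - frame, axis=1).argmin())

# Simulate a microphone delivering 10 ms chunks (100 chunks = 1 s of audio).
mic = (rng.normal(size=160) for _ in range(100))
tokens = list(stream_tokens(mic))
print(len(tokens), tokens[:5])                   # 50 tokens for 1 s of audio
```

In a real system each token would be pushed upstream the moment it is produced, which is exactly what keeps the perceived latency low.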
𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗗𝗮𝘁𝗮
Amid all the hype, which I share, we must consider what data was used to train this model. Numerous challenges need to be addressed in the training data to help the model learn how to handle them. How does it respond to interruptions? How does it react emotionally to various situations, like when to express sadness, happiness, excitement, or offense? The emotional aspect of this app is fascinating. How did it learn this?
The most viable training data source would be all the audio data available online, such as YouTube videos, podcasts, movies, and recorded calls. This will likely spark another debate about whether using such data for model training is legitimate, and rightly so, in my opinion.
As highlighted by Jim Fan, another approach could be generating synthetic data. Using GPT-4 to create realistic dialogues and text-to-voice models to produce audio from those texts is a viable way to expand the training data. However, I believe this cannot be the sole source of data for achieving a model of this quality.
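As a rough sketch of what such a synthetic-data pipeline could look like, using the public OpenAI Python SDK (the models, voices, prompt, and file names are illustrative choices on my part, not what OpenAI actually did):

```python
# Generate a text dialogue with a chat model, then render each line
# to speech with a TTS model to obtain synthetic audio training data.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# 1. Generate a realistic, emotionally varied dialogue as text.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Write a short, emotionally varied dialogue between "
                   "two friends, one line per speaker, plain text only.",
    }],
)
lines = [l for l in chat.choices[0].message.content.splitlines() if l.strip()]

# 2. Render each line to audio, alternating voices per speaker.
voices = ["alloy", "nova"]
for i, line in enumerate(lines):
    speech = client.audio.speech.create(
        model="tts-1", voice=voices[i % 2], input=line,
    )
    with open(f"turn_{i:03d}.mp3", "wb") as f:
        f.write(speech.content)   # raw MP3 bytes from the TTS response
```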
There is so much more to discuss, especially the emotional aspect. Did anyone else feel it was rude to interrupt the model during the presentation? Honestly, I felt empathy for it. Amazing…
Breaking! GPT-4o ("four oh") to be released, with GPT-4 capabilities but faster and easier to use.
Plus, they just hinted that there are more big announcements coming in the next few weeks about 'the future' - GPT-5.
https://lnkd.in/eisJrz44
Spoilt for choice - Llama this or GPT that? The last 2 weeks have been intense when it comes to LLMs.
With so many releases packed into April, it is hard to keep up. Here's the lay of the land (as of today) to help you catch up, wherein we shall discuss:
1. Anthropic's Claude 3 Opus: Released Mar 04
2. Cohere's Command R+: Released Apr 04
3. Google's Gemini Pro public access: Released Apr 09
4. OpenAI's GPT-4-Turbo-2024-04-09: Released Apr 09
5. Mistral AI's Mixtral 8x22B: Released Apr 09
6. xAI's Grok-1.5 Vision: Released Apr 12
7. Reka AI's Reka Core: Released Apr 15
8. Hugging Face's Idefics2: Released Apr 15
9. EleutherAI's Pile T5: Released Apr 15
10. Microsoft's WizardLM-2: Released Apr 15
11. Meta's Llama 3: Released Apr 18
Happy Reading!
#ai #LLMs #opensource #tech
Google's newest AI innovation is designed to rival OpenAI's GPT-4o "omni" model - but is it enough?
Google is about to revolutionise the online search experience with AI, making it smarter and more personalised than ever before.
WIRED reports that at the recent I/O conference, Elizabeth Reid, the new head of Google Search, introduced AI-driven updates that personalise and summarise search results, enhancing user interaction. (https://lnkd.in/gnCE9aTw)
But is this enough to compete with OpenAI's new advanced GPT-4o model?
The new model, launched Monday by Mira Murati, is an impressive demonstration of a near-human chat experience, offering seemingly authentic, comprehensive personal interaction through text, voice, and real-time video. (https://lnkd.in/g7vbCMQp)
Irrespective of how the cards fall, one thing is certain - the pace of innovation is accelerating, and the future of search is rapidly evolving.