My notes on OpenAI's new GPT-4o (the "o" is for "omni") model release.
Short version: it's not a huge leap in "intelligence" over GPT-4, but the multimodal audio support, the ability to output images (which appears a whole lot more sophisticated than DALL-E), and the drop in price are all very interesting new capabilities. https://lnkd.in/gh56yMTZ
OpenAI Unveils GPT-4o With Real-Time Capabilities - https://lnkd.in/gr93e5j8
OpenAI took the wraps off its latest AI model, GPT-4o, designed to "reason across audio, vision, and text in real time."
OpenAI has unveiled GPT-4o, a powerful, free-for-all AI model with vision, text, and voice. The new model brings the intelligence and capabilities of GPT-4 to all users.
According to reports, GPT-4o offers GPT-4-level intelligence while being much faster, with improved capabilities across text, vision, and audio.
OpenAI's new model also makes human-to-machine interaction much more natural and far easier.
GPT-4o has vision as well, allowing users to upload photos and documents and start conversations about them. Users can also make use of the Memory feature and browse for real-time information during conversations.
It's amazing how good the new AI image tools (e.g., GPT-4V) are at some things, yet how bad they still are in many other ways.
Computer vision is used actively and successfully across many industries, but GPT-4V fails at even some of the lighter use cases because it's trained to be a generalist, not a specialist.
Still, very impressive as a sandbox.
https://lnkd.in/euw3mU3i
1/ Logan Kilpatrick, OpenAI's developer relations manager, considers prompt engineering "a bug, not a feature," and expects the effort required to get good results to drop by a factor of 10 in the future.
2/ Nevertheless, major AI companies such as Google and Microsoft currently rely on complex prompts to achieve the best benchmark results when promoting their AI models, such as Google's Gemini Ultra and OpenAI's GPT-4.
3/ Kilpatrick predicts that future AI systems will be inherently more capable, making complex prompts less relevant.
𝗬𝗔𝗚𝗣𝗧𝗥 - 𝗬𝗲𝘁 𝗮𝗻𝗼𝘁𝗵𝗲𝗿 𝗚𝗣𝗧-𝟰𝗼 𝗥𝗲𝘃𝗶𝗲𝘄 😉
Yesterday, OpenAI released their new… Model? Product? Version? I'm not sure what to call it, but I can say I was deeply impressed. While no technical details were disclosed, it’s clear that OpenAI engineers had to solve an enormous number of challenges to achieve this level of functionality. Here are a few things I want to highlight:
𝗟𝗮𝘁𝗲𝗻𝗰𝘆
The challenge of real-time processing for audio and video is immense. There was no noticeable delay when interacting with the app, suggesting that OpenAI has reached a significant milestone by mapping audio to audio directly as a primary modality.
💡 Explanation: Previous approaches involved translating audio to text with one AI model, processing the text with a second model (LLM), and then synthesizing the text back to voice with a third model. The low latency seen in the demo indicates they’ve managed to go directly from voice to voice.
This necessitates a novel approach to tokenization, in which the human voice input is divided into discrete pieces (tokens) that the model can process, as sketched below.
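To make the idea concrete, here is a toy sketch of vector-quantized audio tokenization (my own illustration, not OpenAI's actual method): raw audio is cut into fixed-size frames, each frame is embedded, and each embedding is snapped to the nearest entry in a codebook, yielding a sequence of discrete token IDs the model can consume much like text tokens.

```python
import numpy as np

# Illustrative only: a toy vector-quantizing "audio tokenizer".
# Real neural codecs learn the encoder and codebook end to end;
# here both are random, purely to show the shape of the computation.

FRAME_SIZE = 320        # 20 ms of 16 kHz audio per frame
EMBED_DIM = 64          # dimensionality of each frame embedding
CODEBOOK_SIZE = 1024    # number of distinct audio tokens

rng = np.random.default_rng(0)
encoder = rng.normal(size=(FRAME_SIZE, EMBED_DIM))      # stand-in for a learned encoder
codebook = rng.normal(size=(CODEBOOK_SIZE, EMBED_DIM))  # stand-in for a learned codebook

def tokenize(audio: np.ndarray) -> np.ndarray:
    """Map raw samples to a sequence of discrete token IDs."""
    n_frames = len(audio) // FRAME_SIZE
    frames = audio[: n_frames * FRAME_SIZE].reshape(n_frames, FRAME_SIZE)
    embeddings = frames @ encoder                        # (n_frames, EMBED_DIM)
    # Each frame's token ID is the index of its nearest codebook entry.
    dists = np.linalg.norm(embeddings[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

one_second = rng.normal(size=16000)   # fake 1 s of 16 kHz audio
print(tokenize(one_second)[:10])      # 50 tokens per second at this frame size
```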
𝗗𝗲𝗰𝗲𝗻𝘁𝗿𝗮𝗹𝗶𝘇𝗲𝗱 𝗗𝗮𝘁𝗮 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴
To leverage the computing power of user devices, it is likely that OpenAI preprocesses voice and video data directly on the device. For instance, OpenAI might have developed a neural-first, streaming codec that uses a small, low-energy neural network on the end-user device for tokenization. This approach could mitigate the delays caused by compressing and decompressing audio and video data.
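Purely as a sketch of that streaming idea (an assumption on my part, not a documented OpenAI design): the device could emit token IDs frame by frame as audio arrives, rather than waiting for the full utterance, so the server-side model can begin responding almost immediately.

```python
from typing import Iterator
import numpy as np

FRAME_SIZE = 320                                 # 20 ms of 16 kHz audio
rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, FRAME_SIZE))   # toy stand-in for a learned codec

def stream_tokens(mic: Iterator[np.ndarray]) -> Iterator[int]:
    """Emit one discrete token per frame as soon as enough audio has
    arrived, instead of waiting for the whole utterance."""
    buffer = np.empty(0)
    for chunk in mic:
        buffer = np.concatenate([buffer, chunk])
        while len(buffer) >= FRAME_SIZE:
            frame, buffer = buffer[:FRAME_SIZE], buffer[FRAME_SIZE:]
            # Nearest codebook row = this frame's token ID.
            yield int(np.linalg.norm(codebook - frame, axis=1).argmin())

# Simulate a microphone delivering 10 ms chunks (100 chunks = 1 s of audio).
mic = (rng.normal(size=160) for _ in range(100))
tokens = list(stream_tokens(mic))
print(len(tokens), tokens[:5])                   # 50 tokens for 1 s of audio
```

In a real system each token would be pushed upstream the moment it is produced, which is exactly what keeps the perceived latency low.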
𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗗𝗮𝘁𝗮
Amid all the hype, which I share, we must consider what data was used to train this model. Numerous challenges need to be addressed in the training data to help the model learn how to handle them. How does it respond to interruptions? How does it react emotionally to various situations, like when to express sadness, happiness, excitement, or offense? The emotional aspect of this app is fascinating. How did it learn this?
The most viable training data source would be all the audio data available online, such as YouTube videos, podcasts, movies, and recorded calls. This will likely spark another debate about whether using such data for model training is legitimate, and rightly so, in my opinion.
As highlighted by Jim Fan, another approach could be generating synthetic data. Using GPT-4 to create realistic dialogues and text-to-voice models to produce audio from those texts is a viable way to expand the training data. However, I believe this cannot be the sole source of data for achieving a model of this quality.
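As a rough sketch of what such a synthetic-data pipeline could look like, using the public OpenAI Python SDK (the models, voices, prompt, and file names are illustrative choices on my part, not what OpenAI actually did):

```python
# Generate a text dialogue with a chat model, then render each line
# to speech with a TTS model to obtain synthetic audio training data.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# 1. Generate a realistic, emotionally varied dialogue as text.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Write a short, emotionally varied dialogue between "
                   "two friends, one line per speaker, plain text only.",
    }],
)
lines = [l for l in chat.choices[0].message.content.splitlines() if l.strip()]

# 2. Render each line to audio, alternating voices per speaker.
voices = ["alloy", "nova"]
for i, line in enumerate(lines):
    speech = client.audio.speech.create(
        model="tts-1", voice=voices[i % 2], input=line,
    )
    with open(f"turn_{i:03d}.mp3", "wb") as f:
        f.write(speech.content)   # raw MP3 bytes from the TTS response
```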
There is so much more to discuss, especially the emotional aspect. Did anyone else feel it was rude to interrupt the model during the presentation? Honestly, I felt empathy for it. Amazing…
Breaking! GPT-4o ("four oh") to be released, with GPT-4 capabilities but faster and easier to use.
Plus, they just hinted that there are more big announcements coming in the next few weeks about 'the future' - GPT-5.
https://lnkd.in/eisJrz44
Spoilt for choice - Llama this or GPT that? The last 2 weeks have been intense when it comes to LLMs.
With so many releases packed into April, it is hard to keep up. Here's the lay of the land (as of today) to help you catch up, wherein we shall discuss:
1. Anthropic's Claude 3 Opus: Released Mar 04
2. Cohere's Command R+: Released Apr 04
3. Google's Gemini Pro public access: Released Apr 09
4. OpenAI's GPT-4-Turbo-2024-04-09: Released Apr 09
5. Mistral AI's Mixtral 8x22B: Released Apr 09
6. xAI's Grok-1.5 Vision: Released Apr 12
7. Reka AI's Reka Core: Released Apr 15
8. Hugging Face's Idefics2: Released Apr 15
9. EleutherAI's Pile T5: Released Apr 15
10. Microsoft's WizardLM-2: Released Apr 15
11. Meta's Llama 3: Released Apr 18
Happy Reading!
#ai #LLMs #opensource #tech
Google's newest AI innovation is designed to rival OpenAI's GPT-4o "omni" model - but is it enough?
Google is about to revolutionise the online search experience with AI, making it smarter and more personalised than ever before.
WIRED reports that at the recent I/O conference, Elizabeth Reid, the new head of Google Search, introduced AI-driven updates that personalise and summarise search results, enhancing user interaction. (https://lnkd.in/gnCE9aTw)
But is this enough to compete with OpenAI's new advanced GPT-4o model?
The new model, launched Monday by Mira Murati, is an impressive demonstration of a near-human chat experience, offering seemingly authentic, comprehensive personal interaction through text, voice, and real-time video. (https://lnkd.in/g7vbCMQp)
Irrespective of how the cards fall, one thing is certain - the pace of innovation is accelerating, and the future of search is rapidly evolving.