OpenAI has released the o1 model in its API with major feature updates.
OpenAI's o1 API release, Sam Altman's vision, Nvidia's $249 AI computer, Falcon-3 models, and Meta's Apollo video-LMMs reshape the AI landscape, while ChatGPT surfaces correlations humans haven't yet discovered.
⚡In today’s Edition (17-Dec-2024):
👨🔧 OpenAI has released the o1 model in its API with major feature updates.
🔦 Key Highlights from Sam Altman's Vision for OpenAI's Future at their Dev Day
🏆 Nvidia announces the Jetson Orin Nano Super, a $249 AI computer and dev kit
📡 New Falcon-3 family of models just dropped, from the Technology Innovation Institute.
📹 Meta introduces Apollo, an incredible family of video-LMMs in 1.5B, 3B, and 7B parameter sizes.
🗞️ Byte-Size Brief:
OpenAI’s o1-preview model achieves 80% accuracy vs doctors' 30% on NEJM cases
Google Gemini captures 50% developer market share in 3 months
ChatGPT finds strong correlations in real-world data that humans haven’t discovered yet.
👨🔧 OpenAI has released the o1 model in its API with major feature updates
🎯 The Brief
OpenAI releases the o1 model to its API with function calling, structured outputs, vision inputs, and a 200k context window. It is priced at $15/1M input tokens and $60/1M output tokens.
⚙️ The Details
→ Starting Tuesday, o1 will begin rolling out to devs in OpenAI’s “tier 5” usage category, the company said. To qualify for tier 5, developers have to spend at least $1,000 with OpenAI and have an account that is more than 30 days old, counted from their first successful payment. o1 replaces the o1-preview model that was already available in the API.
→ Unlike most models, so-called reasoning models like o1 effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up models. The drawback is that they often take longer to arrive at solutions.
→ The o1 model introduces a reasoning_effort parameter for controlling the model's thinking depth. Vision capabilities are rolling out, with potential rate limits and subscription requirements.
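As a rough sketch, a Chat Completions request combining the new reasoning_effort parameter with function calling might look like the following. The get_weather tool, its schema, and the message content are invented for illustration; check OpenAI's API reference for the exact request shape.

```python
import json

# Hypothetical request body for the Chat Completions endpoint, combining
# the new reasoning_effort parameter with function calling. The
# get_weather tool and its schema are illustrative, not from OpenAI docs.
payload = {
    "model": "o1-2024-12-17",
    "reasoning_effort": "medium",  # controls thinking depth: "low" | "medium" | "high"
    "messages": [
        {"role": "user", "content": "What's the weather in Paris right now?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The body would be POSTed as JSON with an Authorization header.
body = json.dumps(payload)
print(body[:60])
```

The same payload shape also accepts a response_format entry for structured outputs; it is omitted here to keep the sketch short.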
→ OpenAI released a new snapshot of its o1 model today. The latest snapshot, named o1-2024-12-17, shows major performance gains: 48.9% on SWE-bench Verified, 76.6% on LiveCodeBench, and 79.2% on AIME 2024.
→ In a note on its website, OpenAI said that the newest o1 should provide “more comprehensive and accurate responses,” particularly for questions pertaining to programming and business, and is less likely to incorrectly refuse requests. These improvements also extend to o1 pro mode.
→ OpenAI said this new version of o1 is available in the API and will come to ChatGPT soon. It is a “new post-trained” version of o1: compared with the o1 model released in ChatGPT two weeks ago, “o1-2024-12-17” improves on “areas of model behavior based on feedback.”
→ o1-2024-12-17 consistently outperforms previous versions, achieving nearly 100% accuracy on structured-output and function-calling tasks.
→ Technical advancements include a 60% reduction in reasoning tokens compared to o1-preview, improving efficiency.
→ OpenAI’s Realtime API now supports WebRTC, so you can add Realtime capabilities with just a handful of lines of code. OpenAI has also cut prices by 60%, added GPT-4o mini (about 10x cheaper than previous audio prices), improved voice quality, and made inputs more reliable.
→ WebRTC integration enables broader platform compatibility, with native support for browser-based apps, mobile clients, and IoT devices. Realtime API pricing updates: GPT-4o audio costs drop 60% to $40/1M input tokens, and GPT-4o mini is priced at $10/1M input tokens.
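To put the new rates in perspective, here is a tiny helper for estimating audio input cost, assuming pricing is linear per token and using only the input-token rates quoted above (output tokens are billed separately and not covered here):

```python
# Rough cost estimate for Realtime API audio input at the new rates
# quoted above ($ per 1M input tokens). Output-token rates are not
# modeled; pricing is assumed to be linear in token count.
RATES_PER_1M_INPUT = {
    "gpt-4o-audio": 40.0,       # after the 60% cut
    "gpt-4o-mini-audio": 10.0,
}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost for `tokens` audio input tokens on `model`."""
    return RATES_PER_1M_INPUT[model] * tokens / 1_000_000

print(input_cost("gpt-4o-audio", 250_000))  # prints 10.0
```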
→ Today (17-Dec-2024) OpenAI also brought preference fine-tuning to its fine-tuning API; preference fine-tuning compares pairs of a model’s responses to teach the model to distinguish between preferred and non-preferred answers to questions. The company also launched an “early access” beta for official software development kits in Go and Java.
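A single training record for preference fine-tuning could be sketched as below. The preferred_output/non_preferred_output field names follow OpenAI's published preference format as I understand it, but the example content is invented and the schema should be double-checked against the fine-tuning docs.

```python
import json

# One illustrative training record for preference fine-tuning: a prompt
# plus a preferred and a non-preferred assistant response. Field names
# are an assumption based on OpenAI's preference format; the text is made up.
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this release in one line."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "o1 lands in the API with tool use and vision."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Lots of stuff happened."}
    ],
}

# Fine-tuning files are JSONL: one such record per line.
line = json.dumps(record)
print(line[:50])
```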
🔦 Key Highlights from Sam Altman's Vision for OpenAI's Future at their Dev Day
🎯 The Brief
OpenAI's CEO Sam Altman outlines strategic vision during Dev Day, emphasizing O-series models focused on reasoning capabilities, agent development, and future AI infrastructure scaling. The company positions itself to create trillions in market value through advanced AI products and services.
⚙️ The Details
→ Sam Altman emphasizes the strategic importance of reasoning models over larger models, targeting scientific advancement and complex coding capabilities. No-code tools are in the pipeline, but the initial focus remains on making coders more productive.
→ OpenAI advises startups to build solutions that leverage improving model capabilities rather than patching current limitations. They expect rapid improvements in the O series models.
→ Sam Altman defines an AI agent as a system handling long-duration tasks with minimal supervision, moving beyond simple automation like restaurant bookings.
→ He sees potential for "trillions of dollars" in new market value through AI-enabled products and services.
→ Future pricing models might shift from per-seat basis to compute-based pricing with dedicated GPU resources.
🏆 Nvidia announces the Jetson Orin Nano Super, a $249 AI computer and dev kit
🎯 The Brief
NVIDIA launches the Jetson Orin Nano Super Developer Kit at $249, delivering 67 TOPS (trillion operations per second) of AI performance, a 70% boost over its predecessor. The compact edge AI computer packs an Ampere GPU with 1024 CUDA cores, 32 tensor cores, and a 6-core ARM CPU, enabling advanced AI development at an affordable price point.
⚙️ The Details
→ The hardware delivers 102GB/s memory bandwidth with 8GB LPDDR5 memory. The performance boost comes through enhanced GPU, memory, and CPU clock speeds.
→ The kit features comprehensive connectivity with two MIPI CSI-2 camera connectors, four USB 3.2 Gen2 ports, and PCIe expansion slots supporting both 2-lane and 4-lane configurations.
→ Video capabilities include 1080p30 encoding and robust decoding support: 1x 4K60, 2x 4K30, 5x 1080p60, and 11x 1080p30 in H.265 format.
→ Power consumption ranges from 7W to 25W, making it efficient for edge deployments. The compact form factor measures 103mm x 90.5mm x 34.77mm.
→ NVIDIA's AI software stack integrates Isaac for robotics, Metropolis for vision AI, and Holoscan for sensor processing. Developers can leverage TAO Toolkit for model fine-tuning and access pre-trained models from NGC catalog.
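As a quick sanity check on the 70% figure above, assuming the predecessor is the original Jetson Orin Nano dev kit rated at 40 TOPS (that rating is an assumption from NVIDIA's earlier specs, not from this post):

```python
# Sanity-check the claimed ~70% performance boost. The 40 TOPS figure
# for the original Jetson Orin Nano dev kit is an assumption.
old_tops = 40
new_tops = 67

boost = new_tops / old_tops - 1
print(f"~{boost * 100:.0f}% boost over the predecessor")
```

67/40 works out to roughly a 1.7x uplift, consistent with the "70% boost" framing.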
📡 New Falcon-3 family of models just dropped, from the Technology Innovation Institute.
🎯 The Brief
The Technology Innovation Institute releases Falcon3, a new family of open-source LLMs ranging from 1B to 10B parameters, optimized for science, math, and code capabilities. The flagship Falcon3-10B-Base achieves state-of-the-art performance with 83.0 on GSM8K and 22.9 on MATH-Lvl5 benchmarks. Available on Hugging Face.
⚙️ The Details
🎯 Performance Highlights:
Falcon3-1B outperforms SmolLM2-1.7B and matches Gemma-2-2B
The 3B variant surpasses larger models like Llama-3.1-8B through knowledge distillation
The 7B model achieves performance on par with Qwen2.5-7B
The Mamba variant leads State Space Language Models at the 7B scale
💫 Core Innovations
The models use a head dimension of 256, optimized for FlashAttention-3, enabling high throughput
Architecture spans 18-40 layers for transformer models and 64 layers for Mamba variant
All transformer models are Llama-compatible for better ecosystem integration
Models available in multiple variants including Instruct, GGUF, GPTQ-Int4/8, AWQ and 1.58-bit
→ The training utilized a single pretraining run on 1024 H100 GPUs for the 7B model, processing 14 trillion tokens of web, code, STEM, and curated data. Knowledge distillation enabled the creation of smaller, efficient models using less than 100GT (gigatokens) of high-quality data.
→ The Mamba variant received 1.5T additional tokens of training, significantly improving mathematical capabilities. Models support a 32K context length, except the 1B, which supports 8K.
→ Three Critical License Components: It's based on Apache 2.0, with a series of modifications. The Falcon License mandates compliance with TII's dynamic Acceptable Use Policy, requires explicit attribution for derivatives using specific phrasing, and provides royalty-free copyright/patent licenses with defensive termination clauses.
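Pulling the context-length specs above into one place, a small lookup for the family might look like this. The 32K/8K figures restate the post (taken here as 32,768/8,192 tokens), while the model names mirror assumed Hugging Face naming and may differ from the actual repos.

```python
# Falcon3 family context lengths as described in the post. Model names
# follow assumed Hugging Face naming; 32K/8K are read as 32,768/8,192.
FALCON3_CONTEXT = {
    "Falcon3-1B-Base": 8_192,   # the only 8K-context member
    "Falcon3-3B-Base": 32_768,
    "Falcon3-7B-Base": 32_768,
    "Falcon3-10B-Base": 32_768,
    "Falcon3-Mamba-7B-Base": 32_768,
}

def context_tokens(model: str) -> int:
    """Maximum context length in tokens for a Falcon3 model."""
    return FALCON3_CONTEXT[model]

print(context_tokens("Falcon3-10B-Base"))  # prints 32768
```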
📹 Meta introduces Apollo, an incredible family of video-LMMs in 1.5B, 3B, and 7B parameter sizes.
🎯 The Brief
Meta unveils Apollo, a new family of video-LMMs demonstrating exceptional efficiency: their 3B-parameter model outperforms most 7B models, and their 7B variant surpasses many 30B models. The project introduces a "Scaling Consistency" principle enabling cost-effective model development by validating architectural choices at smaller scales.
⚙️ The Details
→ Apollo comes in three sizes: 1.5B, 3B, and 7B parameters. The architecture builds on Qwen1.5 and Qwen2 foundations and is released under an Apache 2.0 license for broad accessibility and research collaboration.
→ The research systematically explored hundreds of model variants, focusing on video sampling strategies, token integration methods, and training schedules. The team discovered that design decisions made on smaller models reliably transfer to larger scales.
→ Apollo-7B achieves an impressive 66.3% overall score on ApolloBench, their novel evaluation framework that tests models across OCR, Egocentric, Spatial, Perception, and Reasoning tasks.
→ Training strategy incorporates a balanced mixture of video and multimodal data. Fine-tuning video encoders exclusively on video data shows significant improvements in reasoning and domain-specific tasks.
🗞️ Byte-Size Brief
A new research paper claims o1-preview is superior to doctors on reasoning tasks. The o1-preview model achieved 80% accuracy on 143 challenging New England Journal of Medicine case reports, compared to physicians' 30% success rate, demonstrating significant advances in medical reasoning capabilities.
A new report shows that Google Gemini's market share among developers went from ~5% in September to >50% last week, per numbers published by OpenRouterAI.
A Reddit post went viral showing how ChatGPT finds strong correlations that humans haven’t discovered. For example, your gut bacteria might secretly influence how you make risky decisions, or the correlation between seasonal sunlight exposure and language complexity.