OpenAI | Why we no longer evaluate SWE-bench Verified
Explores why SWE-bench Verified evaluation was discontinued and its implications for benchmarking and verification in software engineering.
OpenAI | Our First Proof submissions
Documenting our first proof submissions and the early lessons learned from building a formal-proof submission workflow.
AWS ML | Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting
Amazon SageMaker AI in 2025: an in-depth look at improved observability, serverless model customization, rolling updates for safer deployments, bidirectional streaming, and expanded connectivity for enterprise inference.
AWS ML | Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads
Part 1 of the SageMaker AI 2025 review covers Flexible Training Plans for inference, along with price-performance and resilience improvements such as Multi-AZ, parallel scaling, and EAGLE-3 adaptive decoding.
AWS ML | Integrate external tools with Amazon Quick Agents using Model Context Protocol (MCP)
Guide to exposing your product as MCP tools and connecting to Amazon Quick Agents via a remote MCP server, detailing deployment options, authentication patterns, and a six-step MCP integration checklist.
Cloudflare | Code Mode: give agents an entire API in 1,000 tokens
Code Mode is a server-side MCP gateway that gives agents access to the entire Cloudflare API through just two tools, search and execute. The agent's intent is turned into JavaScript written against a typed OpenAPI spec and executed in a sandboxed Dynamic Worker Loader, which sharply reduces token usage while enabling progressive discovery.
Ink and Switch | Ink Note Jan-Feb '26: How PlayBook Processes User Input
An in-depth look at PlayBook's real-time input pipeline—capturing, enriching, and routing finger and pencil events into a modular gesture system that decouples input handling from rendering and targets low-latency, cross-platform web apps inside a wrapper.
MIT AI | Study: AI chatbots provide less-accurate information to vulnerable users
A technical examination reveals that state-of-the-art language models underperform for less-educated, non-native English-speaking, and foreign-origin users, with higher refusal rates and condescending responses that may spread misinformation, highlighting the need for bias-aware alignment and equitable AI deployment.
Databricks | From AI projects to an operational capability
Turning AI experiments into an operational capability requires modernization, unified data governance, and disciplined testing, aligning AI initiatives with business KPIs.
Lambda Labs | Lambda appoints Charles Fisher as Chief Financial Officer
Lambda appoints Charles Fisher as Chief Financial Officer to lead capital strategy and scale its AI infrastructure for hyperscale growth.
MIT AI | Exposing biases, moods, personalities, and abstract concepts hidden in large language models
A targeted feature-learning approach reveals and controllably steers hidden abstract concepts—such as biases, moods, and personalities—within large language models by identifying their representations and perturbing them to shape responses.
Databricks | Use Genie Everywhere with Enterprise OAuth
Embed OAuth-secured Genie analytics across Teams and custom apps to unlock natural-language insights in everyday workflows via Copilot Studio and Genie APIs.