engblogs

Why we no longer evaluate SWE-bench Verified

Explains why evaluation on SWE-bench Verified was discontinued and what that decision means for benchmarking and verification in software engineering.

2/20/2026

OpenAI

Our First Proof submissions

Documenting our first proof submissions and the early lessons learned from building a formal-proof submission workflow.

2/20/2026

AWS ML

Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting

Amazon SageMaker AI in 2025: an in-depth look at improved observability, serverless model customization, rolling updates for safer deployments, bidirectional streaming, and expanded connectivity for enterprise inference.

2/20/2026

AWS ML

Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads

A concise look at part 1 of the SageMaker AI 2025 review: Flexible Training Plans for inference, improved price performance, and resilient, scalable deployments via Multi-AZ, parallel scaling, and EAGLE-3 adaptive decoding.

2/20/2026

AWS ML

Integrate external tools with Amazon Quick Agents using Model Context Protocol (MCP)

Guide to exposing your product as MCP tools and connecting to Amazon Quick Agents via a remote MCP server, detailing deployment options, authentication patterns, and a six-step MCP integration checklist.

2/20/2026

Cloudflare

Code Mode: give agents an entire API in 1,000 tokens

Code Mode enables a server-side MCP gateway that lets agents access the entire Cloudflare API with just two tools (search and execute) by turning intent into JavaScript against a typed OpenAPI spec and executing it in a sandboxed Dynamic Worker Loader, dramatically reducing tokens while enabling progressive discovery.

2/20/2026

Ink and Switch

Ink Note Jan-Feb '26: How PlayBook Processes User Input

An in-depth look at PlayBook's real-time input pipeline—capturing, enriching, and routing finger and pencil events into a modular gesture system that decouples input handling from rendering and targets low-latency, cross-platform web apps inside a wrapper.

2/20/2026

MIT AI

Study: AI chatbots provide less-accurate information to vulnerable users

A study finds that state-of-the-art language models perform worse for users who are less educated, are non-native English speakers, or are of foreign origin, showing higher refusal rates and condescending responses that may spread misinformation. The results highlight the need for bias-aware alignment and equitable AI deployment.

2/19/2026

Databricks

From AI projects to an operational capability

Turning AI experiments into an operational capability requires modernization, unified data governance, and disciplined testing, aligning AI initiatives with business KPIs.

2/19/2026

Lambda Labs

Lambda appoints Charles Fisher as Chief Financial Officer

Lambda appoints Charles Fisher as Chief Financial Officer to lead capital strategy and scale its AI infrastructure for hyperscale growth.

2/19/2026

MIT AI

Exposing biases, moods, personalities, and abstract concepts hidden in large language models

A targeted feature-learning approach reveals and controllably steers hidden abstract concepts—such as biases, moods, and personalities—within large language models by identifying their representations and perturbing them to shape responses.

2/19/2026

Databricks

Use Genie Everywhere with Enterprise OAuth

Embed OAuth-secured Genie analytics across Teams and custom apps to unlock natural-language insights in everyday workflows via Copilot Studio and Genie APIs.

2/19/2026