engblogs

How we cut Vertex AI latency by 35% with GKE Inference Gateway

How Vertex AI slashes latency by 35% with GKE Inference Gateway, using load- and content-aware routing, prefix-cache optimization, and admission-controlled queueing to handle context-heavy and bursty workloads at production scale.

2/6/2026

OpenAI

Korea privacy policy

A concise technical overview of South Korea's privacy policy landscape, outlining key regulations, data protection requirements, and implications for compliance in digital services.

2/6/2026

Google Cloud

Starfish Space uses Google Cloud to accelerate satellite servicing in orbit

Starfish Space uses Compute Engine and GKE on Google Cloud to run millions of Monte Carlo simulations for an autonomous, software-first satellite servicing vehicle, accelerating development and orbital validation.

2/6/2026

Google Cloud

Delivering a secure, open, and sovereign digital world

An in-depth look at building a secure, open, and sovereign digital world with Google Cloud's Sovereign Cloud portfolio—emphasizing data residency, air-gapped/dedicated deployments, open source, and rigorous regulatory controls.

2/6/2026

AWS ML

Evaluate generative AI models with an Amazon Nova rubric-based LLM judge on Amazon SageMaker AI (Part 2)

Assess generative AI with a dynamic, rubric-based Amazon Nova LLM judge on SageMaker AI that auto-generates task-specific rubrics, calculates per-criterion scores, and delivers calibrated comparisons of model outputs (Part 2).

2/6/2026

MIT AI

Helping AI agents search to get the best results out of large language models

EnCompass enables AI agents powered by LLMs to backtrack, clone runtimes, and explore multiple execution paths through configurable search strategies to maximize task outcomes.

2/5/2026

Google Cloud

Ship Production Ready AI and Survive the Multimodal Frontier This February

Roadmap to Production Ready AI and Real-Time Multimodal Agents, drawn from Google Cloud's two-day roadshow covering enterprise-grade security, scalable architecture, and Graph RAG-powered memory across sessions.

2/5/2026

AWS ML

A practical guide to Amazon Nova Multimodal Embeddings

A practical guide to configuring Amazon Nova Multimodal Embeddings for cross-modal search, semantic retrieval, and scalable multimodal AI workflows across text, image, video, and audio.

2/5/2026

AWS ML

How Associa transforms document classification with the GenAI IDP Accelerator and Amazon Bedrock

Associa uses the GenAI IDP Accelerator on Amazon Bedrock to automatically classify millions of documents with high accuracy and at low cost, combining first-page OCR text with page-image prompts and model tuning to optimize throughput and workflow integration.

2/5/2026

Modular AI

Modular: The Five Eras of KVCache

Traces the evolution of KVCache in LLM serving—from early naive designs to distributed, unified memory systems—within Modular's MAX platform and Mojo-based stack.

2/5/2026

Jane Street

I design with Claude more than Figma now

How Claude-powered prototyping transforms a designer's workflow, moving from laborious Figma mockups to live AI-driven prototypes that validate ideas quickly at Jane Street.

2/5/2026

OpenAI

Introducing GPT-5.3-Codex

A concise technical overview of GPT-5.3-Codex, highlighting its architecture, capabilities, and potential impact on AI-assisted coding.

2/5/2026