<!DOCTYPE html><html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="viewport" content="width=device-width"><meta name="x-apple-disable-message-reformatting"><title>TLDR Data</title><meta name="color-scheme" content="light dark"><meta name="supported-color-schemes" content="light dark"><style type="text/css">
:root {
color-scheme: light dark; supported-color-schemes: light dark;
}
*,
*:after,
*:before {
-webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box;
}
* {
-ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%;
}
html,
body,
.document {
width: 100% !important; height: 100% !important; margin: 0; padding: 0;
}
body {
-webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; text-rendering: optimizeLegibility;
}
div[style*="margin: 16px 0"] {
margin: 0 !important;
}
table,
td {
mso-table-lspace: 0pt; mso-table-rspace: 0pt;
}
table {
border-spacing: 0; border-collapse: collapse; table-layout: fixed; margin: 0 auto;
}
img {
-ms-interpolation-mode: bicubic; max-width: 100%; border: 0;
}
*[x-apple-data-detectors] {
color: inherit !important; text-decoration: none !important;
}
.x-gmail-data-detectors,
.x-gmail-data-detectors *,
.aBn {
border-bottom: 0 !important; cursor: default !important;
}
.btn {
-webkit-transition: all 200ms ease; transition: all 200ms ease;
}
.btn:hover {
background-color: #f67575; border-color: #f67575;
}
* {
font-family: Arial, Helvetica, sans-serif; font-size: 18px;
}
@media screen and (max-width: 600px) {
.container {
width: 100%; margin: auto;
}
.stack {
display: block!important; width: 100%!important; max-width: 100%!important;
}
.btn {
display: block; width: 100%; text-align: center;
}
}
body,
p,
td,
tr,
.body,
table,
h1,
h2,
h3,
h4,
h5,
h6,
div,
span {
background-color: #FEFEFE !important; color: #010101 !important;
}
@media (prefers-color-scheme: dark) {
body,
p,
td,
tr,
.body,
table,
h1,
h2,
h3,
h4,
h5,
h6,
div,
span {
background-color: #27292D !important; color: #FEFEFE !important;
}
}
a {
color: inherit !important; text-decoration: underline !important;
}
</style><!--[if mso | ie]>
<style type="text/css">
a {
background-color: #FEFEFE !important; color: #010101 !important;
}
@media (prefers-color-scheme: dark) {
a {
background-color: #27292D !important; color: #FEFEFE !important;
}
}
</style>
<![endif]--></head><body class="">
<div style="display: none; max-height: 0px; overflow: hidden;">Current AI agents, much like legacy BI tools connected directly to production databases, lack trustworthiness due to ungoverned, noisy sources</div>
<div style="display: none; max-height: 0px; overflow: hidden;">
<br>
</div>
<table align="center" class="document"><tbody><tr><td valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" class="container" width="600"><tbody><tr class="inner-body"><td>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr class="header"><td bgcolor="" class="container">
<table width="100%"><tbody><tr><td class="container">
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" style="margin-top: 0px;" width="100%"><tbody><tr><td style="padding: 0px;">
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div style="text-align: center;">
<span style="margin-right: 0px;"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Ftldr.tech%2Fdata%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/UDxECxF5FeYr4UzfN5HwtxURjbeR-rtYU3_5-sb8hrw=444" rel="noopener noreferrer" target="_blank"><span>Sign Up</span></a>
|<span style="margin-right: 2px; margin-left: 2px;"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fadvertise.tldr.tech%3Futm_source=tldrdata%26utm_medium=newsletter%26utm_campaign=advertisetopnav/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/o-DQe-Nk8l0RXfL2Q3oiGkn_-Z8-dbSuVOn1r7jAzs8=444" rel="noopener noreferrer" target="_blank"><span>Advertise</span></a></span>|<span style="margin-left: 2px;"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fa.tldrnewsletter.com%2Fweb-version%3Fep=1%26lc=1670a604-84b7-11f0-bcf5-55fc1d40139c%26p=f3dae0c0-07d6-11f1-9320-e33525c77e5b%26pt=campaign%26t=1770894474%26s=c7ad6f2c744690dadac62d6f1615d8392a211568ee8439df9aff24eb822a461b/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/1bloJTycvVacA5a5cK6DACAHDbAEX5CApb9ZHmYvpd4=444"><span>View Online</span></a></span>
<br>
</span></div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="text-align: center;"><span data-darkreader-inline-color="" style="--darkreader-inline-color:#3db3ff; color: rgb(51, 175, 255) !important; font-size: 30px;">T</span><span style="font-size: 30px;"><span data-darkreader-inline-color="" style="color: rgb(232, 192, 96) !important; --darkreader-inline-color:#e8c163; font-size:30px;">L</span><span data-darkreader-inline-color="" style="color: rgb(101, 195, 173) !important; --darkreader-inline-color:#6ec7b2; font-size:30px;">D</span></span><span data-darkreader-inline-color="" style="--darkreader-inline-color:#dd6e6e; color: rgb(220, 107, 107) !important; font-size: 30px;">R</span>
<br>
</td></tr></tbody></table>
<br>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody></tbody></table>
<table style="table-layout: fixed; width:100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;">
<div style="text-align: center;">
<h1><strong>TLDR Data <span id="date">2026-02-12</span></strong></h1>
</div>
</td></tr></tbody></table>
<table style="table-layout: fixed; width:100%;" width="100%"><tbody></tbody></table>
</td></tr></tbody></table>
</td></tr></tbody></table>
</td></tr>
<tr bgcolor=""><td class="container">
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td style="padding: 0px;">
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><span style="font-size: 36px;">📱</span></div></div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;">
<h1><strong>Deep Dives</strong></h1>
</div>
</div>
</td></tr></tbody></table>
<table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fmaxhalford.github.io%2Fblog%2Ftext-classification-zstd%2F%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/1zxeFg7f8QO98JyakY5NigBUT2StVOAAFmmTl_jAUE0=444">
<span>
<strong>Text classification with Python 3.14's zstd module (7 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Python 3.14's new standard-library compression.zstd module enables fast, incremental compression, which makes compression-based text classification practical: a document is assigned to the class whose class-specific compressor yields the smallest output. This simple, gradient-free approach achieves ~91% accuracy on 20 Newsgroups in under 2 seconds, rivaling TF-IDF plus logistic regression while being far simpler and faster to train incrementally.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.infoworld.com%2Farticle%2F4128925%2Fai-augmented-data-quality-engineering.html%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/H1mWOQSNPIgPz4J77Ljj70pUqOHenhRnt-6iB3FAP_4=444">
<span>
<strong>AI-augmented data quality engineering (8 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
AI-powered data quality engineering is replacing traditional rule-based methods in large-scale, dynamic enterprise environments by employing deep learning for semantic inference, transformer models for automated schema alignment, and generative AI for data cleaning and imputation. Techniques like Sherlock and Sato have improved semantic classification accuracy by up to 14% in noisy contexts, while GANs, VAEs, and reinforcement learning optimize anomaly detection and pipeline efficiency. Dynamic trust scoring and explainability frameworks such as SHAP and LIME further enhance governance, auditability, and reliability.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fnetflixtechblog.com%2Fhigh-throughput-graph-abstraction-at-netflix-part-i-e88063e6f6d5%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/aESvtoJ45wnyBd2GZ5LzZLLI5pZt1cHmGRC0dGmnYek=444">
<span>
<strong>High-Throughput Graph Abstraction at Netflix: Part I (13 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Netflix's Graph Abstraction processes up to 10 million ops/sec across 650TB of graph data, integrating seamlessly with their KV and time-series abstractions for cost-efficient, low-latency (single-digit ms) access and strong eventual consistency. It employs a modular Property Graph model, fine-grained schema management, advanced caching with EVCache, and robust asynchronous operations for scalability and resilience. This architecture enables rapid traversals, fine schema control, and reliable multi-region operation.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Flinks.tldrnewsletter.com%2F1ibfKx/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/9MYcwbdF9c0Z3-ZKoPC1_hqUNEvN84AWHeZji60z5D0=444">
<span>
<strong>Jack of All Trades: Query Federation in Modern OLAP Databases (15 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Query federation has become essential in modern OLAP databases, allowing unified SQL queries across data sources without heavy ETL or data duplication. While general-purpose tools like Trino offer broad connectivity as stateless orchestrators, embedded federation in high-performance engines like StarRocks provides better performance through optimizations (e.g., vectorized execution, metadata caching, and strong Iceberg support).
</span>
</span>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><span style="font-size: 36px;">🚀</span></div>
</div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;">
<h1><strong>Opinions & Advice</strong></h1>
</div>
</div>
</td></tr></tbody></table>
<table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.counting-stuff.com%2Fdata-work-in-the-fast-fashion-code-era%2F%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/xRR4Eiac-7L8VlW6d7wJIC_qaL8XCySCDYWHF13gnk4=444">
<span>
<strong>Data Work in the Fast Fashion Code Era (4 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
In the AI/LLM-driven era of “fast fashion” code, software engineers grapple with fragile, hard-to-refactor systems that demand constant cleanup and ownership. However, for data professionals, the same dynamics become powerful advantages: rapid prototyping and throwaway scripts excel at tackling messy, unstructured sources like PDFs and videos, enabling fast ad-hoc research without the heavy burden of sustained maintenance or production-grade polish.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fthenewaiorder.substack.com%2Fp%2Fdata-teams-should-become-context%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/nIMa1cEtngRWmMJu6hSFgW65CcxChBE6chz8A3j7hOU=444">
<span>
<strong>Data teams should become context teams (5 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Current AI agents, much like legacy BI tools connected directly to production databases, lack trustworthiness due to ungoverned, noisy knowledge sources. Context engineering is emerging as a new discipline combining data governance, engineering, and data science to build a single, governed, versioned “context layer” for AI. For data teams, this means building ETL, transformation, orchestration, and monitoring for company knowledge sources, with quantitative KPIs such as answer rate, accuracy, speed, and cost, plus bespoke evaluation frameworks to iteratively improve AI agent reliability and efficiency.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Flevelup.gitconnected.com%2Fstructure-is-all-you-need-4ee88db32675%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/s0Toxj5Vl9Fpl6R0hyo_V_xnv0DcSF15ADYTsqvTza4=444">
<span>
<strong>Structure is All You Need? (9 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
AI is reaching the limits of linear transformer architectures and massive context windows, which deliver exhaustive recall but lack structured reasoning. Emerging research advocates shifting toward graph-based architectures: Context Graphs, Trainable Graph Memory, and GraphRAG. These enable episodic, semantic memory, recursive reasoning, and superior state modeling for tasks like code analysis and multi-agent coordination. Investing in state management and hybrid models (combining vector fuzziness with graph rigor) is critical for explainability, traceability, and resilient AI workflows.
</span>
</span>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><span style="font-size: 36px;">💻</span></div>
</div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;">
<h1><strong>Launches & Tools</strong></h1>
</div>
</div>
</td></tr></tbody></table>
<table style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fgithub.com%2Fgetnao%2Fnao%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/2LD-QEQ6yDGJF8433g8wgeR0xlJ8-FatJ7J5cdPaSds=444">
<span>
<strong>nao (GitHub Repo)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
nao is a self-hosted framework for building, testing, and deploying analytics agents with rich, versioned context. It lets users query data in natural language while giving data teams control, observability, and fast iteration.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fgithub.com%2Fposit-dev%2Fpointblank%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/PNG7NJSbYN5RzeTcvXDJr-OuNnvVrSoxgLCgqFlGMv8=444">
<span>
<strong>Pointblank (GitHub Repo)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Pointblank is a Python data validation toolkit that enables users to assess and monitor tabular data quality through a chainable, expressive API, supporting backends like Polars, Pandas, DuckDB, PySpark, Snowflake, databases, and Parquet files. It stands out with features such as AI-powered DraftValidation (using LLMs to auto-suggest rules), threshold-based alerts with actions, synthetic test data generation, and conjoint/multi-condition validations.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.getdbt.com%2Fblog%2Fdbt-core-v1-11-is-ga%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/MUtPsgWCxX7XheQIkFOIOT8mSl9AkaAz_dYd51ZZjnY=444">
<span>
<strong>dbt Core v1.11 is GA (7 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
dbt Core v1.11 introduces official support for user-defined functions (UDFs), enabling teams to standardize reusable transformation logic directly in their data warehouses (BigQuery, Snowflake, Redshift, Postgres, and Databricks), with Python UDFs available in Snowflake and BigQuery. Enhanced JSON schema validation is stricter and catches configuration issues earlier, improving code reliability. Adapter-specific optimizations, such as batched source freshness in BigQuery and deletion tracking in Databricks snapshots, further boost performance and governance.
</span>
</span>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><span style="font-size: 36px;">🎁</span></div></div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><h1><strong>Miscellaneous</strong></h1></div>
</div>
</td></tr></tbody></table>
<table bgcolor="" style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fengineering.fb.com%2F2026%2F02%2F09%2Fdata-center-engineering%2Fbuilding-prometheus-how-backend-aggregation-enables-gigawatt-scale-ai-clusters%2F%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/pxjjkBXUUPnRTTSmJX2SCJi2AS3HYbF1wYo_pzrGiyU=444">
<span>
<strong>Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters (3 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
Meta's Prometheus AI cluster will deliver 1 GW of capacity by interconnecting tens of thousands of GPUs across numerous data centers, enabled by the Backend Aggregation (BAG) network. BAG employs modular Jericho3 ASIC-powered chassis, petabit-level inter-BAG bandwidth (up to 48 Pbps per region pair), and advanced Ethernet-based topologies with eBGP routing and MACsec security. Precise oversubscription management and distributed architecture ensure high-performance, resilient networking.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fmotherduck.com%2Fblog%2Fobsidian-rag-duckdb-motherduck%2F%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/diKptIP_lvF6VrPmja7pjJj5B4-dcVaMyII_wtlQcH0=444">
<span>
<strong>Building an Obsidian RAG with DuckDB and MotherDuck (21 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
A local-first Retrieval-Augmented Generation (RAG) pipeline for Obsidian notes uses DuckDB as an embedded vector database to store embeddings, intelligently chunks Markdown files while retaining backlinks and the full knowledge graph structure, and supports powerful semantic search combined with two-hop traversals to reveal hidden connections between ideas. The system then syncs data to MotherDuck, enabling a lightweight, serverless web app that executes DuckDB queries directly in the browser.
</span>
</span>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;"><span style="font-size: 36px;">⚡</span></div></div>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding-top: 0px; padding-bottom: 0px;">
<div class="text-block">
<div style="text-align: center;">
<h1><strong>Quick Links</strong></h1>
</div>
</div>
</td></tr></tbody></table>
<table bgcolor="" style="table-layout: fixed; width: 100%;" width="100%"><tbody><tr><td style="padding:0;border-collapse:collapse;border-spacing:0;margin:0;" valign="top">
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Flinks.tldrnewsletter.com%2FXzXyPs/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/VqeQnVe4u_ELQpt5yO-4EZGxr6fTd0_tfu43Oy1SOM8=444">
<span>
<strong>Fact-Checking the Future: How BGE-M3 is Taming the RAG Hallucination Puzzle (11 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
BGE-M3, developed by the Beijing Academy of Artificial Intelligence, unifies dense, sparse, and multi-vector (ColBERT) retrieval in a single model, efficiently overcoming the traditional RAG "retrieval trilemma" without sacrificing speed or accuracy.
</span>
</span>
</div>
</td></tr></tbody></table>
<table align="center" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block">
<span>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fposts%2Fhoytemerson_ive-been-playing-a-game-the-last-couple-activity-7426746327121866752-A4Im%3Futm_source=tldrdata/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/1bzSdtNv2yxNGPDa36_YjdC_VV4MbH4ugRHDWCTKY1g=444">
<span>
<strong>I've been playing a game the last couple of weeks called "Can I just do it with pyarrow?" (1 minute read)</strong>
</span>
</a>
<br>
<br>
<span style="font-family: 'Helvetica Neue', Helvetica, Arial, Verdana, sans-serif;">
pyarrow can handle train/test splits and even train an XGBoost model directly from Arrow tables.
</span>
</span>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td align="left" style="word-break: break-word; vertical-align: top; padding: 5px 10px;">
<p style="padding: 0; margin: 0; font-size: 22px; color: #000000; line-height: 1.6; font-weight: bold;">
Want to advertise in TLDR? 💰
</p>
<div class="text-block" style="margin-top: 10px;">
If your company is interested in reaching an audience of data engineering professionals and decision makers, you may want to <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fadvertise.tldr.tech%2F%3Futm_source=tldrdata%26utm_medium=newsletter%26utm_campaign=advertisecta/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/FDDvcmYbBkgK_U11Y8UPVZvPwPuK8aOmKW9y1OL1w3w=444"><strong><span>advertise with us</span></strong></a>.
</div>
<br>
<!-- New "Want to work at TLDR?" section -->
<p style="padding: 0; margin: 0; font-size: 22px; color: #000000; line-height: 1.6; font-weight: bold;">
Want to work at TLDR? 💼
</p>
<div class="text-block" style="margin-top: 10px;">
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fjobs.ashbyhq.com%2Ftldr.tech/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/4TgbJUweQ0oI_JGHwerC1a1rOEOXtF_6g1QHalWuCIs=444" rel="noopener noreferrer" style="color: #0000EE; text-decoration: underline;" target="_blank"><strong>Apply here</strong></a>,
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fjobs.ashbyhq.com%2Ftldr.tech%2Fc227b917-a6a4-40ce-8950-d3e165357871/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/qAhroRS84vBYWAzAnGh_O1GOZrLgW2LMwpMDDOjUWbM=444" rel="noopener noreferrer" style="color: #0000EE; text-decoration: underline;" target="_blank"><strong>create your own role</strong></a> or send a friend's resume to <a href="mailto:jobs@tldr.tech" style="color: #0000EE; text-decoration: underline;">jobs@tldr.tech</a> and get $1k if we hire them! TLDR is one of <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Ffeed%2Fupdate%2Furn:li:activity:7401699691039830016%2F/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/fVdmsUUToy80vmTL2lpiPOyXLsvJvBDEmce8xm-xhBg=444" rel="noopener noreferrer" style="color: #0000EE; text-decoration: underline;" target="_blank"><strong>Inc.'s Best Bootstrapped businesses</strong></a> of 2025.
</div>
<br>
<div class="text-block">
If you have any comments or feedback, just respond to this email!
<br>
<br> Thanks for reading,
<br>
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fjoelvanveluwen%2F/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/qzyuc9dyRSUMcJwqcX565CrchUxRdnw4Tm0oweQSKsE=444"><span>Joel Van Veluwen</span></a>, <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fjennytzurueyching%2F/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/Ls-0CxFtRp3qAfgNbFCeYlU7uMgVxEewkEdwVSD8YZ0=444"><span>Tzu-Ruey Ching</span></a> & <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.linkedin.com%2Fin%2Fremi-turpaud%2F/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/JQypG5Ou7oxx7qt6WhTpkaJA0tk5cl0jWAMAqt18jnI=444"><span>Remi Turpaud</span></a>
<br>
<br>
</div>
<br>
</td></tr></tbody></table>
<table align="center" bgcolor="" border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td class="container" style="padding: 15px 15px;">
<div class="text-block" id="testing-id">
<a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Ftldr.tech%2Fdata%2Fmanage%3Femail=silk.theater.56%2540fwdnl.com/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/DzK-vViOJbiLPMSxoRxCY2z9rmMFkQNOKXS-p9jphNA=444">Manage your subscriptions</a> to our other newsletters on tech, startups, and programming. Or if TLDR Data isn't for you, please <a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fa.tldrnewsletter.com%2Funsubscribe%3Fep=1%26l=037ede50-92cc-11ee-b0f2-b761aa2217ad%26lc=1670a604-84b7-11f0-bcf5-55fc1d40139c%26p=f3dae0c0-07d6-11f1-9320-e33525c77e5b%26pt=campaign%26pv=4%26spa=1770894090%26t=1770894474%26s=723f1454a5f0e34872c05e999e40b65409d17d5c1ba7c38a31ed04476f7293bb/1/0100019c51893e01-d554646a-e3c4-4d2a-829a-2aae47a313a6-000000/yV59YfUfs27AcXcsPUsHORZeKVWpufPGmV7hp_vf9KY=444">unsubscribe</a>.
<br>
</div>
</td></tr></tbody></table>
</td></tr></tbody></table>
</td></tr></tbody></table>
</td></tr></tbody></table>
</td></tr></tbody></table>
</body></html>