25 newsletters today.
Abandoned Futures
2026-05-18
On July 16, 1969, five Rocketdyne F-1 engines ignited beneath Apollo 11, each producing 1.522 million pounds of thrust at sea level while burning nearly three tons of RP-1 kerosene and liquid oxygen every second. The F-1 remains, 57 years later, the most powerful single-chamber liquid-fuel rocket engine ever flown. And by the late 1970s, NASA had effectively lost the ability to build another one.
The engine was developed at Rocketdyne's Canoga Park facility starting in 1958 under Paul Castenholz and chief designer Dom Sanchini. Its core problem was combustion instability: pressure oscillations inside the combustion chamber of an engine that measured 3.7 meters across at the nozzle exit, oscillations that could destroy the engine in 40 milliseconds. The team solved it the hard way: they detonated small bombs inside running engines and iterated injector plate designs (the famous "baffled injector") until the chamber could damp out disturbances within 0.1 seconds. This was empirical engineering on a scale that has never been repeated.
Why did it die? The reasons are mundane and infuriating:
Here's why this matters now: in 2012-2013, Marshall Space Flight Center engineers resurrected an F-1 gas generator from museum stock (engine F-6049) and test-fired it. Using structured-light 3D scanning, they reverse-engineered components that had taken Rocketdyne years to design. With modern selective laser melting (SLM) of Inconel 718, the 5,600 individual parts of the original injector could be printed as a single component — exactly the trick SpaceX used on SuperDraco and Aerojet Rocketdyne used on the RS-25 main combustion chamber.
A modernized F-1B was projected to deliver 1.8 million pounds of thrust at lower part-count and dramatically lower cost. It was proposed for SLS's Advanced Boosters competition in 2014. It lost to solid boosters — a political choice favoring ATK's Utah workforce, not a technical one.
The lesson isn't nostalgia. It's that we already proved kerosene-LOX scales to 1.5+ million pounds of thrust in a single chamber in 1967, and modern additive manufacturing, CFD-validated injector design, and digital tribal knowledge capture make it far easier to do again. SpaceX's Raptor is brilliant, but at roughly 2.3 MN per engine, Starship's booster needs 33 of them. Five F-1Bs would deliver about half that total thrust from roughly one-sixth as many engines, with correspondingly less plumbing.
We didn't lose the F-1 to physics. We lost it to filing.
Daily Automotive Engines
2026-05-18
The quench area (also called squish) is the flat region where the piston crown approaches the cylinder head's flat deck at top dead center. When the piston rises into this tight gap, it violently squirts the trapped mixture sideways into the main combustion chamber at high velocity. This isn't a side effect — it's deliberately engineered turbulence that transforms how your engine burns fuel.
Why does this matter? A homogeneous, turbulent mixture burns faster and more completely than a stagnant one. Faster flame propagation means:
The classic example is the small-block Chevy 350 with closed-chamber 64cc heads. Swap those for "open chamber" 76cc heads (no quench pad) and you'll lose 1+ point of compression AND gain a knock problem on pump gas. Mopar's later "Magnum" 5.9L heads brought back aggressive quench specifically to run 9.1:1 compression on 87 octane reliably.
The critical dimension: quench clearance. This is the distance between the piston crown and head deck at TDC, calculated as:
Quench = Head gasket thickness + Piston deck height (how far the piston sits below the deck at TDC)
Rule of thumb: Target 0.035"–0.045" quench clearance for a street engine. Tighter than 0.035" risks the piston physically kissing the head as the rod stretches at high RPM. Looser than 0.050" and you lose the turbulent squish effect entirely — the gap becomes a dead zone where unburned fuel hides.
Real example: A 350 build with pistons 0.020" down in the hole and a 0.041" composite gasket gives 0.061" quench, which is too loose and burns sluggishly. Mill the deck 0.020" to bring the pistons to zero deck (0.041" quench), or swap in a 0.020" steel shim gasket (0.040" quench), and the same engine picks up measurable power AND runs cleaner on the same fuel.
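The quench formula and rule of thumb above can be sketched as a small calculator. This is a minimal illustration, not a build tool: the function names are made up here, and the "Marginal" band between 0.045" and 0.050" is an assumption, since the text only names the target range and the point where squish is lost.

```rust
/// Quench clearance in inches: head gasket thickness plus how far the
/// piston sits below the deck at TDC (both in inches).
fn quench_clearance(gasket_in: f64, piston_below_deck_in: f64) -> f64 {
    gasket_in + piston_below_deck_in
}

#[derive(Debug, PartialEq)]
enum QuenchVerdict {
    TooTight, // under 0.035": piston may touch the head at high RPM
    InTarget, // 0.035" to 0.045": the street-engine target
    Marginal, // 0.045" to 0.050": band not named in the text (assumption)
    TooLoose, // over 0.050": squish effect is lost
}

/// Classify a clearance against the rule-of-thumb thresholds.
fn classify_quench(clearance_in: f64) -> QuenchVerdict {
    if clearance_in < 0.035 {
        QuenchVerdict::TooTight
    } else if clearance_in <= 0.045 {
        QuenchVerdict::InTarget
    } else if clearance_in <= 0.050 {
        QuenchVerdict::Marginal
    } else {
        QuenchVerdict::TooLoose
    }
}
```

Calling quench_clearance(0.041, 0.020) reproduces the 0.061" figure from the example, and classify_quench flags it as too loose; a zero-deck piston with the same gasket, quench_clearance(0.041, 0.0), lands inside the target band.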
Diesels take this further with bowl-in-piston designs — the quench area pushes nearly all the air down into a swirl chamber machined into the piston crown, which is essential for properly mixing the late-injected fuel spray with compressed air.
Daily Debugging Puzzle
RefCell Reentrant Borrow Panic: The Listener That Tried to Subscribe
2026-05-18
This EventBus lets callers subscribe closures and emit events to all listeners. The compiler is happy, the API looks reasonable, and a unit test that just calls emit after a few subscriptions passes. Then a listener does something perfectly natural — registers another listener in response to an event — and the whole program face-plants.
use std::cell::RefCell;
use std::rc::Rc;

struct EventBus {
    listeners: RefCell<Vec<Box<dyn Fn(&str)>>>,
}

impl EventBus {
    fn new() -> Self {
        EventBus { listeners: RefCell::new(Vec::new()) }
    }

    fn subscribe(&self, listener: Box<dyn Fn(&str)>) {
        self.listeners.borrow_mut().push(listener);
    }

    fn emit(&self, event: &str) {
        for listener in self.listeners.borrow().iter() {
            listener(event);
        }
    }
}

fn main() {
    let bus = Rc::new(EventBus::new());
    let bus2 = Rc::clone(&bus);
    bus.subscribe(Box::new(move |e| {
        println!("got: {}", e);
        if e == "init" {
            bus2.subscribe(Box::new(|e2| println!("late: {}", e2)));
        }
    }));
    bus.emit("init"); // thread 'main' panicked: already borrowed
}
emit calls self.listeners.borrow() and holds that immutable borrow for the entire for loop. Inside the loop, the listener invokes bus2.subscribe(...), which calls borrow_mut() on the same RefCell. RefCell enforces Rust's aliasing rules at runtime: an outstanding immutable borrow forbids any mutable borrow. The result is BorrowMutError — a panic, not a compile error.
This is the dynamic-checking analogue of "iterator invalidation." The trap is that the call graph is invisible at the borrow site. Any closure you don't control — a logging hook, a metric counter, a plugin — might reach back into the same RefCell. Even a single-threaded program written entirely in safe Rust can crash this way. Wrapping a RefCell in Rc makes the hazard worse, because clones of the Rc hand out new entry points to the same cell.
The fix is to not hold a borrow across calls to unknown code. The cleanest pattern is to release the borrow before dispatching, by snapshotting or swapping out the vec:
fn emit(&self, event: &str) {
    // Move listeners out, releasing the borrow before any callback runs.
    let taken = std::mem::take(&mut *self.listeners.borrow_mut());
    for listener in &taken {
        listener(event);
    }
    // Restore the original listeners, then append any added during dispatch.
    let mut current = self.listeners.borrow_mut();
    let added = std::mem::replace(&mut *current, taken);
    current.extend(added);
}
Alternatives include cloning an Rc of each listener into a local Vec (if listeners are Rc-shared), or queueing subscribe requests into a pending vec that emit drains afterward. Whatever you choose, the rule is the same: drop the borrow before calling untrusted code.
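The first alternative mentioned above, cloning an Rc of each listener into a local Vec, might look like the sketch below. SnapshotBus, Listener, and listener_count are names invented for this illustration; the original EventBus stores Box<dyn Fn(&str)>, so switching to Rc-shared listeners is itself an assumption about what the API is allowed to change.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Rc-shared listener handle (hypothetical alias, not from the puzzle code).
type Listener = Rc<dyn Fn(&str)>;

struct SnapshotBus {
    listeners: RefCell<Vec<Listener>>,
}

impl SnapshotBus {
    fn new() -> Self {
        SnapshotBus { listeners: RefCell::new(Vec::new()) }
    }

    fn subscribe(&self, listener: Listener) {
        self.listeners.borrow_mut().push(listener);
    }

    fn emit(&self, event: &str) {
        // Clone the Rc handles into a local Vec. The borrow guard is a
        // temporary, so it is dropped at the end of this statement,
        // before any callback runs.
        let snapshot: Vec<Listener> =
            self.listeners.borrow().iter().cloned().collect();
        for listener in &snapshot {
            // Listeners may call subscribe() here without panicking.
            listener(event);
        }
    }

    fn listener_count(&self) -> usize {
        self.listeners.borrow().len()
    }
}
```

With this design a listener subscribed during dispatch is not called for the event that triggered it, only for later ones, because emit iterates over the snapshot. The cost is one reference-count increment per listener per emit.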
A useful habit: treat every borrow() and borrow_mut() as opening a critical section, and ask "what's the longest call chain that runs before this guard is dropped?" If the answer includes "a user-supplied closure," you have a latent panic.
RefCell moves Rust's aliasing rules to runtime — holding a borrow across a callback can panic, so release the guard (snapshot, swap, or drop) before invoking code you don't fully control.
Daily Digital Circuits
2026-05-18
Normal pipelining inserts flip-flops between logic stages so each cycle holds exactly one "wave" of data. Wave-pipelining deletes those intermediate registers and instead launches a new input before the previous one has finished propagating — multiple data waves coexist in the same combinational cloud, separated only by their propagation delay. The clock period sets the wave spacing; the logic itself becomes the storage.
The trick relies on bounding the delay spread through the logic. Every path from input register to output register has some min delay d_min and max delay d_max. For N waves to coexist safely, you need:
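One standard way to write the two conditions is sketched here, using the symbols from the text. The capture-clock offset \Delta (how far the sampling edge is shifted from N T_clk) is an assumption added for generality and does not appear in the original; set \Delta = 0 if the capture register shares the launch clock phase.

```latex
\begin{aligned}
d_{\max} + t_{\text{setup}} &< N\,T_{\text{clk}} + \Delta
  &&\text{(slowest path of a wave arrives before its capture edge)}\\
d_{\min} &> (N-1)\,T_{\text{clk}} + \Delta + t_{\text{hold}}
  &&\text{(fastest path of the next wave arrives after the hold window)}
\end{aligned}
```

Both N and \Delta drop out when the second condition is subtracted from the first, which is why only the delay spread and the clock period remain.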
Subtract them and you get the killer inequality: d_max − d_min < T_clk − t_setup − t_hold. The spread between fastest and slowest path must fit inside one clock period minus flip-flop overhead. This is brutal — normal synthesis happily produces paths with 3:1 delay ratios. Wave-pipelining demands tight, balanced logic where every path takes nearly the same time.
Concrete example: a 64-bit carry-lookahead adder has d_max ≈ 2.0 ns and d_min ≈ 0.6 ns. Spread = 1.4 ns. With a 1 ns clock period and 0.15 ns setup+hold overhead, you'd need spread < 0.85 ns — fails. But pad the fast paths with buffers to raise d_min to 1.2 ns (spread = 0.8 ns) and you can run 2 waves in flight at 1 GHz, doubling throughput without adding a pipeline register. Cray's vector units and some 1990s DEC Alpha multipliers used this exact technique to hit clock targets without paying register-file area.
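The arithmetic in the adder example can be checked with a short sketch. It tests only the spread inequality stated above, not the full set of timing constraints, and the function names are hypothetical.

```rust
/// True when the delay spread satisfies
/// d_max - d_min < T_clk - (t_setup + t_hold). All times in nanoseconds.
fn spread_fits(d_max_ns: f64, d_min_ns: f64, t_clk_ns: f64, setup_hold_ns: f64) -> bool {
    (d_max_ns - d_min_ns) < (t_clk_ns - setup_hold_ns)
}

/// Lower bound on the delay that must be padded onto the fastest path
/// before the spread can fit (0.0 when it already fits). Because the
/// inequality is strict, real padding must exceed this value slightly.
fn min_padding_ns(d_max_ns: f64, d_min_ns: f64, t_clk_ns: f64, setup_hold_ns: f64) -> f64 {
    ((d_max_ns - d_min_ns) - (t_clk_ns - setup_hold_ns)).max(0.0)
}
```

For the numbers above, spread_fits(2.0, 0.6, 1.0, 0.15) is false and spread_fits(2.0, 1.2, 1.0, 0.15) is true; min_padding_ns reports about 0.55 ns as the least padding needed, so the 0.6 ns of buffering in the example leaves a small margin.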
Rule of thumb: wave-pipelining is worth attempting only when register insertion is impossible (analog-style paths, ultra-low-latency loops) or when register power/area is the bottleneck. The design effort is roughly 5× a normal pipeline because you must actively slow down fast paths with delay buffers — and any process/voltage/temperature drift that widens the spread breaks the circuit.
Modern static timing tools barely support it. You typically need custom delay-matching scripts plus on-die delay-line trimming to compensate for PVT variation. That's why you see wave-pipelining in academic papers and a handful of GPU datapaths, but not in mainstream RTL — when registers cost almost nothing, paying 5× design effort to save them rarely pencils out.
Daily Electrical Circuits
2026-05-18
If you've ever wondered why a $50 handheld multimeter can deliver 4½ digits of accuracy while a $500 oscilloscope ADC drifts noticeably between calibrations, the answer is the dual-slope integrating ADC. It's slow, elegant, and brilliantly immune to the noise sources that plague faster converters.
The core trick: instead of comparing the input to a reference directly, you let the input charge a capacitor through a resistor for a fixed time, then discharge that capacitor with a known reference current and measure how long the discharge takes. The ratio of times equals the ratio of voltages — and crucially, the integrating capacitor and resistor values cancel out of the equation.
The three phases:
Why it's so accurate: Notice R, C, and clock frequency all cancel. Only VREF matters for absolute accuracy — and you can buy a 2 ppm/°C reference for under $5. Better yet, by choosing T₁ to be an integer multiple of the AC line period, line-frequency noise integrates to zero. This is called normal-mode rejection, and it's why DMMs feel rock-steady when probing noisy industrial gear.
The 50/60 Hz rule of thumb: Set T₁ = 100 ms (or any integer multiple of 1/60 s AND 1/50 s — 100 ms is the smallest that works for both). This gives infinite theoretical rejection of mains hum and its harmonics. Bench DMMs labeled "5½ digit, 10 readings/sec" are running exactly this trick — that's why you can't get faster readings without losing digits.
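Both the 100 ms figure and the time-ratio conversion can be sketched in a few lines. The function names and the idea of expressing the two phases as clock counts are assumptions made for illustration; the formula itself is just the ratio-of-times relation described above.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u32, b: u32) -> u32 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Shortest integration time, in milliseconds, that spans a whole number
/// of cycles of both mains frequencies (integer Hz). A time T works for
/// frequency f when T * f is an integer, so the shortest common T is
/// 1 / gcd(f1, f2) seconds.
fn shortest_common_integration_ms(f1_hz: u32, f2_hz: u32) -> f64 {
    1000.0 / gcd(f1_hz, f2_hz) as f64
}

/// Dual-slope result: V_in = V_ref * (de-integrate counts / integrate counts).
/// R, C, and the clock frequency do not appear because they cancel.
fn dual_slope_vin(v_ref: f64, integrate_counts: u32, deintegrate_counts: u32) -> f64 {
    v_ref * deintegrate_counts as f64 / integrate_counts as f64
}
```

shortest_common_integration_ms(50, 60) returns 100.0, matching the rule of thumb (5 cycles at 50 Hz, 6 cycles at 60 Hz). As a made-up conversion, a 1.0 V reference with 1000 integrate counts and 500 de-integrate counts reads as 0.5 V.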
Real-world example: The classic ICL7106 (and its successors in Fluke 17x-series meters) is a dual-slope ADC with built-in LCD driver. Pair it with an LM4040 2.5 V reference, a 0.1 µF polypropylene integrating cap (low dielectric absorption is critical here!), and a 100 kΩ integrating resistor. You get 3½ digits, 3 readings/sec, and 50/60 Hz rejection — all from one chip plus a handful of passives.
Watch out for: dielectric absorption in the integrating cap. Ceramic caps "remember" previous charge and cause nonlinearity. Use polypropylene or polystyrene for ≥4-digit designs.
Daily Engineering Lesson
2026-05-18
When a single pump can't deliver enough flow, engineers often install a second identical pump in parallel — both pulling from the same source, both discharging into the same header. The intuition is obvious: two pumps, twice the flow. The reality is almost never that clean, and misunderstanding why has wrecked countless HVAC retrofits, fire suppression systems, and process plants.
The system curve fights back. Every piping system has a system curve: a parabolic relationship where required pressure rises with the square of flow rate (friction losses scale as v²). A pump has its own pump curve: head decreases as flow increases. The operating point is where these curves intersect.
When you add a second identical pump in parallel, the combined pump curve doubles in flow at any given head, but the system curve doesn't move. The new intersection sits at higher flow and higher head, which means each pump now operates further left on its individual curve, at lower flow and higher head per pump than before. You get more flow, just not 2×.
Rule of thumb: For typical systems with significant friction losses, two identical pumps in parallel deliver roughly 1.3× to 1.5× the flow of a single pump — not 2×. Only in systems dominated by static head (lifting water to a fixed elevation with minimal pipe friction) do you approach the 2× ideal.
Quick example: A single pump delivers 500 GPM at 80 ft of head against a system curve. Add an identical pump in parallel. The combined curve intersects the system curve at perhaps 700 GPM and 110 ft of head — each pump now contributes 350 GPM at 110 ft, working harder per unit of flow delivered.
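The intersection in the quick example can be reproduced with a small model. Everything about the curve shapes here is an assumption: the pump curve is taken as H = h0 - a·Q² and the system curve as H = hs + k·Q², with coefficients fitted so the curves pass through the example's points (500 GPM at 80 ft, 350 GPM at 110 ft per pump, 700 GPM at 110 ft combined). Real pump curves come from manufacturer data.

```rust
/// Assumed quadratic pump curve: H = shutoff_head_ft - a * Q^2.
struct PumpCurve {
    shutoff_head_ft: f64,
    a: f64,
}

/// Assumed system curve: H = static_head_ft + k * Q^2.
struct SystemCurve {
    static_head_ft: f64,
    k: f64,
}

/// Head the system requires at a given total flow.
fn system_head_ft(system: &SystemCurve, q_gpm: f64) -> f64 {
    system.static_head_ft + system.k * q_gpm * q_gpm
}

/// Total flow where n identical parallel pumps meet the system curve.
/// Each pump carries Q/n, so h0 - a*(Q/n)^2 = hs + k*Q^2, which gives
/// Q = sqrt((h0 - hs) / (k + a / n^2)).
fn operating_flow_gpm(pump: &PumpCurve, system: &SystemCurve, n: u32) -> f64 {
    let n = n as f64;
    ((pump.shutoff_head_ft - system.static_head_ft) / (system.k + pump.a / (n * n))).sqrt()
}

/// Curves fitted to the example's numbers (illustrative only).
fn example_curves() -> (PumpCurve, SystemCurve) {
    let a = (110.0 - 80.0) / (500.0_f64.powi(2) - 350.0_f64.powi(2));
    let pump = PumpCurve { shutoff_head_ft: 80.0 + a * 500.0_f64.powi(2), a };
    let k = (110.0 - 80.0) / (700.0_f64.powi(2) - 500.0_f64.powi(2));
    let system = SystemCurve { static_head_ft: 80.0 - k * 500.0_f64.powi(2), k };
    (pump, system)
}
```

With these fitted curves, operating_flow_gpm returns about 500 GPM for one pump and about 700 GPM for two, a 1.4× gain. Raising static_head_ft while shrinking k pushes the two-pump result toward the 2× ideal, which is the static-head case the rule of thumb describes.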
Practical consequences:
For series operation, the math flips: heads add at the same flow, useful when you need more pressure rather than more volume — think booster pumps in tall buildings.
Forgotten Books
2026-05-18
Book: Amateur Gunsmithing by Townsend Whelen (1924)
Read it: Internet Archive
In 1924, Major Townsend Whelen — one of the most respected riflemen and ballistics writers of the early twentieth century — opened his manual with a frontispiece showing a man named Seymour Griffin checking a rifle stock. The caption carried a quietly radical claim:
"Seymour Griffin checking a stock. Although ranking with the most skilled of stockers, he is entirely self-taught."
The book's first chapter doubled down on the idea, asserting that "Gunsmith skill within the talent of many amateurs without previous experience" — a remarkable declaration in an era when most trades were guarded behind formal apprenticeships, guild lineages, and family inheritance.
What's been forgotten is not a technique but an attitude: the casual assumption that an ordinary person, working evenings in a home shop, could legitimately produce a custom rifle stock, chamber a barrel on a lathe, cut checkering patterns, and apply a "London Finish" with rubbed linseed oil — and end up with something competitive with factory or master work. Whelen's table of contents reads like a four-year trade-school curriculum: stocking, checkering, polishing, chambering, bluing and browning. He treats it as a weekend hobby.
Consider what this required the reader to already own or build:
This was not unusual for 1924. Home machine shops were common among middle-class American men. Sears sold lathes. Popular Mechanics ran serial articles on building your own engine. The cultural baseline for "things a normal person can make" included firearms, furniture, radios, and automobile parts.
Is the claim true? The modern evidence is striking. Custom stockmakers today still revere Seymour Griffin. He co-founded Griffin & Howe in 1923, the year before the book appeared, and it became the most prestigious custom rifle shop in America and still operates a century later. Whelen wasn't exaggerating: the self-taught amateur in his frontispiece literally became the gold standard. And the techniques in the book (bedding a receiver, cutting a comb, rubbing oil into walnut) are essentially unchanged in custom shops today.
The modern parallel is the maker movement, but inverted. We celebrate someone 3D-printing a phone case as a return to self-reliance; in 1924, Whelen assumed his reader could ream a chamber to military headspace tolerances. The skills haven't gotten harder. The cultural permission to attempt them has eroded. YouTube has revived some of this (there are channels where hobbyists do exactly what Whelen described), but as a niche, not a default.
Whelen's quiet thesis is that craft is not gatekept by talent or pedigree, only by willingness to start. That belief used to be ordinary.
Forgotten Patent
2026-05-18
In September 1888, an American machine-tool engineer named Oberlin Smith published an article in The Electrical World titled "Some Possible Forms of Phonograph." It was not a patent — and that is precisely what makes the story remarkable. Smith had filed a U.S. caveat (No. 3,683) in October 1878, ten years earlier, describing a method of recording sound magnetically. A caveat was a now-extinct legal instrument: a confidential notice to the Patent Office that an inventor was working on something and wanted a one-year warning if anyone filed a competing claim. Smith let his caveat lapse without ever pursuing a full patent, and then in 1888 simply published the whole idea, gifting it to the public domain.
The idea was this: take a length of cotton or silk thread, embed it with steel dust or tiny iron filings, and pull it past an electromagnet whose coil carried the fluctuating current from a microphone. The varying magnetic field would magnetize the iron particles in a pattern corresponding to the sound waves. Pulling the thread back past a similar coil would induce a matching current — and reproduce the sound. Smith had, in a single eight-page article, described:
Smith's reasoning was unusually rigorous. He noted that Edison's tin-foil phonograph (patented just months before his caveat) suffered from mechanical wear — the stylus physically deformed the medium each time it played. A magnetic record, Smith pointed out, would be non-contact on playback: the field would induce current without the medium ever touching the pickup. This is the exact argument that, a century later, would justify read heads flying micrometers above a spinning platter in the IBM hard disk.
Why did it take so long? Smith had no amplifier. The signal induced by his magnetized thread was so faint that, without vacuum-tube amplification (Lee de Forest, 1906), playback was barely audible. Valdemar Poulsen demonstrated a working wire recorder at the 1900 Paris Exposition, where Emperor Franz Josef recorded his voice on it, but commercial magnetic recording had to wait for AEG's Magnetophon in 1935, which used Fritz Pfleumer's iron-oxide-coated paper tape. That tape, in turn, became the direct ancestor of the iron-oxide coating on the first IBM RAMAC hard drive in 1956, and of every spinning-disk drive shipped since.
Smith's 1888 article is now considered the founding document of magnetic recording. He could have owned the field. Instead, he gave it away — explicitly writing that he hoped "someone with more time" would develop the idea. Someone did. Everyone, eventually, did.
Daily GitHub Zero Stars
2026-05-18
Language: JavaScript
Link: https://github.com/OriginSecurityX/jsonata-hasownproperty-bypass
This is a proof-of-concept repository demonstrating a prototype pollution bypass in JSONata, the popular JSON query and transformation language used in Node-RED, IBM App Connect, and countless serverless data-mapping pipelines. The author shows how JSONata's function binding mechanism can be abused to override hasOwnProperty, defeating one of the most common defensive patterns developers use to guard against prototype pollution.
What makes this PoC interesting is the specific attack surface. Most prototype pollution discussions focus on merge utilities like lodash or jQuery's $.extend. JSONata is a different beast: it's an expression language, and users frequently pass untrusted JSONata expressions through APIs that treat them as "just queries." If your defensive code reads:
if (Object.prototype.hasOwnProperty.call(obj, key)) - you're probably safe
if (obj.hasOwnProperty(key)) - this repo shows how that check can be subverted via function binding
The bypass leverages JSONata's $function and binding semantics, which let an expression construct callables that shadow built-in property checks on traversed objects. For anyone running JSONata in a multi-tenant context (SaaS data transformation, low-code platforms, customer-supplied integration logic), this is exactly the kind of nuanced edge case worth understanding before it lands in a CVE.
Who benefits:
obj.hasOwnProperty is not a safe check on untrusted objects
The repo is tiny, focused, and reproducible: exactly the format a good PoC should take. Even if you don't ship JSONata, the underlying lesson about expression-language sandboxes is broadly applicable.
Calling obj.hasOwnProperty directly is never a trustworthy check against attacker-controlled objects, especially inside expression engines like JSONata.
Daily Hardware Architecture
2026-05-18
Branch predictors get a lot of attention for guessing whether a branch is taken. But there's a second, equally important question: where does it go? Knowing a branch will be taken is useless if you can't start fetching the target instructions immediately. That's the job of the Branch Target Buffer (BTB).
The BTB is a small cache, typically indexed by the branch instruction's PC, that stores the target address of recently executed branches. When the fetch unit pulls an instruction, it doesn't yet know what that instruction is (decode happens later in the pipeline). So in parallel with the fetch, the BTB is queried with the current PC. If there's a hit, the CPU immediately starts fetching from the predicted target — often before it even knows the current instruction is a branch.
This matters enormously. Without a BTB, a correctly predicted taken branch still costs a "fetch bubble" — the cycles between fetching the branch and decoding it to know its target. Modern frontends fetch 32-64 bytes per cycle; even one bubble wastes serious throughput.
Structure: A typical BTB has 4K-8K entries, organized as a set-associative cache. Each entry holds:
Many designs split the BTB into levels — a small, fast L1 BTB (~128 entries, 1-cycle latency) and a larger L2 BTB (4K+ entries, 2-3 cycle latency). Sound familiar? It's caches all the way down.
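A toy model makes the lookup concrete. This sketch is deliberately simplified: it is direct-mapped rather than set-associative, it stores only a tag and a target per entry, and it assumes 4-byte-aligned instructions. None of these choices are claimed to match any real CPU, and all names are invented for illustration.

```rust
/// One BTB entry: a tag to confirm the PC match, plus the predicted target.
#[derive(Clone, Copy)]
struct BtbEntry {
    tag: u64,
    target: u64,
}

/// Direct-mapped toy BTB (real designs are set-associative).
struct Btb {
    entries: Vec<Option<BtbEntry>>,
}

impl Btb {
    fn new(num_entries: usize) -> Self {
        assert!(num_entries.is_power_of_two());
        Btb { entries: vec![None; num_entries] }
    }

    /// Split a PC into (index, tag). Assumes 4-byte-aligned instructions.
    fn index_and_tag(&self, pc: u64) -> (usize, u64) {
        let word = pc >> 2;
        let index = (word as usize) & (self.entries.len() - 1);
        let tag = word >> self.entries.len().trailing_zeros();
        (index, tag)
    }

    /// Queried with the fetch PC, before the instruction is decoded.
    /// Some(target) is a hit; None means fetch has no target to redirect to.
    fn predict(&self, pc: u64) -> Option<u64> {
        let (index, tag) = self.index_and_tag(pc);
        match self.entries[index] {
            Some(entry) if entry.tag == tag => Some(entry.target),
            _ => None,
        }
    }

    /// Called when a taken branch resolves, recording its target.
    fn update(&mut self, pc: u64, target: u64) {
        let (index, tag) = self.index_and_tag(pc);
        self.entries[index] = Some(BtbEntry { tag, target });
    }
}
```

In a 128-entry instance, branches at 0x1000 and 0x1200 map to the same index with different tags, so updating one evicts the other. That conflict is a small-scale version of the capacity pressure large code footprints create.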
Real-world impact: Server workloads like databases and JIT-compiled code have huge instruction footprints — often 100MB+ of hot code. The BTB becomes a major bottleneck: a single MySQL query can touch thousands of branches. Intel's Sapphire Rapids increased the BTB to 12K entries specifically to handle this. Profile a Java JIT under perf stat -e BPU_CLEARS and you'll see frontend stalls dominate.
Rule of thumb: If your hot code footprint exceeds BTB_entries × instructions_per_branch × bytes_per_instruction, expect frontend stalls. For an 8K-entry BTB, ~1 branch per 5 instructions, and an average of about 4 bytes per instruction, that's roughly 160KB of hot code before BTB pressure becomes painful. Beyond that, you're fetching from cold targets the BTB has forgotten.
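The 160KB figure can be reproduced with a one-line estimate, provided one assumes an average instruction size of 4 bytes. That size is an assumption made here to make the arithmetic work out; actual averages vary by ISA and workload. The function name is hypothetical.

```rust
/// Rough hot-code footprint (bytes) that a BTB can cover:
/// one entry per branch, one branch every `instrs_per_branch`
/// instructions, `bytes_per_instr` bytes per instruction on average.
fn btb_covered_bytes(entries: u64, instrs_per_branch: u64, bytes_per_instr: u64) -> u64 {
    entries * instrs_per_branch * bytes_per_instr
}
```

btb_covered_bytes(8192, 5, 4) gives 163,840 bytes, which is 160 KiB. Plugging in 12,288 entries with the same assumptions gives 240 KiB, which shows how modest the gain from a larger BTB is next to multi-megabyte code footprints.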
BTB misses are subtle: the predictor may still say "taken" correctly, but without a target, the pipeline stalls until decode resolves it — often 5-10 cycles wasted on what felt like a perfect prediction.
Hacker News Deep Cuts
2026-05-18
Link: https://github.com/jnuyens/modulejail/
HN Discussion: 1 point, 0 comments
The Linux kernel ships with hundreds of loadable modules covering ancient filesystems, exotic network protocols, obscure hardware drivers, and decades of accumulated functionality. Most production servers will never touch 95% of them — but every one of those modules is a potential attack surface. A vulnerability in an obscure filesystem driver that gets auto-loaded when a crafted USB stick is inserted, or a flaw in a rarely-used network protocol module that gets pulled in by a malicious packet, can hand an attacker kernel-level access.
Modulejail tackles this systematically. Rather than trying to enumerate which modules to allow (a maintenance nightmare on real systems), it apparently focuses on aggressive blacklisting of modules that have no business being loaded on a typical production host. Think:
jfs, reiserfs, befs, cramfs, hfs, udf: all of which have produced CVEs when fuzzed
dccp, sctp, rds, tipc: historically rich sources of LPE bugs
This isn't a novel idea: the CIS benchmarks have recommended blacklisting some of these for years, and grsecurity has long advocated module restriction. What makes a curated tool valuable is that doing this correctly is tedious. You need to know which modules are auto-loadable via crafted input, which are pulled in by udev, which are still actually used by modern systemd or container runtimes, and how to install blacklist entries that modprobe will actually honor (the difference between blacklist and install ... /bin/true matters).
For anyone hardening servers, container hosts, or even desktops in a hostile environment, this is exactly the kind of unglamorous defensive tooling that pays off when the next ksmbd or n_gsm CVE drops. The project is small enough to audit in an afternoon, which is itself a virtue in security tooling.
Worth comparing against kernel-hardening-checker, Lockdown LSM, and the kconfig hardening profiles from KSPP — they overlap but aren't substitutes.
HN Jobs Teardown
2026-05-18
Source: HN Who is Hiring
Posted by: 59243
Expensify's posting (ID 22668400) is the most revealing on the board because it openly contradicts almost every assumption you'd make about a unicorn-scale fintech. They're hiring Full-Stack, PHP, Java, C++, iOS, Android, and Infrastructure engineers across San Francisco, Portland, Michigan, and London — fully remote, visa welcome, $135K+ floor.
The stack tells a story. PHP and C++ alongside Java and mobile native? In 2020, when every fintech pitch deck mentions Go, Rust, or Kotlin (see Hatch's Kotlin/Spring/k8s stack in the same thread), Expensify is unapologetically running on PHP and C++. This isn't laziness — it's conviction. They process "billions of real dollars annually" on this stack with ~130 employees. That's an extraordinary revenue-per-engineer ratio that suggests the boring-tech argument has merit when you commit to it for a decade.
The financial structure is the real headline. Three phrases buried in the post deserve scrutiny:
That last point is the giveaway. Expensify is engineering an exit-less exit — buying out investors to avoid the IPO/acquisition pressure that forces growth-at-all-costs. The "$135K+" floor (notably flat across SF and Michigan) signals they pay for skill, not zip code, years before remote-pay-parity became a debate.
Green flags: Visa sponsorship, transparent salary floor, multi-geography hiring before COVID forced it (Michigan is not a typical fintech hub), and the rare honesty about company structure. The "millions of users and more customers than the rest of the industry combined" claim is checkable and probably true.
Red flags: The PHP/C++ combo will limit your candidate pool to people who either love legacy stacks or don't care about resume optics. "Self-managed" can mean flat hierarchy heaven or political nightmare depending on execution — David Barrett's Expensify has a reputation for both. No mention of the stack's modernization roadmap suggests they're not planning one, which is either disciplined or stagnant depending on your taste.
The trend signal: Profitable, employee-owned, boring-stack fintechs hiring remotely with flat pay bands represent a counter-narrative to the VC-funded, hyper-growth, Kotlin-on-k8s model that dominates this thread. Both can win; they're playing different games.
Daily Low-Level Programming
2026-05-18
Your L1 and L2 caches are private to each core, but the L3 (Last-Level Cache) is shared. On modern Intel server chips, that sharing is not free — the L3 is physically sliced, with one slice sitting next to each core, and a ring bus (or mesh, on newer Xeons) connects them. Your address doesn't live in "the L3" — it lives in one specific slice, determined by a hash of the physical address.
Here's the consequence: when core 0 accesses an address whose slice is co-located with core 15, the request travels across the ring. Each hop costs roughly 1 cycle. On a 20-core ring, the worst case is ~10 hops each way. The same L3 hit can cost 35 cycles for a near slice and 70+ cycles for a far one — a 2x variance for what your profiler calls "an L3 hit."
The hash is deliberately scrambled to prevent any one slice from becoming a hotspot. You cannot easily predict which slice owns a given page, and consecutive cache lines often live in different slices. This is good for bandwidth (parallelism across slices) but bad for latency predictability.
Real-world example: A trading firm pinned their market-data thread to core 0 and their order-submission thread to core 19 on a 20-core Xeon, assuming "isolation = good." Latency was inconsistent. The fix: pin both threads to adjacent cores (0 and 1). The shared cache lines for the order book now resolved through nearby L3 slices, and tail latency dropped by ~40%. Counterintuitively, putting threads closer on the ring beat spreading them out.
Mesh topology (Skylake-SP and later): Intel replaced the ring with a 2D mesh because rings don't scale past ~12 cores — latency grows linearly with core count. The mesh gives O(√n) worst-case hops instead of O(n), but introduces its own surprise: now both dimensions matter, and the "distance" between two cores depends on their (x,y) coordinates on the die.
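The hop counts behind these latency differences can be sketched with two small distance functions. The model is an idealization with several assumptions not stated in the text: a bidirectional ring with one slice per core, a fixed per-hop cost, a request and a response that each travel the full distance, and mesh coordinates supplied by the caller. Real interconnects add queueing and arbitration, so measured latencies can be higher.

```rust
/// Hops between a core and an L3 slice on a bidirectional ring of `n`
/// stops. Assumes both positions are less than `n`.
fn ring_hops(core: u32, slice: u32, n: u32) -> u32 {
    let d = core.abs_diff(slice);
    d.min(n - d)
}

/// Hops between two (x, y) positions on a 2D mesh (Manhattan distance).
fn mesh_hops(a: (u32, u32), b: (u32, u32)) -> u32 {
    a.0.abs_diff(b.0) + a.1.abs_diff(b.1)
}

/// Idealized L3 hit latency: base cost plus a round trip over `hops`.
fn l3_hit_cycles(base_cycles: u32, hops: u32, cycles_per_hop: u32) -> u32 {
    base_cycles + 2 * hops * cycles_per_hop
}
```

On a 20-stop ring, ring_hops(0, 10, 20) is the 10-hop worst case, while ring_hops(0, 1, 20) is a single hop. Feeding those into l3_hit_cycles with a 35-cycle base and 1 cycle per hop gives 55 versus 37 cycles under this idealized model.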
Rule of thumb: On a ring-bus CPU, expect L3 latency to range from base_latency + 1·N to base_latency + 2·N cycles, where N is core count. If your hot data is shared between two threads, co-locate them on adjacent cores — the LLC slice they hit will be roughly equidistant from both. You can probe slice topology with the CBox performance counters (uncore events UNC_CBO_CACHE_LOOKUP) to see which slice serves your workload.
This is also why benchmarks vary across runs even on an isolated machine: the kernel may schedule your thread on a different core, and now the "same" cache hit takes a different number of cycles.
Reddit Small Subs
2026-05-18
Subreddit: r/metalworking
Discussion: View on Reddit (8 points, 9 comments)
This post chronicles a beautifully precise repair: u/Joebobb22 bought a Futura 800 vintage typewriter on eBay, only to discover that two teeth were missing from the escapement starwheel — the small toothed gear that controls carriage advance with every keystroke. Without it, the typewriter won't type. With it damaged, every line of text drifts.
What makes this repair fascinating is that the OP couldn't just source a replacement part. Vintage typewriter spares for a Futura 800 effectively don't exist, and the starwheel is hardened steel cut to tight tolerances. So instead of replacing the wheel, they rebuilt the missing teeth by fabricating new steel onto the original part.
The technique involved:
The educational value here is twofold. First, it's a great demonstration of the "can't buy it, so make it" mindset that defines repair culture for obsolete mechanisms — a skill set increasingly relevant as more 20th-century machinery passes out of the spare-parts economy. Second, it shows how much of fine metalwork is really about reading the original part: the surviving teeth tell you the pitch, the pressure angle, the depth, and the hardness target. The repair isn't designed from scratch; it's reverse-engineered tooth by tooth.
Comments dig into the choice of filler material, whether silver brazing would have been gentler on the surrounding heat-affected zone, and how to test the repaired wheel under the spring tension of an assembled escapement before committing it back to service.
For anyone interested in horology, watchmaking, or precision mechanical repair, this is a small but rich case study in working at the millimeter scale on a part that can't fail.
RFC Deep Dive
2026-05-18
If you have ever joined a video call in your browser without installing a plugin — Google Meet, Discord on web, Jitsi, Whereby, the millions of telehealth visits during the pandemic — RFC 7742 is one of the small documents that made it possible. It is short (twelve pages) and reads like a checklist, but behind it sits one of the nastiest political fights in the history of the IETF.
The problem. WebRTC, the browser-native real-time media stack, only works if any compliant endpoint can talk to any other compliant endpoint. That means every browser must agree on at least one common video codec. In the 2010s, the obvious candidate was H.264 — universally supported in hardware, beautifully tuned, and absolutely covered in patents. Cisco and Apple shipped it. Mozilla and Google preferred royalty-free VP8, which they had been building into the open web on principle. Neither side would budge. WebRTC nearly stalled.
The compromise. RFC 7742 declares, in the driest possible language, that both codecs are mandatory to implement for any "WebRTC Browser." Non-browser WebRTC endpoints that send or receive video are held to the same two-codec requirement; only the looser category of "WebRTC-compatible endpoints" (a SIP gateway, a security camera, a bot) is free to implement whichever codecs it sees fit. Cisco famously published a free, prebuilt H.264 binary (OpenH264) so that open-source browsers like Firefox could ship H.264 support without paying MPEG LA royalties; Cisco paid those royalties on everyone's behalf up to an annual cap. RFC 7742 is the document that bakes that political settlement into the protocol.
What it actually specifies. Beyond the codec mandate, the RFC lays out the boring-but-essential rules:
googRemb and later transport-cc feedback). A WebRTC encoder is not a file encoder; it is a closed-loop control system.
CVO RTP header extension (defined in 3GPP TS 26.114), so a phone rotated mid-call doesn't ship sideways pixels.
Why it still matters. Every browser-to-browser video call you make today negotiates codecs through SDP using exactly the rules in this RFC. When you open DevTools on a Meet call and see VP8 selected on one leg and H.264 on another, that is RFC 7742 in action: a peer offers both, the other side picks, and a transcoding SFU may bridge mismatched legs. AV1 and VP9 are now layered on top via later documents (RFC 9080 and friends), but the "mandatory two" rule has never been retired; backwards compatibility is forever in real-time media.
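The "one peer offers, the other picks" step can be sketched in a few lines. This is a deliberately simplified model, not the real SDP offer/answer machinery (which also carries payload types, format parameters, and feedback settings); the function name and the plain codec-name lists are assumptions for illustration.

```python
def negotiate_video_codec(offered, supported):
    """Return the first codec in the offer that the answerer supports.

    Codec names are compared case-insensitively. Returns None when the
    two sides have no codec in common.
    """
    supported_names = {name.upper() for name in supported}
    for codec in offered:  # the offerer's order expresses its preference
        if codec.upper() in supported_names:
            return codec
    return None
```

With two mandatory codecs, any two browsers always share at least two entries, so this function can never return `None` for a browser-to-browser call. That guarantee is the whole point of the mandate.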
The history bit. The IETF debate over WebRTC's mandatory codec ran for roughly three years and produced more mailing-list traffic than almost any working-group fight of its era. The final vote in the RTCWEB WG was a hum, not a tally, and the chairs explicitly framed the outcome as "neither side won, and we are shipping." Adam Roach, then at Mozilla, wrote the document that codified that exhaustion into a standard.
Stack Overflow Unanswered
2026-05-18
Stack Overflow: View Question
Tags: c, debugging, visual-studio-debugging, avx2, icx
Score: 3 | Views: 112
The asker has AVX2 intrinsics code that compiles and runs correctly under both MSVC and Intel ICX 2025.2 at maximum optimization. The moment they drop to /Ob1 (inline only functions marked __inline), the debugger reports a read access violation on a pointer that is demonstrably valid. Maximum-inlining builds are fine; selective-inlining builds blow up at runtime under the MSVC IDE debugger.
This sits in a particularly nasty corner of the toolchain stack: a third-party compiler (ICX, which is LLVM-based) producing PDB-format debug info consumed by Microsoft's debugger, generating SIMD instructions whose correctness depends on stack alignment guarantees that differ between inlined and out-of-line call sites.
Why this is interesting: AVX2 aligned loads (_mm256_load_si256, VMOVDQA) require 32-byte alignment. The Windows x64 ABI only guarantees 16-byte stack alignment at function entry. When a function is inlined, the compiler sees the full context and can prove (or arrange) 32-byte alignment of locals. When the same function is emitted out-of-line, the prologue must realign the stack itself, and any mismatch between what the caller promised and what the callee assumed produces a fault on the first aligned 256-bit load. The "invalid pointer" message is misleading — the pointer value is fine; its low 5 bits aren't.
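The "low 5 bits" rule is easy to check by hand against a pointer value copied from the debugger. Below is a small sketch in Python; the addresses in the usage note are hypothetical, and the helper names are illustrative.

```python
def is_aligned(address: int, alignment: int = 32) -> bool:
    """True if address is a multiple of alignment (which must be a power of two)."""
    return address & (alignment - 1) == 0

def align_up(address: int, alignment: int = 32) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For example, `is_aligned(0x1010)` is false while `is_aligned(0x1010, 16)` is true: a pointer like that satisfies the 16-byte stack guarantee but would still fault on a 32-byte aligned load.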
Direction toward a solution:
VMOVDQA / VMOVAPS / VPADD* against a memory operand, suspect alignment. Switch to _mm256_loadu_si256 (unaligned) at the suspect site as a diagnostic; if the crash disappears, you've confirmed it.
__alignof the offending object and whether it's a stack local, heap allocation, or a struct member. Heap allocations from malloc are only 16-byte aligned on Windows; use _mm_malloc(size, 32) or _aligned_malloc.
/O2 versus /Ob1 for the non-inlined function. ICX may be emitting and rsp, -32 only in one configuration.
Gotchas: The MSVC debugger's exception text is canned; "access violation" covers both bad pointers and alignment faults (#GP from misaligned VEX.128/256 aligned ops). Don't trust the message. Also, debug-vs-release struct layout can differ if any member uses __declspec(align) alongside compiler-inserted guard bytes.
Daily Software Engineering
2026-05-18
When you split a monolith into microservices, clients suddenly face a problem: which of the 47 services do they call, and in what order? The API Gateway pattern puts a single entry point in front of your service mesh. Clients talk to the gateway; the gateway handles routing, authentication, rate limiting, request aggregation, and protocol translation.
Think of it as a hotel concierge. Guests don't wander into the kitchen, the laundry, and the maintenance office separately — they call the front desk, which routes their request to the right department.
What a gateway typically does:
Real-world example: A mobile app's "order details" screen needs the order, the user's shipping address, the product images, and tracking status. Without a gateway, the app makes four round trips over flaky mobile networks — each with its own auth, retries, and TLS handshake. With a gateway, the app makes one call to GET /orders/123/detail. The gateway parallel-fetches from order-service, user-service, catalog-service, and shipping-service, then returns a merged payload. Latency drops from 4×150ms serialized to ~180ms parallel.
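The aggregation step is a plain concurrent fan-out. Here is a minimal sketch using asyncio, where `call_service` is a hypothetical stand-in that simulates latency with a sleep; a real gateway would make HTTP or gRPC calls and handle per-service timeouts and partial failures.

```python
import asyncio
import time

SERVICES = ["order-service", "user-service", "catalog-service", "shipping-service"]

async def call_service(name: str, order_id: int, delay_s: float) -> dict:
    # Hypothetical backend call: the sleep stands in for network latency.
    await asyncio.sleep(delay_s)
    return {"service": name, "order_id": order_id}

async def order_detail(order_id: int, delay_s: float = 0.15) -> dict:
    # Fire all four requests at once; total time is roughly the slowest call.
    results = await asyncio.gather(
        *(call_service(name, order_id, delay_s) for name in SERVICES)
    )
    return {result["service"]: result for result in results}

def timed_order_detail(order_id: int, delay_s: float = 0.15):
    """Run the fan-out and return (merged payload, elapsed seconds)."""
    start = time.perf_counter()
    payload = asyncio.run(order_detail(order_id, delay_s))
    return payload, time.perf_counter() - start
```

With four simulated 150 ms backends, the elapsed time lands near one backend's latency rather than the sum of all four, which is the effect the example above describes.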
Rule of thumb: If a typical client-facing page requires more than 2 backend calls, you probably need a gateway (or a BFF, which is a specialized gateway). If clients are issuing N+1 calls because "the API is too granular," that's a gateway-shaped problem.
Watch out for:
Common implementations: Kong, AWS API Gateway, Envoy, Traefik, NGINX. For internal-only meshes, Istio or Linkerd often replace the explicit gateway with a sidecar-driven approach.
Tool Nobody Knows
2026-05-18
You benchmark a query. First run: 4.2 seconds. Second run: 80ms. You shrug and call it "caching." But what is cached, how much, and can you control it? The Linux page cache is a massive lever on performance that almost nobody touches directly. vmtouch by Doug Hoyte is the surgical instrument for it.
Mainstream advice tells you to drop caches with echo 3 > /proc/sys/vm/drop_caches (nuclear, requires root, clears everything) or just "run it twice." vmtouch lets you inspect, load, evict, and lock specific files — per-page, no root needed for most operations.
Inspect what's resident:
$ vmtouch /var/lib/postgresql/data/base
Files: 1247
Directories: 38
Resident Pages: 89234/412908 348M/1.5G 21.6%
Elapsed: 0.18 seconds
About 21.6% of your Postgres data lives in RAM right now. Add -v for a per-file map showing which pages are hot (it draws an ASCII bar of resident vs. cold pages, which is genuinely useful for spotting which indexes get touched).
Pre-warm the cache before a benchmark or deploy:
$ vmtouch -t ./big-lookup-table.idx
Files: 1
Resident Pages: 524288/524288 2G/2G 100%
Elapsed: 1.4 seconds
No more "first request is slow" surprises after a restart. Pair it with a systemd unit and your service comes up hot.
Evict specific files to simulate cold-start without nuking the whole machine:
$ vmtouch -e ./test-data.parquet
$ hyperfine './my-query test-data.parquet'
You can now measure honest cold-cache numbers on a shared box without affecting anyone else's working set. This is the killer feature — drop_caches is a sledgehammer, vmtouch -e is tweezers.
Lock files in memory so they're never evicted:
# Pin the entire SQLite DB into RAM as a daemon
$ vmtouch -dl /var/db/hot.sqlite
Files: 1
Resident Pages: 131072/131072 512M/512M 100%
Locked Pages: 131072/131072 512M/512M 100%
Daemonized: PID 8421
Now the kernel can't evict those pages under memory pressure. For latency-sensitive read paths where you can spare the RAM, this beats any application-level cache because it's free — the kernel still serves reads at memory speed via the normal pread/mmap path.
One more trick: directory trees and globbing work. Want to know how much of your Git object store is in RAM after a build?
$ vmtouch .git/objects
Resident Pages: 4831/28104 18M/109M 17.1%
Or load every .so a binary links against (ldd only lists link-time dependencies, not libraries the program later opens with dlopen):
$ ldd ./myapp | awk '{print $3}' | grep -v '^$' | xargs vmtouch -t
Install: apt install vmtouch, brew install vmtouch, or grab the single ~600-line C file from hoytech.com — it's been stable for over a decade. It uses mincore(2) for inspection, posix_fadvise/mmap+mlock for loading and locking, and POSIX_FADV_DONTNEED for eviction. No kernel module, no root, no magic.
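The eviction and pre-warm hints are also reachable from a script without vmtouch. The sketch below uses Python's `os.posix_fadvise` (available on Linux and other Unix systems); it mirrors the advice calls named above but is not vmtouch's actual implementation, and the function names are illustrative. The kernel treats both calls as hints, so treat the result as best-effort.

```python
import os

def evict_file(path: str) -> None:
    """Ask the kernel to drop a file's pages from the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages are not dropped, so flush them first
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # length 0 = whole file
    finally:
        os.close(fd)

def prefetch_file(path: str) -> None:
    """Hint that the whole file will be needed soon, so readahead can start."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
```

Calling `evict_file("test-data.parquet")` before each benchmark iteration approximates the cold-cache loop shown earlier.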
The day you realize page-cache state is something you can see and control, half of your "why is this slow sometimes?" mysteries dissolve.
What If Engineering
2026-05-18
In 1942, Geoffrey Pyke proposed Project Habakkuk: an aircraft carrier made of pykrete — water frozen with 14% wood pulp, which is roughly as strong as concrete but floats. The project died because keeping it cold cost too much. But what if we revived the idea — not for a warship, but for a permanent floating city anchored to the Arctic seabed?
The material. Pure ice has a compressive strength around 5 MPa and is brittle. Add sawdust or cellulose fibers, and pykrete jumps to roughly 7 MPa compressive, 5 MPa tensile: comparable to weak concrete, but at well under half the density (≈930 kg/m³, versus roughly 2,400 kg/m³ for ordinary concrete). It also creeps slowly under sustained load, like glacial ice, and resists shattering: a bullet that pulverizes ice merely embeds in pykrete.
Sizing the slab. Let's design for a 50,000-resident city — call it 5 km² of usable surface. We need freeboard (above-water height) of at least 5 m to handle storm surge and wave action. Archimedes sets the geometry: if the slab has density 930 kg/m³ floating in seawater (1025 kg/m³), the submerged fraction is 930/1025 = 0.907. So 5 m of freeboard implies a total slab thickness of 5 / (1 - 0.907) ≈ 54 m.
Volume: 5,000,000 m² × 54 m = 2.7 × 10⁸ m³. Mass: ≈250 million tonnes. For comparison, the largest concrete gravity oil platform (Troll A) is 1.2 million tonnes. We are building 200 Troll As — out of frozen slurry.
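The buoyancy arithmetic is easy to re-run with different freeboard or density figures. Here is a short sketch using only the numbers quoted above; the function names are illustrative.

```python
RHO_PYKRETE = 930.0    # kg/m^3, density quoted above
RHO_SEAWATER = 1025.0  # kg/m^3

def slab_thickness_m(freeboard_m: float) -> float:
    """Total thickness of a floating slab with the given height above water."""
    submerged_fraction = RHO_PYKRETE / RHO_SEAWATER  # Archimedes' principle
    return freeboard_m / (1.0 - submerged_fraction)

def slab_mass_tonnes(area_m2: float, thickness_m: float) -> float:
    """Mass of the slab in metric tonnes."""
    return area_m2 * thickness_m * RHO_PYKRETE / 1000.0
```

`slab_thickness_m(5.0)` comes out just under 54 m, and `slab_mass_tonnes(5e6, slab_thickness_m(5.0))` is roughly 250 million tonnes, matching the figures above. Note how sensitive the thickness is to density: the denominator is a small difference between two similar numbers.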
The refrigeration problem. Here is where Habakkuk died. Heat flows in from above (sun, air, residents) and from below (seawater near 271 K, about 2 K below the melting point of freshwater ice but still warmer than a slab held well below freezing). For a 5 km² top surface in summer Arctic sun, solar gain alone is roughly 5 × 10⁶ m² × 200 W/m² = 1 GW. Bottom surface heat flux into ice from seawater is around 20 W/m² with natural convection, so another 10⁸ W = 100 MW.
Total cooling load: ~1.1 GW thermal. A modern industrial ammonia chiller has a COP near 3 at these temperatures, so we need ~370 MW of electrical input — about the output of a small nuclear reactor. An SMR (small modular reactor) like the BWRX-300 delivers 300 MWe. Two SMRs would keep the city solid.
Insulation matters more than refrigeration. If we cover the top with 2 m of foam insulation (k ≈ 0.03 W/m·K) and a reflective deck, solar gain drops 20×. Suddenly the cooling load is ~150 MW thermal, well within one SMR. The bottom is harder — we can't insulate the water-facing surface, but currents bring fresh cold water and the temperature gradient is small.
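The cooling budget, with and without the insulated deck, can be sketched as a two-term sum. The flux values, the COP of 3, and the 20× reduction are the figures quoted above; the function names are illustrative.

```python
def cooling_load_w(area_m2, solar_w_m2=200.0, bottom_w_m2=20.0, solar_reduction=1.0):
    """Thermal load in watts: solar gain on top plus seawater heat flux below."""
    return area_m2 * (solar_w_m2 / solar_reduction + bottom_w_m2)

def electrical_input_w(thermal_w, cop=3.0):
    """Electrical power a chiller needs to remove thermal_w at the given COP."""
    return thermal_w / cop
```

Bare slab: `cooling_load_w(5e6)` is 1.1 GW thermal, or about 370 MW electrical. Insulated: `cooling_load_w(5e6, solar_reduction=20)` is 150 MW thermal, about 50 MW electrical. Once the top is insulated, the uninsulated bottom accounts for two-thirds of the load.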
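Structural creep. Pykrete under sustained stress flows at roughly 10⁻⁹/s per MPa. A 54-m slab sees ~0.5 MPa at its base, so the strain rate is about 5 × 10⁻¹⁰/s, or roughly 1.5% per year: each metre of slab width spreads laterally by ~1.5 cm/year, which adds up to tens of metres per year across a slab a couple of kilometres wide. Manageable if you trim and re-freeze the edges seasonally; the rim partly self-heals, since seawater splashing onto the cold rim quickly freezes into fresh ice (though not into fibre-reinforced pykrete).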
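A creep rate is a strain rate, so the absolute spread depends on how wide the slab is. Here is a quick sketch of the conversion, using the rate and stress quoted above; the 2,236 m width in the usage note is an assumption (a square slab of 5 km²).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def annual_creep_strain(rate_per_s_per_mpa: float, stress_mpa: float) -> float:
    """Fractional strain accumulated in one year under constant stress."""
    return rate_per_s_per_mpa * stress_mpa * SECONDS_PER_YEAR

def annual_spread_m(strain_per_year: float, width_m: float) -> float:
    """Lateral growth in metres per year across a slab of the given width."""
    return strain_per_year * width_m
```

`annual_creep_strain(1e-9, 0.5)` is about 0.016 (1.6% per year), and `annual_spread_m(0.016, 2236)` is on the order of 35 m per year across the assumed square slab.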
The catch. Climate change. Mean Arctic air temperatures have risen ~3°C since 1980. Each degree warmer roughly doubles your cooling load during the melt season. By 2080, today's design might need four SMRs instead of one.
Wikipedia Rabbit Hole
2026-05-18
Wikipedia: Read the full article
Imagine a passenger train that runs on a spinning top. Not metaphorically: literally. In the West Midlands of England, on a tiny branch line between Stourbridge Junction and Stourbridge Town, commuters board what looks like an ordinary single-carriage railcar. Under the floor, a 500-kilogram steel flywheel spins at up to 2,500 rpm, storing enough kinetic energy to launch the vehicle from a dead stop and haul it up the branch's steep gradient.
Parry People Movers was the brainchild of John Parry, an English engineer who became obsessed with a deceptively simple question in the 1970s: why do we keep burning fuel to accelerate a vehicle, only to throw all that energy away as heat every time we brake? His answer was the flywheel — a technology so old it predates the steam engine, used by potters and blacksmiths for millennia to smooth out rotational work.
Here's the elegant part. The Class 139 railcars that operate the Stourbridge line use a small LPG engine — not to drive the wheels directly, but simply to top up the flywheel. The flywheel is the actual prime mover. When the train brakes, kinetic energy flows back into the flywheel instead of being lost. It's regenerative braking in its purest mechanical form: no batteries, no inverters, no chemistry — just a heavy disc reluctant to slow down.
If this sounds familiar, it should. Formula 1's KERS era (2009 to 2013) produced flywheel systems built on the same trick, with rotors spinning at over 60,000 rpm inside carbon-fibre housings, although most cars on the grid raced battery-based units instead. Williams Hybrid Power adapted its flywheel technology for London buses and Audi's Le Mans-winning R18 e-tron quattro. The Stourbridge train is the unglamorous, working-class cousin of those exotic racing systems, quietly shuttling shoppers since 2009.
The Stourbridge Town branch is only 0.8 miles long — Britain's shortest branch line — but it's where the economics work beautifully:
And yet — despite winning awards and proving itself reliable for over a decade — the technology never spread. Britain's rail network kept buying heavy diesel units and electrifying expensive corridors instead. Parry People Movers went into administration in 2022, though the Stourbridge cars keep running, maintained as a quirky curiosity.
There's a deeper lesson hiding in this little train. Energy storage doesn't have to mean lithium. The fundamental physics of a spinning mass — angular momentum, the squared relationship between velocity and stored energy — has been understood since Newton. NASA uses flywheels on satellites. Data centres use them as no-break UPS systems. The Stourbridge shuttle just happens to be the only place where you can buy a ticket to ride one.
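The "squared relationship" is worth seeing in numbers. The sketch below treats the flywheel as a solid uniform disc, which is an assumption, as is the 0.5 m radius used in the usage note; the mass and speed are the figures quoted earlier.

```python
import math

def flywheel_energy_j(mass_kg: float, radius_m: float, rpm: float) -> float:
    """Kinetic energy (joules) of a solid uniform disc spinning at rpm."""
    inertia = 0.5 * mass_kg * radius_m ** 2  # I = (1/2) m r^2 for a solid disc
    omega = rpm * 2.0 * math.pi / 60.0       # angular speed in rad/s
    return 0.5 * inertia * omega ** 2        # E = (1/2) I w^2
```

`flywheel_energy_j(500, 0.5, 2500)` gives about 2.1 MJ, roughly 0.6 kWh. Halving the speed to 1,250 rpm leaves only a quarter of that, which is why a flywheel delivers most of its usable energy in the top half of its speed range.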
Daily YT Documentary
2026-05-18
Channel: Nerith fern (0 subscribers)
On January 15, 1919, a massive steel tank holding 2.3 million gallons of molasses ruptured in Boston's North End, releasing a 25-foot wave of viscous syrup that surged through the streets at an estimated 35 miles per hour. The Great Molasses Flood killed 21 people and injured 150; it flattened buildings, overturned a railway car, and tore an elevated train trestle from its supports.
What makes this disaster genuinely educational is the engineering failure behind it. The tank, owned by the Purity Distilling Company, had been hastily constructed in 1915 to meet wartime demand for industrial alcohol (used in munitions). It was built with steel plates that were too thin, never properly tested with water as required, and leaked so badly the company painted it brown to disguise the seeping molasses. The subsequent court case — one of the first major class-action lawsuits in Massachusetts — established important precedents around corporate liability and engineering accountability that still inform construction standards today.
The documentary covers the metallurgy of the failure, the role of temperature changes in the rupture, and the three-year legal battle that followed. Note: this is a 0-subscriber channel, so production quality may be modest, but the subject matter is one of history's strangest and most consequential industrial disasters.
Daily YT Electronics
2026-05-18
Channel: syyntra (32 subscribers)
Most of this week's batch is shorts, hashtag spam, or generic "how to use a scope" clips with thin descriptions. This one stands out because it sits at the intersection of analog electronics and creative signal generation — using an oscilloscope in XY mode as a vector display rather than a time-domain measurement tool.
The creator combines TouchDesigner (a node-based visual programming environment), TDAbleton (the bridge to Ableton Live), and Osci-render — a tool that converts 3D geometry and audio into stereo signals where the left channel drives the X axis and the right drives the Y. Feed those signals into a scope's XY mode and the electron beam traces out shapes in real time. Motion data from a controller modulates the audio, so the visuals react to physical input.
It's a worthwhile watch because it makes the Lissajous principle tangible: you're literally hearing the same waveform you're seeing, which is the cleanest possible demonstration of how two correlated signals produce 2D geometry. For anyone who's only used a scope to debug PWM or check ripple, seeing it driven as a CRT-era graphics device is a useful mental expansion. The channel is tiny (32 subs), so this is genuinely under-discovered work.
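The Lissajous principle fits in a few lines: feed one sine wave to X and another to Y, and the frequency ratio and phase offset decide the shape. This is a minimal sketch of that idea only, not how Osci-render generates its signals; the function name and sample count are illustrative.

```python
import math

def xy_samples(freq_x: float, freq_y: float, phase: float, n: int = 1000):
    """Return n (x, y) points: x is the 'left channel', y the 'right channel'."""
    points = []
    for i in range(n):
        t = i / n  # one second of signal, normalised to [0, 1)
        x = math.sin(2.0 * math.pi * freq_x * t + phase)
        y = math.sin(2.0 * math.pi * freq_y * t)
        points.append((x, y))
    return points
```

Equal frequencies with a 90-degree phase offset (`xy_samples(1, 1, math.pi / 2)`) trace a circle. Zero phase collapses it to a diagonal line, and a 2:1 frequency ratio gives a figure-eight.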
Daily YT Engineering
2026-05-18
Channel: Simple Engineer (1170 subscribers)
Steering looks simple from the driver's seat — turn the wheel, the car follows — but the linkage between your hands and the front tires hides a surprisingly clever piece of geometry. This video walks through the full mechanical chain: steering wheel, steering column, the gear mechanism (typically rack-and-pinion or recirculating-ball), tie rods, and steering arms that ultimately pivot the wheels around their kingpins.
What makes the topic worth a focused explainer is the Ackermann steering geometry problem: when a car turns, the inner and outer wheels trace circles of different radii, so they must steer by different angles to avoid scrubbing. A well-made animation makes this immediately visible in a way that text cannot. Explainers like this typically also cover the mechanical advantage built into the gearbox (why a small input force at the wheel translates to enough torque to overcome the tires' contact-patch resistance) and how that ratio trades off against steering responsiveness.
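The Ackermann condition can be put into numbers with basic trigonometry. The sketch below uses the idealised low-speed model, in which both front wheels point along tangents to circles sharing one centre on the rear-axle line. The vehicle dimensions in the usage note are hypothetical.

```python
import math

def ackermann_angles_deg(wheelbase_m: float, track_m: float, turn_radius_m: float):
    """Return (inner, outer) front-wheel steer angles in degrees.

    turn_radius_m is measured from the turn centre to the middle of the
    rear axle and must exceed half the track width.
    """
    inner = math.atan(wheelbase_m / (turn_radius_m - track_m / 2.0))
    outer = math.atan(wheelbase_m / (turn_radius_m + track_m / 2.0))
    return math.degrees(inner), math.degrees(outer)
```

For a 2.7 m wheelbase, 1.6 m track, and 10 m turn radius, the inner wheel needs about 16.4 degrees and the outer about 14.0. The gap between the two angles shrinks as the turn radius grows.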
The channel is small (1.17k subs) and the title is plainly descriptive rather than clickbait. For anyone learning automotive mechanics, working on a vehicle dynamics project, or just curious why a car doesn't skid sideways through a corner, this is a solid foundational explainer that connects everyday driving to real mechanical engineering principles.
Daily YT Maker
2026-05-18
Channel: Applied Gray Matter (10 subscribers)
This is honestly a slim field — most candidates here are hashtag-spam shorts in the "prefab home" / "aluminum TV unit" genre with no actual instruction. The Applied Gray Matter shop tour is the only entry that promises real technical substance, even if the title reads like a business card.
UL 508A is the Underwriters Laboratories standard for industrial control panels — the certification that lets a panel shop legally stamp and ship enclosures destined for factory floors in North America. It governs everything from wire bend radii and SCCR (short-circuit current rating) calculations to component spacing, labeling, and the documentation trail. Most electricians never see inside a certified shop, so a walkthrough from a working 508A fabricator is genuinely uncommon content.
Expect to see how a small certified shop is laid out: the panel build benches, wire-prep stations, the labeling and QC area, and the paper trail that backs every panel that leaves the door. Even at a surface level, it gives a sense of why certified panels cost what they do compared to a hand-wired one-off — the overhead is process, traceability, and component pedigree, not the copper.
Caveat: with only 10 subscribers and a phone number in the title, this is closer to a business promo than a tutorial. Treat it as a field-trip glimpse into a trade most makers never encounter, not a deep technical lesson.
Daily YT Welding
2026-05-17
Channel: Buzz Box Fab (7660 subscribers)
A welding table is the single most important piece of equipment in any fabrication shop — it's your reference plane, your fixture base, and your grounding surface all in one. Buying a commercial fixture table can run several thousand pounds, which puts a real one out of reach for most hobbyists and small shops. This build tackles that problem head-on with a complete step-by-step process for fabricating a heavy-duty table for under £500, with free plans included.
What makes this worth watching over the dozens of other "welding table build" videos is the level of depth. Buzz Box Fab walks through material selection (gauge of the top plate matters enormously for flatness under heat), the cutting and squaring sequence for the frame, and the welding order needed to minimize distortion — a topic that catches out almost every first-time builder when their nice flat top warps into a potato chip after they weld the legs on.
The "under £500" constraint also forces some interesting engineering compromises that are educational in themselves: where to spend money (the top), where to save it (the frame steel), and which features (leveling feet, tool trays, casters) actually earn their keep. Free plans mean you can follow along and build your own rather than just watch.
