25 newsletters today.
Abandoned Futures
2026-06-02
In 1958, a small California company run by an engineer named Edmond R. Doak Jr. built an aircraft that did something the V-22 Osprey wouldn't reliably do for another four decades: it took off vertically, transitioned to horizontal flight without drama, and landed again. The Doak Model 16, designated VZ-4DA by the U.S. Army, used a configuration that has since been quietly rediscovered by half the eVTOL industry: tilting ducted fans on the wingtips.
The airframe was almost absurdly simple: a conventional-looking high-wing monoplane with a single Lycoming YT53-L-1 turboshaft in the fuselage producing 840 shaft horsepower, with shafting running out to two 4-foot-diameter ducted propellers mounted on the wingtips. The ducts pivoted 90° from vertical (hover) to horizontal (cruise). A small reaction-control jet at the tail handled pitch authority in hover. Empty weight: 2,300 lbs. Max speed in cruise: 230 mph. Hover ceiling: roughly 6,000 feet.
Doak first flew it on February 25, 1958. It hovered. It transitioned. Test pilot George Edenborough made the first full conversion to horizontal flight by May 5, 1958. The Army accepted delivery in 1959 and flew it at Edwards AFB; NASA later flew it at Langley. Over roughly 50 flight hours, it never had a serious accident. Pilots reported the transition was smoother than the helicopters they'd flown.
So why did it die?
Why it's viable now: Every reason it failed has evaporated.
Fly-by-wire handles synchronization in software. Joby, Archer, and Beta Technologies are all flying variations of this concept right now. The most damning evidence that Doak was right is that Bell's V-280 Valor, the $1.3 billion winner of the Army's FLRAA program in 2022, uses tilting nacelles on wingtip pylons. It is, conceptually, a 72,000-lb great-grandchild of a 2,300-lb prototype that flew before the Beatles formed.
The VZ-4 sits today at the U.S. Army Transportation Museum at Fort Eustis, Virginia. Visitors walk past it without a second look.
ArXiv Paper Digest
2026-06-02
Authors: Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang
ArXiv: 2606.01839v1
When you ask an AI agent to do something (book a flight, debug code, research a topic), it doesn't answer in one shot. It thinks, calls a tool, reads the result, thinks again, calls another tool, and so on. Each of those "turns" is a separate trip through the language model, and the server has to schedule them alongside thousands of other users' turns.
Modern LLM servers split each turn into two phases that have very different performance profiles. Prefill is reading the prompt: compute-heavy and bursty. Decode is generating tokens one at a time: memory-bandwidth-heavy and long-running. A trick called disaggregation puts these phases on different GPUs so they don't step on each other. The catch: deciding whether to disaggregate a given turn requires knowing how long the decode will be, how much memory its context will eat, and whether the agent is about to call a tool. None of that is knowable when the turn arrives.
So today's systems guess. They train predictors on past traffic to estimate these quantities. When the predictor is wrong (and with agentic workloads, it often is), the scheduler makes bad placement decisions and throughput suffers.
This paper proposes a refreshingly simple alternative: stop predicting, start observing. Instead of treating each turn as an isolated scheduling decision, treat the whole multi-turn conversation as the unit. By the time turn 3 arrives, you've already observed how turns 1 and 2 behaved for this specific conversation: their decode lengths, their tool-call patterns, their memory footprint. That observed history is a far better signal than any general-purpose predictor.
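A minimal sketch of the observe-don't-predict idea, assuming a scheduler that learns only from this conversation's completed turns and falls back to a global prior for turn 1. This is not the paper's algorithm: the class, the placement rule, and both thresholds are hypothetical, chosen just to show where per-conversation history replaces a trained predictor.

```javascript
// Hypothetical sketch (not the paper's code): per-conversation history as the
// scheduling signal, with a global prior used only before any turn is observed.
class ConversationHistory {
  constructor(priorDecodeTokens) {
    this.priorDecodeTokens = priorDecodeTokens;
    this.byConversation = new Map(); // id -> { turns, decodeTokens, toolCalls }
  }

  // Call when a turn finishes, with what was actually observed.
  record(conversationId, decodeTokens, calledTool) {
    const h = this.byConversation.get(conversationId) ?? { turns: 0, decodeTokens: 0, toolCalls: 0 };
    h.turns += 1;
    h.decodeTokens += decodeTokens;
    if (calledTool) h.toolCalls += 1;
    this.byConversation.set(conversationId, h);
  }

  // Mean observed decode length for this conversation, or the prior for turn 1.
  expectedDecode(conversationId) {
    const h = this.byConversation.get(conversationId);
    return h ? h.decodeTokens / h.turns : this.priorDecodeTokens;
  }

  // Fraction of past turns that ended in a tool call (0 if nothing observed yet).
  toolCallRate(conversationId) {
    const h = this.byConversation.get(conversationId);
    return h ? h.toolCalls / h.turns : 0;
  }
}

// Illustrative placement rule with made-up thresholds: disaggregate only when
// this conversation's turns tend to decode for a long time and rarely stop
// early for a tool call.
function shouldDisaggregate(history, conversationId, minDecodeTokens, maxToolRate) {
  return (
    history.expectedDecode(conversationId) >= minDecodeTokens &&
    history.toolCallRate(conversationId) <= maxToolRate
  );
}

const history = new ConversationHistory(300);
history.record("chat-42", 40, true); // turn 1: short decode, then a tool call
history.record("chat-42", 60, true); // turn 2: same pattern
console.log(history.expectedDecode("chat-42")); // 50
console.log(shouldDisaggregate(history, "chat-42", 200, 0.5)); // false
console.log(shouldDisaggregate(history, "new-chat", 200, 0.5)); // true (prior only)
```

The design point is the fallback order: the prior is consulted only until the conversation has produced its own evidence.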
The key insights:
It's a "stop trying to be clever, just look at what's happening" result, the kind of paper that makes you wonder why everyone was predicting in the first place.
Daily Automotive Engines
2026-06-02
A conventional single-scroll turbo dumps every cylinder's exhaust pulse into one common volute feeding the turbine wheel. That sounds fine until you remember that exhaust events are pulses, not steady flow, and adjacent cylinders in the firing order can stomp on each other. On an inline-4 firing 1-3-4-2, cylinder 1's blowdown pulse arrives while cylinder 2 (the cylinder that fired just before it) is still in valve overlap with its exhaust valve cracked open: pressure backs up the runner, kills scavenging, and dilutes the next intake charge with residual gas. The turbo eventually spools, but you've paid for it in pumping losses and lazy throttle response.
Twin-scroll architecture splits the turbine housing into two parallel volutes, each fed by a separate exhaust manifold runner that pairs cylinders with non-adjacent firing events. On a typical inline-4 with firing order 1-3-4-2, you pair cylinders 1+4 in one scroll and 2+3 in the other. Now each scroll sees a pulse every 360° of crank rotation instead of overlapping pulses every 180°, and the cylinder currently exhausting never shares a runner with a cylinder whose valve is about to open.
The payoff:
Real-world example: The Subaru WRX swapped from a single-scroll IHI VF52 (2008-2014) to a twin-scroll IHI on the FA20DIT (2015+). Peak torque arrived at 2000 RPM instead of 4000, and turbo lag at part-throttle dropped noticeably even though peak boost was similar.
Rule of thumb for pairing: On any 4-stroke engine, group cylinders so that the exhaust events sharing a scroll are as far apart in crank angle as possible. For an I4, that means 360° apart: pair the two outer cylinders together and the two inner cylinders together (firing order 1-3-4-2, so the scrolls are 1+4 and 2+3). For an inline-6 with firing order 1-5-3-6-2-4, you split into 1-2-3 and 4-5-6, which spaces the three events in each scroll 240° apart.
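The pairing rule can be checked mechanically. A sketch, assuming an even-firing 4-stroke engine whose exhaust events follow the firing order at equal 720°/N intervals; scrollSpacing is a hypothetical helper that reports the smallest crank-angle gap between exhaust events sharing each scroll.

```javascript
// Hypothetical helper: smallest crank-angle gap (degrees) between exhaust events
// that share each scroll. Assumes an even-fire 4-stroke engine, so events are
// 720 / cylinderCount degrees apart in firing order.
function scrollSpacing(firingOrder, scrolls) {
  const step = 720 / firingOrder.length;
  // Crank angle at which each cylinder fires; the first in the order fires at 0.
  const angle = new Map(firingOrder.map((cyl, i) => [cyl, i * step]));
  return scrolls.map((group) => {
    const a = group.map((cyl) => angle.get(cyl)).sort((x, y) => x - y);
    // Start with the wrap-around gap from the last event back to the first.
    let minGap = 720 - a[a.length - 1] + a[0];
    for (let i = 1; i < a.length; i++) minGap = Math.min(minGap, a[i] - a[i - 1]);
    return minGap;
  });
}

console.log(scrollSpacing([1, 3, 4, 2], [[1, 4], [2, 3]])); // [ 360, 360 ]
console.log(scrollSpacing([1, 5, 3, 6, 2, 4], [[1, 2, 3], [4, 5, 6]])); // [ 240, 240 ]
console.log(scrollSpacing([1, 3, 4, 2], [[1, 3], [4, 2]])); // [ 180, 180 ]
```

The last call shows why pairing cylinders that are adjacent in the firing order defeats the purpose: their pulses are only 180° apart, no better than a single-scroll I4.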
The cost is complexity: a divided manifold (often integrated into the head on modern designs), a divided turbine housing with a twin-port wastegate, and tighter casting tolerances to keep the scrolls sealed from each other. Crack the divider wall and you've effectively built an expensive single-scroll.
Daily Debugging Puzzle
RegExp.test() with /g Flag: The Stateful Matcher That Skips Every Other String
2026-06-02
This function categorizes strings into those containing a digit and those without. It uses a module-level regex (because allocating one per call would be wasteful) and walks the input once.
const HAS_DIGIT = /\d/g;

function categorize(strings) {
  const withDigits = [];
  const without = [];
  for (const s of strings) {
    if (HAS_DIGIT.test(s)) {
      withDigits.push(s);
    } else {
      without.push(s);
    }
  }
  return { withDigits, without };
}

const { withDigits, without } = categorize(
  ["abc1", "def2", "ghi3", "jkl4", "mno"]
);
console.log(withDigits); // expected: ["abc1","def2","ghi3","jkl4"]
console.log(without);    // expected: ["mno"]
Unit tests with a single string pass. Tests with a freshly constructed regex inside the loop pass. But in production, roughly half the digit-bearing strings get misfiled into without. Worse, the exact set that goes missing depends on the order of the input.
The /g flag turns HAS_DIGIT into a stateful object. Each successful test() advances the regex's internal lastIndex property to the character after the match. The next call starts scanning from lastIndex, even if you pass a completely different string.
Walk through the input:
"abc1": lastIndex=0, finds "1" at index 3 → true, lastIndex=4.
"def2": lastIndex=4, but the string is only 4 chars long → false, lastIndex resets to 0.
"ghi3": lastIndex=0 → true, lastIndex=4.
"jkl4": lastIndex=4 → false, reset.
"mno": no match → false.
The function reports withDigits = ["abc1","ghi3"] and without = ["def2","jkl4","mno"]. The same regex object is being asked "where's the next match in this conceptually continuing stream?", except the stream changes underneath it. RegExp.prototype.exec() on a /g regex, and any regex carrying the /y (sticky) flag, share this footgun, but test() with /g is by far the most common source.
The /g flag looks innocuous on test(): there's no "global" semantics in a boolean answer. Many developers add it reflexively, or copy it from a regex that was previously used with String.replace. Linters typically don't flag it. And the bug only manifests when the regex is reused; single-call usage and freshly created regexes both work fine, which is exactly what tests tend to exercise.
Drop the /g flag. For a yes/no membership question, you don't need it:
const HAS_DIGIT = /\d/; // no /g

function categorize(strings) {
  const withDigits = [];
  const without = [];
  for (const s of strings) {
    (HAS_DIGIT.test(s) ? withDigits : without).push(s);
  }
  return { withDigits, without };
}
If you genuinely need /g (e.g., you're iterating matches with exec), either reset HAS_DIGIT.lastIndex = 0 before each new string, or use String.prototype.matchAll, which returns a fresh iterator and doesn't mutate the regex.
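A self-contained reproduction of the bug alongside both fixes just described. The helper name partition and the two constant names are illustrative, not part of the original puzzle.

```javascript
// Reproduces the misfiling with a shared /g regex, then applies two fixes:
// dropping /g, and resetting lastIndex before each new string.
const GLOBAL_DIGIT = /\d/g;
const PLAIN_DIGIT = /\d/;

function partition(strings, hasDigit) {
  const withDigits = [];
  const without = [];
  for (const s of strings) {
    (hasDigit(s) ? withDigits : without).push(s);
  }
  return { withDigits, without };
}

const input = ["abc1", "def2", "ghi3", "jkl4", "mno"];

// Buggy: lastIndex carries over from one string to the next.
const buggy = partition(input, (s) => GLOBAL_DIGIT.test(s));

// Fix 1: no /g flag, so test() always scans from index 0.
const fixed = partition(input, (s) => PLAIN_DIGIT.test(s));

// Fix 2: keep /g but rewind the cursor before testing a new string.
const reset = partition(input, (s) => {
  GLOBAL_DIGIT.lastIndex = 0;
  return GLOBAL_DIGIT.test(s);
});

console.log(buggy.withDigits); // [ 'abc1', 'ghi3' ]
console.log(fixed.withDigits); // [ 'abc1', 'def2', 'ghi3', 'jkl4' ]
console.log(reset.withDigits); // [ 'abc1', 'def2', 'ghi3', 'jkl4' ]
```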
A regex with /g (or /y) is stateful: reusing it across test() or exec() calls makes results depend on call history, so omit the flag when you only need a boolean.
Daily Digital Circuits
2026-06-02
Single Data Rate (SDR) memory captures one bit per clock edge. DDR (Double Data Rate) captures on both the rising and falling edge, doubling bandwidth without doubling the clock frequency. It sounds simple until you realize the receiver has to find a clean spot in the middle of a 1.25 ns data eye while the data and clock are racing each other across a PCB.
The trick is the DQS (Data Strobe) signal. DDR doesn't send a free-running clock with the data; it sends a bidirectional strobe that toggles only when data is valid. DQS is generated by whichever side is driving (controller on writes, DRAM on reads), so it tracks the data's exact flight time. This is source-synchronous done right: the strobe and data share the same wire delays.
But there's a catch. The DRAM sends DQS edge-aligned with the data (transitions happen together). The controller needs the strobe center-aligned with the data (strobe edges in the middle of the data eye). So on reads, the controller delays DQS by a quarter cycle using a DLL (Delay-Locked Loop) before using it to clock the input flip-flops. Writes are the reverse: the controller pre-centers DQS before sending it.
Real-world example: DDR4-3200 runs its clock at 1600 MHz and transfers data at 3200 MT/s. That's a 625 ps clock period but only a 312.5 ps unit interval, roughly the time a signal takes to travel 5 cm of FR4 trace. The data valid window after accounting for setup/hold, ISI, and jitter might be only 200 ps wide. Your DQS delay has to hit a 100 ps target in the middle of that window, and stay there across temperature drift from 0°C to 95°C. That's why DDR PHYs run write leveling and read training at boot, sweeping the DQS delay across the eye and picking the center.
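The timing numbers follow from two transfers per clock. A small calculator sketch (ddrTiming is a hypothetical name; it models only the clock arithmetic, not setup/hold, ISI, or jitter):

```javascript
// DDR clock arithmetic: two transfers per clock, so one unit interval (UI)
// is half a clock period. Times are in picoseconds.
function ddrTiming(megaTransfersPerSec) {
  const clockMHz = megaTransfersPerSec / 2;
  const clockPeriodPs = 1e6 / clockMHz; // 1 / f
  const unitIntervalPs = 1e6 / megaTransfersPerSec; // time per bit on a DQ pin
  // Delaying an edge-aligned DQS by a quarter clock (= half a UI) puts its
  // edges in the middle of the data eye.
  const quarterCyclePs = clockPeriodPs / 4;
  return { clockMHz, clockPeriodPs, unitIntervalPs, quarterCyclePs };
}

const t = ddrTiming(3200);
console.log(t.clockPeriodPs); // 625
console.log(t.unitIntervalPs); // 312.5
console.log(t.quarterCyclePs); // 156.25
console.log(ddrTiming(1600).unitIntervalPs / ddrTiming(6400).unitIntervalPs); // 4
```

The last line is the raw unit-interval ratio between DDR3-1600 and DDR5-6400; the usable eye shrinks further than that once ISI and crosstalk are counted.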
Rule of thumb: For every doubling of data rate, the timing budget halves but the eye closes by more than half because ISI and crosstalk grow super-linearly. DDR5-6400's unit interval is exactly 1/4 of DDR3-1600's, and its eye at the pin is narrower still, which is why DDR5 added decision feedback equalization (DFE) at each DRAM's receiver: the eye is literally too closed to read reliably without active equalization. (DDR5 also added on-die ECC, but that protects data stored in the DRAM array rather than the link.)
One subtle gotcha: DQS is bidirectional and tri-stated between bursts. The receiver has to detect the preamble (a known low period before the first valid edge) to avoid latching noise from the floating bus. DDR4 added a programmable 2-clock preamble specifically because 1-clock wasn't long enough at high speeds for the receiver to gate its strobe input cleanly.
Daily Electrical Circuits
2026-06-02
A standard op-amp integrator (resistor in, capacitor in feedback) integrates voltage. But many sensors (photodiodes in coulomb-counting mode, electrometers, piezoelectric force sensors, ionization chambers) produce charge or current that you want to integrate directly without converting through a resistor first. The charge integrator (also called a charge-sensitive amplifier, or CSA) is the right tool, and it's the front end of nearly every particle detector and CT scanner on Earth.
The topology is deceptively simple: an inverting op-amp stage with a capacitor C_f as the feedback element, and the sensor tied directly to the inverting input. The output voltage is:
V_out = −Q_in / C_f
Every coulomb of charge injected at the summing junction shows up as a voltage step of 1/C_f volts. With C_f = 1 pF, a single femtocoulomb (about 6,240 electrons) gives you 1 mV, which is easily measurable. This is why CSAs dominate radiation detection: a silicon detector needs ~3.6 eV to create each electron-hole pair, so a 60 keV gamma that deposits all its energy yields ~17,000 electrons (~2.7 fC), which becomes a clean 2.7 mV pulse with the same 1 pF feedback capacitor.
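The arithmetic in that paragraph can be checked in a few lines. A sketch assuming an ideal amplifier (V_out = −Q_in / C_f exactly) and full energy deposition; 3.6 eV per pair is the text's figure, the electron charge is the exact SI value, and the function names are hypothetical.

```javascript
const ELECTRON_CHARGE_C = 1.602176634e-19; // exact SI value, coulombs
const SI_PAIR_ENERGY_EV = 3.6; // approx. energy per electron-hole pair in silicon

// Ideal CSA output step: V_out = -Q_in / C_f
function csaStepVolts(chargeCoulombs, feedbackFarads) {
  return -chargeCoulombs / feedbackFarads;
}

// Charge produced when a photon deposits all of `energyEv` in silicon.
function depositedCharge(energyEv) {
  const pairs = energyEv / SI_PAIR_ENERGY_EV;
  return pairs * ELECTRON_CHARGE_C;
}

const q = depositedCharge(60e3); // 60 keV gamma
console.log((q / ELECTRON_CHARGE_C).toFixed(0)); // 16667 electrons
console.log((csaStepVolts(q, 1e-12) * 1e3).toFixed(2)); // -2.67 (mV, with C_f = 1 pF)
```

The sign is negative because the stage inverts; real front ends often follow it with a second inverting stage or simply treat the pulse polarity as a convention.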
The reset problem. A pure integrator has infinite DC gain. Op-amp bias current (even 1 pA on a JFET part) charges C_f until the output saturates. Three fixes:
Noise rules of thumb. The Equivalent Noise Charge (ENC) of a CSA scales as ENC² ≈ a·C_det²/τ + b·τ, where C_det is detector capacitance (strictly, the total capacitance at the input node), τ is the shaping time, a is set by the amplifier's series voltage noise, and b by parallel (leakage current) noise. Two consequences: (1) minimize C_det: keep traces short and use a low-capacitance detector. (2) There's an optimum shaping time τ; too fast and series noise dominates, too slow and parallel (leakage) noise wins. For a typical silicon detector at room temperature, τ ≈ 1-3 µs is the sweet spot.
Op-amp choice: JFET or CMOS input is mandatory, since bipolar bias currents (nA range) overwhelm the signal. Look for parts with input capacitance under 5 pF and e_n under 5 nV/√Hz. The OPA657, ADA4817, or LMP7721 are classic CSA choices.
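Whatever the exact coefficients, the noise model has the shape ENC² = S/τ + P·τ, with a series term S that grows with input capacitance squared and a parallel (leakage) term P. A sketch of where the optimum shaping time falls, using placeholder numbers rather than real detector values (the function names are hypothetical):

```javascript
// ENC^2(tau) = S / tau + P * tau
// S: series (voltage-noise) coefficient, grows with input capacitance squared.
// P: parallel (leakage-current) coefficient.
// Units are whatever you feed in; only the shape of the curve matters here.
function enc2(tau, S, P) {
  return S / tau + P * tau;
}

// d(ENC^2)/dtau = -S / tau^2 + P = 0  =>  tau_opt = sqrt(S / P).
// At tau_opt the two terms are equal and ENC^2_min = 2 * sqrt(S * P).
function optimalShaping(S, P) {
  return { tau: Math.sqrt(S / P), enc2Min: 2 * Math.sqrt(S * P) };
}

console.log(optimalShaping(4, 1).tau); // 2
console.log(optimalShaping(16, 1).tau); // 4 (quadrupling S with P fixed doubles tau_opt)
console.log(enc2(2, 4, 1), enc2(1, 4, 1), enc2(4, 4, 1)); // 4 5 5
```

The last line shows the tradeoff symmetrically: halving or doubling τ away from the optimum costs the same amount of noise in this two-term model.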
Daily Engineering Lesson
2026-06-02
PID control is reactive: it waits for an error to develop, then corrects. Feedforward control is proactive: it measures or predicts a disturbance and pre-compensates before the process variable ever deviates. The two are almost always used together, with feedforward doing the heavy lifting and PID cleaning up the residual.
The core idea: if you know a disturbance is coming and you know how the process responds to it, you can calculate the corrective action in advance. You don't need to wait for the thermometer to drop or the tank level to fall.
A classic example is a steam-heated water heater: cold feedwater enters a tank, steam heats it, and you want the outlet at 80 °C. A pure PID loop on outlet temperature works fine at steady state. But when feedwater flow suddenly doubles, the outlet temperature crashes before the PID even notices, and it takes 30+ seconds to recover.
Add a flow meter on the feedwater. The heat balance is simple:
Result: when flow jumps, steam jumps simultaneously, and the outlet temperature barely moves. PID only trims the small modeling error.
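A minimal sketch of the water-heater loop, assuming the usual steady-state heat balance (heat released by condensing steam equals heat absorbed by the feedwater, losses ignored) and textbook property values near atmospheric pressure. The function names and PI gains are hypothetical.

```javascript
// Assumed property values for water/steam near atmospheric pressure.
const CP_WATER_KJ_PER_KG_K = 4.186; // specific heat of liquid water
const H_FG_KJ_PER_KG = 2257; // latent heat released by condensing steam

// Feedforward: steam flow that supplies exactly the heat needed to raise the
// measured feedwater flow from its inlet temperature to the setpoint.
function feedforwardSteam(feedKgPerS, inletC, setpointC) {
  return (feedKgPerS * CP_WATER_KJ_PER_KG_K * (setpointC - inletC)) / H_FG_KJ_PER_KG;
}

// Feedback: a bare-bones PI trim on outlet temperature error, which absorbs
// whatever the heat-balance model gets wrong (losses, sensor bias, fouling).
function makePiTrim(kp, ki) {
  let integral = 0;
  return function trim(setpointC, outletC, dtS) {
    const error = setpointC - outletC;
    integral += error * dtS;
    return kp * error + ki * integral;
  };
}

// Total steam command: feedforward does the heavy lifting, PI cleans up.
function steamCommand(feedKgPerS, inletC, setpointC, outletC, dtS, trim) {
  const ff = feedforwardSteam(feedKgPerS, inletC, setpointC);
  return Math.max(0, ff + trim(setpointC, outletC, dtS));
}

const trim = makePiTrim(0.002, 0.0001);
// Feedwater doubles: the feedforward term doubles at once, before any error appears.
console.log(feedforwardSteam(2, 15, 80) / feedforwardSteam(1, 15, 80)); // 2
// With the outlet on setpoint, the PI trim contributes nothing.
console.log(steamCommand(2, 15, 80, 80, 1, trim) === feedforwardSteam(2, 15, 80)); // true
```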
Where feedforward shines:
Real-world examples:
Rule of thumb: feedforward handles the predictable 80% of a disturbance; feedback handles the unpredictable 20% (measurement noise, modeling error, unmeasured disturbances). Never use feedforward alone; without feedback, any model mismatch causes permanent offset.
The trap: feedforward requires a model. If your process gain changes (fouling, wear, seasonal effects), the feedforward calculation drifts off. Periodic recalibration matters.
Forgotten Books
2026-06-02
Book: American Photography (January, 1917) by American Photographic Publishing Company, edited by Frank R. Fraprie (1917)
Read it: Internet Archive
Buried in a full-page advertisement in the January 1917 issue of American Photography (a respected monthly edited by the prolific Frank R. Fraprie) is a product pitch that, viewed from 2026, reads like a Victorian prophecy of the digital age. Burke & James of Chicago and New York were touting their new Rexo Record Film, and the headline feature was not its speed or its silver-rich emulsion, but something they called simply:
The Recording Feature: Ample space is provided between each negative for writing thereon full data relating to each picture. This record is made with ordinary black ink, after the film is developed.
Their slogan, "Every Click a Picture," was paired with an even more striking promise: "a new aid to better pictures." The pitch was that a serious amateur could annotate, on the film itself, the date, the location, the exposure, the subject. The metadata traveled with the negative forever.
This is, of course, EXIF. It is the little block of invisible data that every smartphone in your pocket silently stitches onto every JPEG: shutter speed, aperture, GPS coordinates, lens model, the precise second the photon hit the sensor. Modern photographers take it utterly for granted. But in 1917, the problem was already well understood: a shoebox of unlabeled negatives is a shoebox of mysteries. Burke & James solved it with the lowest-tech possible substrate, bottle ink on gelatin, and made the recording part of the film stock itself.
Did it work? Mostly, yes. The interframe gap on roll film had always been a tiny strip of wasted real estate; widening it and inviting the photographer to write on it cost almost nothing. Surviving Rexo negatives in collector archives do show handwritten captions in the margins. But the idea never caught on industrywide, for two reasons that feel obvious in hindsight:
What makes the ad poignant is how clearly the inventors understood the user need a full century before the technology existed to solve it elegantly. They knew that a picture without context is half a picture. They knew amateurs would happily pay for "a higher percentage of clear, sharp pictures," language that could be lifted verbatim into a modern computational-photography press release. They were right about the problem and approximately right about the solution; they just didn't have a camera that could write to its own film.
By 1943, the U.S. War Department's darkroom manuals (TM 11-404 and TM 11-405) catalog every piece of equipment a field photographer might need β trays, siphons, ferrotype plates, tongs β but no record-keeping film. The idea had quietly died.
Forgotten Patent
2026-06-02
On October 25, 1906, an obscure American inventor named Lee de Forest filed a patent for a "Device for Amplifying Feeble Electrical Currents." It was granted as US Patent 841,387 on January 15, 1907. The device looked unremarkable: a glass bulb with three electrodes inside. De Forest called it the Audion. He barely understood how it worked. Yet this three-electrode vacuum tube, the triode, would become the single most important component of electronics for the next 40 years, and the conceptual ancestor of every transistor ever made.
The setup was deceptively simple. De Forest took John Ambrose Fleming's two-electrode diode valve (patented in 1904; it could only rectify current) and added a third electrode: a zigzag wire grid between the heated cathode and the cold anode. A tiny voltage applied to the grid could control a much larger current flowing from cathode to anode. For the first time in history, a weak electrical signal could control a strong one. That is the definition of amplification, and amplification is the precondition for nearly everything electronic.
De Forest himself was a chaotic figure. He thought the gas inside the bulb was essential (it wasn't; it actually hurt performance). He was sued repeatedly by Fleming and by Edwin Armstrong, who actually figured out the regenerative feedback circuit that made the Audion useful. De Forest lost most technical arguments but won the patent fights. By the 1910s, AT&T had bought the rights and was using triodes to build continental telephone repeaters: the 1915 transcontinental phone line from New York to San Francisco was impossible without them. Signals that decayed over copper could finally be boosted anywhere along the line.
What the Audion enabled is staggering in retrospect:
Then in December 1947, at Bell Labs, Bardeen, Brattain, and Shockley built the first point-contact transistor. It did exactly what the Audion did (a small signal controlling a large one via a third terminal), but in a tiny chunk of germanium instead of a glass bulb. The functional lineage is direct: cathode → emitter, anode → collector, grid → base. Every transistor in your phone is a solid-state descendant of de Forest's grid.
The modern relevance is almost comic. There are roughly 13 sextillion transistors manufactured per year as of the mid-2020s: more than every grain of sand on Earth, every year. Each one performs the trick de Forest stumbled into in 1906: let a small voltage gate a larger current. A modern 3nm MOSFET in an Apple M-series chip operates at the same conceptual level as the Audion. The geometry shrank by a factor of 10 million, the speed increased by a factor of 10 billion, and the power dropped by 15 orders of magnitude, but the idea is identical.
Vacuum tubes themselves never fully died. Magnetrons still cook your food in microwave ovens. Klystrons and traveling-wave tubes still drive satellite uplinks and particle accelerators. Audiophiles still pay thousands for triode amplifiers because the distortion characteristics are pleasingly nonlinear. De Forest, who died in 1961 nearly broke despite winning the patent wars, would be astonished to learn his "feeble current amplifier" became the atom of the information age.
Daily GitHub Zero Stars
2026-06-02
Language: Unknown (JavaScript/TypeScript expected)
This repo tackles a problem that sounds simple until you actually try to solve it in the browser: capturing a screen recording and then trimming it to an exact frame range, entirely client-side. The author ships two primitives, a useScreenRecorder() React hook and a <VideoTrimmer /> component, and leans on the modern WebCodecs API to do the heavy lifting without round-tripping video through a server or shelling out to ffmpeg.wasm.
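Frame-exact trimming is harder than it sounds because delta frames can only be decoded on top of earlier frames. A hypothetical trim planner (not the repo's code) that works on plain objects shaped like WebCodecs EncodedVideoChunk metadata, { type: "key" | "delta", timestamp } with timestamps in microseconds. It assumes chunks are sorted and that decode order equals presentation order (no B-frames), which is a simplification.

```javascript
// Which chunks must a decoder see to reproduce frames in [startUs, endUs)?
function planTrim(chunks, startUs, endUs) {
  // First chunk the user wants to keep.
  const keepFrom = chunks.findIndex((c) => c.timestamp >= startUs && c.timestamp < endUs);
  if (keepFrom === -1) return null; // nothing falls inside the range

  // Last chunk the user wants to keep.
  let keepTo = keepFrom;
  while (keepTo + 1 < chunks.length && chunks[keepTo + 1].timestamp < endUs) keepTo++;

  // A delta frame only decodes on top of earlier frames, so decoding must
  // start at the nearest keyframe at or before keepFrom.
  let decodeFrom = keepFrom;
  while (decodeFrom > 0 && chunks[decodeFrom].type !== "key") decodeFrom--;

  return { decodeFrom, keepFrom, keepTo };
}

// 3 seconds at 30 fps with a keyframe every 30 frames.
const FRAME_US = 33333;
const chunks = Array.from({ length: 90 }, (_, i) => ({
  type: i % 30 === 0 ? "key" : "delta",
  timestamp: i * FRAME_US,
}));
console.log(planTrim(chunks, 1000000, 2000000));
// { decodeFrom: 30, keepFrom: 31, keepTo: 60 }
```

In this model, frame 30 still has to go through the decoder even though its output is thrown away; one common approach is then to re-encode from keepFrom so the trimmed clip starts on a fresh keyframe.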
Why this is interesting:
Who benefits: anyone building bug-report tools, async standup recorders, Loom-style products, customer support widgets, or educational platforms where users record their screen and need to clip off the awkward intro before sharing. Teams currently paying for Loom or Vimeo Record SDKs could plausibly replace them with something like this for free.
Caveats worth noting: WebCodecs has uneven browser support (Safari only recently caught up), and the repo is brand new with zero stars, so production-readiness is unknown. Worth a look, worth a star.
Daily Hardware Architecture
2026-06-02
Every cache coherence transaction (invalidate, share, transfer) operates on a cache line, not a byte or a word. Modern x86 and ARM settled on 64 bytes. Why not smaller, to reduce false sharing? Why not larger, to amortize the protocol overhead? The answer reveals a tension between three competing pressures that hardware designers can't escape.
The three forces:
The false sharing tax: Because coherence operates on whole lines, two cores writing different 4-byte counters in the same 64-byte line ping-pong the line back and forth at core-to-core transfer latency (~40 ns) instead of L1-hit latency (~1 ns), a 40× slowdown for code that has zero logical sharing.
Real-world example: The Linux kernel's per-CPU counters use ____cacheline_aligned to force each counter onto its own 64-byte line. Before this was systematized (~2.6 era), profilers showed network stack throughput dropping 30% on 8-core boxes because struct net_device packed RX and TX statistics into adjacent words. The fix was literally inserting padding: no algorithmic change, just spacing.
Why not 128 bytes? Intel's L2 prefetcher actually fetches pairs of 64-byte lines (the "adjacent line prefetcher"), effectively giving you 128-byte spatial locality when it pays off, without the false sharing cost of an actual 128-byte coherence unit. This is why std::hardware_destructive_interference_size on x86 is often 128: the prefetcher pulls in the neighbor, so to truly avoid contention you need 128-byte separation.
Rule of thumb: If two threads write the same struct, place hot fields at least 128 bytes apart on modern x86, 64 bytes on most ARM. Measure with perf c2c: it directly identifies cache-line contention by tracking HITM (hit-modified) events.
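The padding fix and the 128-byte rule reduce to offset arithmetic. A sketch with hypothetical helpers; it assumes 64-byte lines and models the adjacent-line prefetcher as pairing lines within 128-byte-aligned blocks.

```javascript
// Which aligned block of `blockBytes` does a byte offset fall in?
const blockOf = (offset, blockBytes) => Math.floor(offset / blockBytes);

// True if fields [offset, offset + size) touch a common block of `blockBytes`.
function shareBlock(a, b, blockBytes) {
  const aFirst = blockOf(a.offset, blockBytes);
  const aLast = blockOf(a.offset + a.size - 1, blockBytes);
  const bFirst = blockOf(b.offset, blockBytes);
  const bLast = blockOf(b.offset + b.size - 1, blockBytes);
  return aFirst <= bLast && bFirst <= aLast;
}

// Place fields one after another, starting each on a fresh `align`-byte
// boundary (the effect of padding or ____cacheline_aligned).
function alignedLayout(sizes, align) {
  let offset = 0;
  return sizes.map((size) => {
    const field = { offset, size };
    offset += Math.ceil(size / align) * align;
    return field;
  });
}

const packed = [{ offset: 0, size: 8 }, { offset: 8, size: 8 }]; // e.g. rx, tx counters
const [rx64, tx64] = alignedLayout([8, 8], 64);
const [rx128, tx128] = alignedLayout([8, 8], 128);

console.log(shareBlock(packed[0], packed[1], 64)); // true: same 64-byte line
console.log(shareBlock(rx64, tx64, 64)); // false: separate lines
console.log(shareBlock(rx64, tx64, 128)); // true: same adjacent-line pair
console.log(shareBlock(rx128, tx128, 128)); // false
```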
Hacker News Deep Cuts
2026-06-02
Link: https://www.fidonet.org/inet92_Randy_Bush.txt
HN Discussion: 2 points, 0 comments
This is a 1993 paper by Randy Bush (yes, that Randy Bush, the network operator whose name shows up in countless RFCs and BGP routing discussions), presented at INET '92. It's a primary-source account of how Fidonet actually worked: the technology, the social structures, and the politics of running a global store-and-forward messaging network on dial-up modems before the commercial internet existed.
Why does this deserve attention in 2026? A few reasons:
It's a plain-text file on fidonet.org, so it loads instantly and will outlive most of what's on the web. Worth thirty minutes for anyone interested in protocol design, network governance, or the history of online community.
HN Jobs Teardown
2026-06-02
Source: HN Who is Hiring
Posted by: gdeglin
OneSignal's posting (ID 22665848) is the most analytically rich entry in this batch because it quietly hands you the company's entire growth thesis in two numbers: 6 billion daily notifications and 1 million registered developers. Both are framed as multiples of SendGrid and Twilio at IPO. That's not a casual flex; it's a deliberate pre-IPO positioning narrative.
What the stack implies (even though it's not listed): Delivering 6B daily push notifications means roughly 70,000 messages per second sustained, with massive spikes. That workload screams a few things: a fanout-optimized queueing layer (Kafka, NATS, or custom), aggressive sharding by app_id, and tight integrations with APNs/FCM/web push protocols. The fact that they're hiring Full Stack and Backend engineers, not specifically SREs or distributed systems specialists, suggests the hard infra problems are largely solved, and the next bottleneck is product surface area: dashboards, segmentation UIs, A/B testing, journey builders. That's the playbook of a company moving from "pipe" to "platform."
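The 70,000-per-second figure is daily volume divided by 86,400 seconds. A back-of-envelope sketch that also sizes a sharded send tier; the 10× peak factor and 50,000 messages/s per shard are made-up illustrative numbers, not anything from the posting.

```javascript
// Back-of-envelope sizing. peakFactor and perShardRate are illustrative
// assumptions, not figures from the posting.
function sendRates(dailyMessages, peakFactor, perShardRate) {
  const sustained = dailyMessages / 86400; // seconds in a day
  const peak = sustained * peakFactor;
  return { sustained, peak, shardsForPeak: Math.ceil(peak / perShardRate) };
}

const r = sendRates(6e9, 10, 50000);
console.log(Math.round(r.sustained)); // 69444
console.log(r.shardsForPeak); // 14
```

The point of sizing for peak rather than the average is that push traffic is bursty (a breaking-news alert fans out to millions of devices at once), so the sustained rate alone understates the capacity needed.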
Stage signals:
Skills/trends highlighted: The unstated but obvious one is developer-led growth. 1M registered developers means a self-serve, freemium funnel, which means the backend has to be multi-tenant from day one, with hard quota enforcement and abuse prevention. Anyone joining is going to spend real time on rate-limiting, deliverability metrics, and the unglamorous infrastructure of trust (spam prevention, opt-out compliance, GDPR).
Green flags: Concrete metrics instead of vague "fast-growing startup" language; named comparables; specific role focus.
Red flags: No salary band, no equity discussion, no mention of the team size or tech stack; the posting is optimized for the company's narrative, not the candidate's evaluation. The SendGrid/Twilio comparison also subtly elides that both of those companies built far more defensible infrastructure (SMTP reputation, telecom interconnects) than push notification routing, which is fundamentally a thin layer over APNs/FCM that Apple and Google control.
Daily Low-Level Programming
2026-06-02
Before 2010, switching address spaces on x86 was brutal: writing to CR3 flushed every non-global TLB entry. Every context switch meant the new process started cold, paying a TLB-miss tax on every memory access until its working set was re-walked from the page tables. PCID (Process-Context Identifier) fixed this by tagging each TLB entry with a 12-bit ID, so entries from different address spaces can coexist.
The mechanism: when PCID is enabled (CR4.PCIDE=1), the low 12 bits of CR3 become the PCID, not part of the page-table address. A MOV to CR3 now means "switch to address space X with ID Y", and entries tagged with other IDs survive untouched. Bit 63 of the value written to CR3 controls whether entries already tagged with the new PCID get flushed: clear it, and they are invalidated (the old behavior for that ID); set it, and they persist.
The Meltdown wrinkle. When Linux deployed KPTI (Kernel Page-Table Isolation) in 2018, every syscall suddenly required two address-space switches (user→kernel→user). Without PCID, this would have meant two full TLB flushes per syscall. Linux instead uses two PCIDs per process: one for user-mode page tables, one for kernel-mode. The kernel sets bit 63 of CR3 to skip the flush, and the TLB entries for both views survive across the syscall boundary. On a Skylake system with PCID disabled, KPTI cost 30%+ on syscall-heavy workloads; with PCID, it dropped to roughly 5%.
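A sketch of how the value written to CR3 is put together when CR4.PCIDE=1, following the layout just described (PCID in bits 11:0, no-flush hint in bit 63). The helper names and the physical address are made up; this only models the bit layout and cannot touch a real register.

```javascript
// CR3 bit layout with CR4.PCIDE = 1, using BigInt for 64-bit values.
const PCID_MASK = 0xfffn; // bits 11:0
const NOFLUSH_BIT = 1n << 63n; // bit 63: a hint on the MOV, never stored in CR3

function makeCr3(pageTablePhys, pcid, noFlush) {
  if ((pageTablePhys & 0xfffn) !== 0n) throw new Error("page-table base must be 4 KiB aligned");
  if (pcid < 0n || pcid > PCID_MASK) throw new RangeError("PCID must fit in 12 bits");
  return pageTablePhys | pcid | (noFlush ? NOFLUSH_BIT : 0n);
}

function describeCr3Write(cr3) {
  return {
    pcid: cr3 & PCID_MASK,
    // Bit 63 clear: entries tagged with this PCID are invalidated.
    // Bit 63 set: they may be kept. Entries for other PCIDs survive either way.
    flushesThisPcid: (cr3 & NOFLUSH_BIT) === 0n,
  };
}

const cr3 = makeCr3(0x1234000n, 5n, true);
console.log(cr3.toString(16)); // 8000000001234005
console.log(describeCr3Write(cr3).flushesThisPcid); // false
```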
Rule of thumb. A TLB miss costs ~20-100 cycles (a 4-level page walk, possibly with cache misses on each level). A modern process has a working set of hundreds to thousands of TLB entries. Without PCID, a context switch back to a previously-running process pays ~20,000-100,000 cycles repopulating the TLB. With PCID, near zero β as long as the entries weren't evicted in the meantime.
The limit. Only 4096 PCIDs exist, and Linux keeps only about 6 of them active per CPU in an LRU map, recycling aggressively. When a process gets a fresh PCID assignment, you pay the cold-cache cost anyway. You can see this in perf stat -e dtlb_load_misses.miss_causes_a_walk spiking after a long sleep.
Real-world impact. Redis, PostgreSQL, and anything doing rapid syscalls saw measurable slowdowns post-Meltdown on older CPUs lacking PCID support; some shops literally replaced hardware to recover throughput. Check /proc/cpuinfo for the pcid flag; if missing, KPTI is expensive and you should consider nopti if your threat model permits.
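A toy model of the slot recycling described above, paired with the refill estimate from the rule-of-thumb paragraph (entries × cycles per miss). It follows the text's LRU description and ignores details such as the kernel's per-slot TLB generation tracking; the class and function names are hypothetical.

```javascript
// Toy per-CPU PCID slot cache: a hit means TLB entries tagged with the
// process's PCID may still be live; a miss means a new tag and a cold TLB.
class PcidSlots {
  constructor(slotCount) {
    this.slotCount = slotCount;
    this.slots = []; // process ids, least recently used first
    this.coldSwitches = 0;
  }

  switchTo(pid) {
    const i = this.slots.indexOf(pid);
    if (i !== -1) {
      this.slots.splice(i, 1); // hit: move to most-recently-used position
    } else {
      this.coldSwitches += 1; // miss: this process starts with a cold TLB
      if (this.slots.length === this.slotCount) this.slots.shift(); // evict LRU
    }
    this.slots.push(pid);
  }
}

// Rough refill estimate from the text: entries * cycles per miss.
const refillCycles = (entries, cyclesPerMiss) => entries * cyclesPerMiss;

const cpu = new PcidSlots(6);
for (let round = 0; round < 3; round++) {
  for (let pid = 1; pid <= 7; pid++) cpu.switchTo(pid); // 7 runnable processes
}
console.log(cpu.coldSwitches); // 21: one more process than slots, so every switch is cold
console.log(refillCycles(1000, 20), refillCycles(1000, 100)); // 20000 100000
```

With strict round-robin scheduling, one process more than there are slots is enough to make LRU evict exactly the process that runs next.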
RFC Deep Dive
2026-06-02
If you've ever written $.foo.bar[0] in JSONPath, used JSON Patch to modify a document, or seen a cryptic "$ref": "#/components/schemas/User" in an OpenAPI spec, you've brushed up against JSON Pointer. RFC 6901 is a tiny specification (barely seven pages) that defines a string syntax for identifying a specific value within a JSON document. It is one of those plumbing standards that quietly underpins an enormous amount of modern tooling.
The problem. JSON documents are trees. Once you have one, how do you unambiguously point to a node inside it? You might say "the third element of the items array inside order." But protocols need a machine-readable form. XML solved this long ago with XPath, a rich query language. The JSON community wanted something far simpler: no predicates, no functions, no axes, just a fragment identifier that names exactly one location.
The design. A JSON Pointer is a Unicode string of zero or more reference tokens, each prefixed by /. Evaluation starts at the document root and walks down. Given:
"" → the whole document
"/foo" → the value at key foo
"/foo/0" → the first element if foo is an array
"/" → the value at the empty-string key (yes, JSON allows that)
"/a~1b" → the value at key a/b
"/m~0n" → the value at key m~n
That last pair is the spec's one piece of cleverness. Because / is the separator and ~ is the escape character, they have to be encoded inside a token: ~1 for / and ~0 for ~. The order matters when decoding: you must replace ~1 before ~0, or you'd corrupt a key like ~1 (encoded as ~01). Implementers get this wrong constantly.
Array indices are decimal integers with no leading zeros, plus a special token, "-", meaning "the nonexistent element just past the end." That's not useful for retrieval, but it's vital for JSON Patch (RFC 6902), where you might want to append to an array.
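The evaluation rules fit in a few dozen lines. A sketch, not a reference implementation: it follows the escaping order and array-index rules above, but returns undefined wherever RFC 6901 would report an error instead of throwing; the function names are hypothetical.

```javascript
// Split an RFC 6901 pointer into unescaped reference tokens.
function parsePointer(pointer) {
  if (pointer === "") return []; // "" means the whole document
  if (!pointer.startsWith("/")) throw new SyntaxError(`invalid pointer: ${pointer}`);
  return pointer
    .slice(1)
    .split("/")
    .map((token) => {
      if (/~(?![01])/.test(token)) throw new SyntaxError(`bad escape in: ${token}`);
      // Decode ~1 first, then ~0, so "~01" becomes "~1" and not "/".
      return token.replace(/~1/g, "/").replace(/~0/g, "~");
    });
}

// Walk the document; return undefined where the RFC would report an error.
function resolvePointer(doc, pointer) {
  let node = doc;
  for (const token of parsePointer(pointer)) {
    if (Array.isArray(node)) {
      // Indices are "0" or decimals without leading zeros. "-" (past the end)
      // fails this check, so it never resolves on lookup.
      if (!/^(0|[1-9][0-9]*)$/.test(token)) return undefined;
      const index = Number(token);
      if (index >= node.length) return undefined;
      node = node[index];
    } else if (node !== null && typeof node === "object") {
      if (!Object.prototype.hasOwnProperty.call(node, token)) return undefined;
      node = node[token];
    } else {
      return undefined; // cannot descend into a string, number, boolean, or null
    }
  }
  return node;
}

const doc = { foo: ["bar", "baz"], "": 0, "a/b": 1, "m~n": 8 };
console.log(resolvePointer(doc, "/foo/0")); // bar
console.log(resolvePointer(doc, "/a~1b")); // 1
console.log(resolvePointer(doc, "/m~0n")); // 8
console.log(resolvePointer(doc, "/")); // 0
console.log(resolvePointer(doc, "/foo/-")); // undefined
```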
URI fragments. Section 6 defines how to embed a pointer in a URI fragment, e.g. http://example.com/schema.json#/definitions/Address. This is the form JSON Schema and OpenAPI bake into their $ref mechanism. The fragment must be percent-encoded, so a pointer with non-ASCII characters or reserved URI characters gets two layers of escaping. This is another quiet source of bugs.
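The two escaping layers stack in a fixed order: pointer escaping first, percent-encoding second, and the reverse when reading. A sketch using the built-in encodeURIComponent and decodeURIComponent (helper names are hypothetical; encodeURIComponent encodes a few characters that would be legal in a fragment, which is harmless because decoding yields the same pointer).

```javascript
// RFC 6901 escaping for one reference token: "~" first, then "/".
const escapeToken = (t) => t.replace(/~/g, "~0").replace(/\//g, "~1");
const unescapeToken = (t) => t.replace(/~1/g, "/").replace(/~0/g, "~");

// Layer 1: pointer escaping. Layer 2: percent-encoding for the URI fragment.
// encodeURIComponent leaves "~" alone and encodes "%", "^", "|", spaces, etc.
function tokensToFragment(tokens) {
  return "#" + tokens.map((t) => "/" + encodeURIComponent(escapeToken(t))).join("");
}

// Reverse order: percent-decode the whole fragment, then split and unescape.
function fragmentToTokens(fragment) {
  const pointer = decodeURIComponent(fragment.startsWith("#") ? fragment.slice(1) : fragment);
  if (pointer === "") return [];
  if (!pointer.startsWith("/")) throw new SyntaxError(`invalid pointer: ${pointer}`);
  return pointer.slice(1).split("/").map(unescapeToken);
}

console.log(tokensToFragment(["definitions", "Address"])); // #/definitions/Address
console.log(tokensToFragment(["c%d"])); // #/c%25d
console.log(tokensToFragment(["a/b", "m~n"])); // #/a~1b/m~0n
console.log(fragmentToTokens("#/c%25d")); // [ 'c%d' ]
```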
Why it matters today. JSON Pointer is the addressing layer for an entire ecosystem:
$ref to compose and reuse schemas.
$ref mechanism: every modern API spec is glued together by JSON Pointers.
kubectl patch and many CRDT-flavored sync engines.
The interesting omission. JSON Pointer deliberately cannot express "all elements of an array," "every object with property X," or any kind of wildcard. That's what JSONPath (and later, the standardized RFC 9535) is for. RFC 6901 picked unambiguous, single-target identification as its entire job, and by refusing to do more, it became universally implementable in about fifty lines of code in any language. Mark Nottingham, one of the editors, has noted that the spec's brevity was the point: a thing this fundamental needed to be impossible to get wrong at the design level, even if implementers still find ways at the escaping level.
Stack Overflow Unanswered
2026-06-02
The asker has a vmlinux with debug symbols and a .ko module they need to debug during init. The usual recipe (load the module, read its section addresses from /sys/module/<name>/sections/, then add-symbol-file in GDB) doesn't work here. By the time you've gathered that information, module_init() has already run. They need symbols resolved before the module's init code executes.
This is a classic chicken-and-egg: you can't know where the module will be loaded until the kernel allocates memory for it, but you need a breakpoint before that memory is used.
The kernel hands each freshly loaded module to do_init_module() in kernel/module/main.c (or kernel/module.c on older trees), which runs its init function. The trick is to set a breakpoint in the kernel that fires after the module's sections have been allocated and relocated but before do_one_initcall(mod->init) runs.
Start the kernel under QEMU with -s -S (or use kgdb on real hardware).
Attach GDB, load the vmlinux symbols, then break do_init_module.
insmod the module. When the breakpoint fires, the struct module *mod argument has all the section addresses you need: mod->mem[MOD_TEXT].base, plus the .data, .bss, and .rodata sections (older kernels use mod->core_layout.base).
Build the add-symbol-file mymod.ko <text_addr> -s .data <data_addr> -s .bss <bss_addr> ... command from those addresses.
Set a breakpoint on your module's init function and continue.
The kernel's own scripts/gdb/linux/symbols.py (the lx-symbols command) does almost exactly this: it walks the module list and loads symbols automatically. The catch is that it normally runs after modules are loaded. You can either hack it to break at do_init_module entry, or call lx-symbols from inside that breakpoint handler before stepping into do_one_initcall.
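Once the addresses are in hand, assembling the add-symbol-file command is mechanical. The sketch below is a hypothetical Python helper and is not part of the kernel's GDB scripts. It assumes you have already printed the addresses at the do_init_module breakpoint, for example with p/x on mod->mem[MOD_TEXT].base. The addresses in the usage block are made up.

```python
def add_symbol_file_cmd(ko_path: str, text_addr: int, sections: dict[str, int]) -> str:
    """Build a GDB add-symbol-file command for a kernel module.

    text_addr is the load address of .text; `sections` maps every other
    section name (".data", ".bss", ".rodata", ...) to its load address.
    """
    parts = ["add-symbol-file", ko_path, hex(text_addr)]
    for name, addr in sections.items():
        parts += ["-s", name, hex(addr)]  # one "-s <section> <addr>" pair per section
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical addresses, as they might be read at the breakpoint.
    print(add_symbol_file_cmd(
        "mymod.ko",
        0xFFFFFFFFC0A00000,
        {".data": 0xFFFFFFFFC0A04000, ".bss": 0xFFFFFFFFC0A04800},
    ))
```

Inside GDB's embedded Python, the resulting string could be passed to gdb.execute() from the breakpoint handler, before do_one_initcall runs.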
In recent kernels, mod->core_layout/init_layout were replaced by the mod->mem[] array. Your script must match the kernel version.
With KASLR on, make sure the vmlinux symbols are loaded against the actual runtime base (use the kernel's nokaslr boot param for sanity while debugging).
Build the module with -O1 or -Og and EXTRA_CFLAGS += -g; -O2 will inline most of module_init away.
A core_initcall in a built-in module won't go through do_init_module at all; that's a different path entirely.
Daily Software Engineering
2026-06-02
When you need to distribute work across N servers, the obvious approaches both have problems. Random assignment is cheap but unlucky: some servers get hammered while others sit idle. Least-loaded (pick the server with the shortest queue) is optimal but requires querying every server on every request, which doesn't scale.
The Power of Two Choices (sometimes called "two random choices" or P2C) is the sweet spot: pick two servers at random, then send the request to whichever has the shorter queue. That's it. One extra comparison per request, no global state, and the results are stunningly good.
The math is the surprising part. When N requests are spread across N servers by pure random assignment, the maximum queue length grows as log N / log log N. With Power of Two Choices, it drops to log log N / log 2 (plus a constant), an exponential improvement. For 1,000 servers under load:
Going from one choice to two gives you most of the benefit. Going from two to three barely moves the needle. That's the rule of thumb: two is the magic number.
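You can check the one-versus-two-versus-three claim empirically. The Python sketch below uses the classic balls-into-bins setup behind those bounds rather than a real queueing model. Requests never complete. Each request samples d servers uniformly at random (with replacement) and joins the shortest queue. The function then reports the longest queue once every request is placed. max_queue_length is a hypothetical name, and exact results depend on the seed, so none are quoted here.

```python
import random


def max_queue_length(n_servers: int, n_requests: int, choices: int, seed: int = 0) -> int:
    """Send each request to the shortest of `choices` randomly sampled queues.

    Requests never complete (the balls-into-bins setting behind the log bounds),
    so the return value is the longest queue once every request is placed.
    """
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_requests):
        # Sample candidates uniformly with replacement, as in the classic analysis.
        candidates = [rng.randrange(n_servers) for _ in range(choices)]
        target = min(candidates, key=load.__getitem__)  # shortest queue wins
        load[target] += 1
    return max(load)


if __name__ == "__main__":
    n = 1_000
    for d in (1, 2, 3):
        print(f"{d} choice(s): longest queue = {max_queue_length(n, n, d)}")
```

Runs of this kind typically show a large drop from one choice to two and only a small step from two to three, which matches the rule of thumb.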
Real-world example: NGINX added P2C as its random two least_conn load balancing method specifically for distributed proxy fleets. When you have multiple NGINX instances each load balancing to the same backend pool, pure least-connections creates a "herd" problem: every proxy independently picks the same "least loaded" backend, instantly overloading it. P2C breaks the herd because the two random picks differ across proxies. HAProxy, Envoy, and Netflix's internal load balancers all use variants of this.
Where it shines: stateless request routing, connection pools, work queue dispatchers, sharded cache lookups. Anywhere you'd reach for "least loaded" but can't afford the coordination cost.
Where to be careful: the "load" signal needs to be meaningful and recent. If you measure "connections open" but requests vary wildly in cost, you're picking the wrong dimension. And if your load metric is stale (cached for 10 seconds), P2C degrades toward random. Update the signal on every dispatch, even if it's just a local counter.
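Here is a minimal Python sketch of that advice. The load signal is a local in-flight counter. It is bumped the moment a request is dispatched and decremented when the request finishes, so it is never stale. P2CBalancer and dispatch are hypothetical names, and the backends are plain strings. Unlike the with-replacement analysis, it samples two distinct backends. The class is single-threaded; a real proxy would need locking or atomics around the counters.

```python
import random
from contextlib import contextmanager
from typing import Iterator, Optional, Sequence


class P2CBalancer:
    """Power of Two Choices over a fixed backend pool (single-threaded sketch)."""

    def __init__(self, backends: Sequence[str], seed: Optional[int] = None) -> None:
        if len(backends) < 2:
            raise ValueError("need at least two backends to choose between")
        self.backends = list(backends)
        # Local load signal: requests dispatched but not yet finished, per backend.
        self.in_flight = {b: 0 for b in self.backends}
        self._rng = random.Random(seed)

    def pick(self) -> str:
        # Sample two distinct backends and keep the less busy one (ties: first sample).
        a, b = self._rng.sample(self.backends, 2)
        return a if self.in_flight[a] <= self.in_flight[b] else b

    @contextmanager
    def dispatch(self) -> Iterator[str]:
        backend = self.pick()
        self.in_flight[backend] += 1      # update the signal at dispatch time...
        try:
            yield backend
        finally:
            self.in_flight[backend] -= 1  # ...and again when the request completes


if __name__ == "__main__":
    lb = P2CBalancer(["app-1", "app-2", "app-3", "app-4"], seed=42)
    with lb.dispatch() as first, lb.dispatch() as second:
        print("in flight:", first, second, lb.in_flight)
    print("after completion:", lb.in_flight)
```

Using a context manager ties the decrement to request completion, even when the handler raises, so the counter cannot drift upward over time.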
The deeper lesson: a tiny amount of information often beats either zero information or perfect information. Two samples isn't much, but it's infinitely more than one.
Tool Nobody Knows
2026-06-02
Everyone reaches for nohup, tmux, or at when they want to background a job. Almost nobody reaches for the tool that systemd itself uses under the hood: systemd-run. It launches an arbitrary command as a transient service, scope, or timer. That gives you journal logging, real cgroup-enforced resource caps, sandboxing, and calendar scheduling without ever writing a .service file.
The basic split: --scope runs the command inside your current terminal as the foreground process. Without --scope (the default), systemd-run starts a detached transient service you talk to via journalctl/systemctl.
Capture stdout, stderr, and exit code in the journal, keyed by a unit name:
systemd-run --user --unit=nightly-ingest ./ingest.sh
journalctl --user -fu nightly-ingest.service
systemctl --user status nightly-ingest.service
Cap a runaway build at 4 GB and two cores (actual kernel enforcement, not ulimit wishful thinking):
systemd-run --user --scope \
-p MemoryMax=4G -p MemorySwapMax=0 \
-p CPUQuota=200% \
cargo build --release
Block until done and return the exit code (great in CI scripts):
systemd-run --user --wait --pipe --collect \
-p MemoryMax=2G ./flaky-converter input.bin
--pipe wires stdin/stdout through so it composes in pipelines. --collect garbage-collects the unit on exit so you don't accumulate failed-unit corpses.
Schedule a one-shot in 90 minutes, no at daemon needed:
systemd-run --user --on-active=90min \
--unit=cache-purge /usr/local/bin/purge.sh
A recurring timer with calendar syntax that beats cron's:
systemd-run --user --on-calendar="Mon..Fri 09:30" \
--unit=workday-report ./report.sh
Drop into a sandboxed shell with no network, a private /tmp, and a read-only system, writable only where you say:
systemd-run --user --pty --same-dir \
-p PrivateNetwork=yes -p PrivateTmp=yes \
-p ProtectSystem=strict \
-p ReadWritePaths=$PWD \
bash
Inspect everything transient currently running:
systemctl --user list-units --type=service "run-*" "*.service"
systemd-cgls --user
systemd-cgtop
Why this beats the alternatives:
nohup gives you a runaway log file and zero resource control.
tmux/screen keeps the process alive but can't cap memory or CPU.
at can't repeat and offers no sandboxing.
Hand-written .service files need a unit file on disk and a daemon-reload, and they stick around forever.
Gotchas worth knowing:
--user units die at logout unless you run loginctl enable-linger $USER once.
--scope is tied to your shell; close the terminal and it's gone. Leave off --scope (the default service mode) for true background.
To write output to a file instead of the journal, add -p StandardOutput=file:/var/log/foo.log.
-p accepts the settings documented in systemd.exec(5), systemd.resource-control(5), and systemd.service(5): IOWeight, Nice, AmbientCapabilities, BindReadOnlyPaths, RuntimeMaxSec… the whole zoo.
Once you internalize that "run a command" and "configure a systemd unit" are the same operation, you stop writing .service files for half the things you used to.
What If Engineering
2026-06-02
The US produces about 12 million tons of waste glass annually, and only a third gets recycled. Meanwhile, we lay roughly 350 million tons of asphalt per year on roads that crack, rut, and require resurfacing every 10-15 years. What if we sintered crushed glass cullet into highway-grade pavement instead?
The physics is more interesting than "glass road = slippery death." Soda-lime glass has a compressive strength of ~1,000 MPa. That is roughly 30× typical asphalt concrete (35 MPa) and some 25 to 50× typical Portland cement concrete (20 to 40 MPa). Its Mohs hardness of 5.5 means it laughs at studded tires that grind asphalt to dust. So why don't we already do this?
You can't just pour molten glass on a highway. Bulk fusion requires ~1,500°C, which is energetically absurd for road construction. The trick is vitrified aggregate: crushed glass particles bonded by partial surface melting at ~720°C, similar to how foam glass insulation is made. Energy budget per square meter of 20 cm-thick road:
Volume: 0.2 m³/m²
Mass: ~500 kg (density 2500 kg/m³)
ΔT: 700 K
Cp glass: 840 J/(kg·K)
Energy: 500 × 840 × 700 = 294 MJ/m²
≈ 82 kWh/m²
At industrial natural gas prices (~$0.04/kWh-thermal), that's $3.30/m² of energy cost. A one-mile, four-lane highway (~22,000 m²) needs ~1.8 GWh and costs ~$73,000 in fuel, comparable to the asphalt binder it replaces. Manageable.
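The energy budget above is plain sensible-heat arithmetic (mass × specific heat × temperature rise), so it is easy to re-run with different assumptions. The Python sketch below uses the section's own figures. It ignores latent heat, heat losses, and kiln efficiency, and the helper name is hypothetical. Without intermediate rounding, the per-mile fuel cost comes out near $72,000 rather than the rounded $73,000.

```python
J_PER_KWH = 3.6e6  # joules in one kilowatt-hour


def sinter_energy_j_per_m2(thickness_m: float = 0.2, density_kg_m3: float = 2500.0,
                           cp_j_per_kg_k: float = 840.0, delta_t_k: float = 700.0) -> float:
    """Sensible heat to raise one square metre of slab by delta_t_k (no losses)."""
    mass_kg = thickness_m * density_kg_m3  # 0.2 m³ of glass per m² of road
    return mass_kg * cp_j_per_kg_k * delta_t_k


if __name__ == "__main__":
    energy_j = sinter_energy_j_per_m2()
    kwh_per_m2 = energy_j / J_PER_KWH
    usd_per_m2 = kwh_per_m2 * 0.04  # ~$0.04 per thermal kWh, as in the text
    mile_m2 = 22_000                # one mile of four-lane highway, as in the text
    print(f"{energy_j / 1e6:.0f} MJ/m², {kwh_per_m2:.0f} kWh/m², ${usd_per_m2:.2f}/m²")
    print(f"per mile: {kwh_per_m2 * mile_m2 / 1e6:.2f} GWh, ${usd_per_m2 * mile_m2:,.0f}")
```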
Wet glass has a coefficient of friction around 0.2, terrifying at highway speeds, where we want μ ≥ 0.5. Solution: etched macrotexture. During the sintering pass, embed coarse silicon carbide grit (Mohs 9) at 15-20% volume fraction in the top 5 mm. The SiC stays proud as the glass matrix wears, maintaining microtexture indefinitely. Think of it as permanently chip-sealed road.
This is where it gets dicey. Glass has a thermal expansion coefficient of ~9×10⁻⁶/K and modest tensile strength (~50 MPa). A 40°C diurnal swing in a constrained slab generates:
σ = E·α·ΔT = 70 GPa × 9×10⁻⁶/K × 40 K
≈ 25 MPa
That's half the tensile strength, with no margin for ice expansion or load stress. We'd need expansion joints every 3-4 meters (vs. 30+ for concrete) and a borosilicate-style formulation with α ≈ 3×10⁻⁶/K. Borosilicate triples raw material cost.
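The restrained-slab stress is the one-line formula σ = E·α·ΔT, so comparing the two glass chemistries takes only a few lines of Python. This sketch keeps E at 70 GPa for both glasses as a simplification, since real borosilicate glasses differ somewhat in stiffness. It uses the text's ~50 MPa tensile strength, and the function name is hypothetical.

```python
def restrained_thermal_stress_mpa(e_gpa: float, alpha_per_k: float, delta_t_k: float) -> float:
    """sigma = E * alpha * dT for a fully restrained slab, in MPa."""
    return e_gpa * 1e3 * alpha_per_k * delta_t_k  # 1 GPa = 1e3 MPa


TENSILE_STRENGTH_MPA = 50.0  # the "modest tensile strength" figure from the text

if __name__ == "__main__":
    for name, alpha in [("soda-lime", 9e-6), ("borosilicate-style", 3e-6)]:
        sigma = restrained_thermal_stress_mpa(70.0, alpha, 40.0)
        share = sigma / TENSILE_STRENGTH_MPA
        print(f"{name}: {sigma:.1f} MPa, {share:.0%} of tensile strength")
```

Cutting α to a third cuts the thermal stress to a third as well (about 8 MPa instead of 25 MPa). That reduction is what buys back margin for ice and traffic loads.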
The catch: in-place sintering requires towing a 2 MW induction or microwave kiln behind the paver at ~1 m/min. That's a 7.2 GJ/hour machine crawling down I-95. Doable, but it's a fundamentally different construction logistics chain, closer to railroad welding than asphalt paving.
Wikipedia Rabbit Hole
2026-06-02
In The Hunt for Red October, the Soviet submarine's defining feature is its "caterpillar drive," a silent propulsion system with no propellers and no moving parts, just water flowing through the hull and emerging with thrust. Tom Clancy didn't invent it. He borrowed it from a 1960s physics experiment, and a working version was quietly launched into Kobe Harbor in 1992.
The principle is almost insultingly simple if you remember high-school physics. Pass a current through a conductor inside a magnetic field, and the conductor feels a force perpendicular to both. That's the Lorentz force, the same effect that spins every electric motor on Earth. Now replace the solid conductor with seawater, which conducts electricity reasonably well thanks to its dissolved salts. Put electrodes on one pair of walls and magnets on the other, and the water itself becomes the rotor. It squirts out the back. The vessel moves forward. No shaft, no bearings, no cavitation noise.
The first patent was filed in 1958 by an engineer named Stewart Way at Westinghouse. He later built a 10-foot working submarine model in 1966 and successfully drove it around a swimming pool. Then came the Japanese ship Yamato 1, an actual 30-meter passenger vessel powered by twin MHD thrusters using superconducting magnets cooled with liquid helium. It hit a top speed of about 15 km/h. That sounds underwhelming, and it is, which points to the central problem with MHD drives.
The efficiency is brutal. Seawater is a terrible conductor compared to copper, about 10⁷ times worse (roughly 5 S/m versus 6×10⁷ S/m). You therefore need monstrous magnetic fields (think 5 to 15 tesla, MRI-machine territory) just to get respectable thrust. Most of the electrical energy goes into electrolyzing the water and heating it, not pushing the ship. Yamato 1's overall efficiency was estimated below 30%, while a conventional propeller manages 70% or more.
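The link between poor conductivity and giant magnets can be made explicit with an idealized duct. Assume a uniform current density J across the water, a uniform field B, and a thrust F = J·B·V over the duct volume V. Then J = F/(B·V), and the resistive loss J²/σ per unit volume works out to P = F² / (σ·B²·V). Halving the conductivity therefore doubles the heat for the same push, while doubling the field cuts it by a factor of four. The Python below just evaluates that formula. The 1 kN thrust, 1 m³ duct, and 5 S/m conductivity are illustrative assumptions, not Yamato 1's real figures, and electrolysis losses are ignored.

```python
def mhd_joule_loss_w(thrust_n: float, sigma_s_per_m: float,
                     b_tesla: float, duct_volume_m3: float) -> float:
    """Resistive heating (W) in an idealized MHD duct delivering `thrust_n`.

    Uniform J and B give F = J*B*V, so J = F/(B*V) and the loss
    (J**2 / sigma) * V simplifies to F**2 / (sigma * B**2 * V).
    """
    return thrust_n ** 2 / (sigma_s_per_m * b_tesla ** 2 * duct_volume_m3)


if __name__ == "__main__":
    SEAWATER_SIGMA = 5.0  # S/m, typical order of magnitude for seawater (assumed)
    for b in (1.0, 5.0, 15.0):
        loss_kw = mhd_joule_loss_w(1_000.0, SEAWATER_SIGMA, b, 1.0) / 1e3
        print(f"B = {b:4.1f} T: {loss_kw:7.1f} kW of heat for 1 kN of thrust")
```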
But the use case isn't speed; it's silence. A propeller generates cavitation bubbles that collapse with enough acoustic signature to be heard across oceans by passive sonar. An MHD drive has nothing rotating, nothing pulsing, nothing slapping the water. For a military submarine, that's a holy grail worth chasing even at terrible efficiency. The U.S. Navy funded MHD research throughout the Cold War for exactly this reason, and the declassified results are still patchy.
There's a stranger frontier too. MHD doesn't require water: it works on any conducting fluid, including ionized gas. Its electrostatic cousin, electrohydrodynamic "ionic wind" thrust, drops the magnets altogether. NASA and several university labs have built aircraft that produce thrust this way, accelerating ionized air between electrode arrays, and MIT flew a small fixed-wing plane on it in 2018. The no-moving-parts dream that should have powered Red October is now lifting drones.
Daily YT Documentary
2026-06-02
Channel: Insight Hive (31 subscribers)
Most people assume solar power is a late-20th-century invention, born of the oil crises and silicon revolution. This video tells a much stranger story. A Victorian-era English engineer named William Adams was building practical solar concentrators in colonial Bombay back in the 1870s, nearly 150 years before today's renewables debate.
Adams experimented with arrays of flat silvered mirrors arranged around a central boiler, an early version of what we now call concentrated solar power. He documented his work in an 1878 book that influenced later solar pioneers, and his Bombay experiments produced enough steam to run small engines using nothing but sunlight and geometry.
What makes this worth watching is the counter-history: it reframes solar energy not as a futuristic novelty but as a road not taken. Coal was cheap, empire logistics favored it, and Adams's ideas were shelved. The video walks through the engineering of his mirror arrays, the colonial context that both enabled and ultimately buried his work, and why his designs still echo in modern solar-thermal plants in the Mojave and Morocco.
Insight Hive is a tiny channel, but the piece is a focused biographical-engineering explainer rather than stock-footage filler. It's exactly the kind of forgotten-pioneer story that rewards a few minutes of attention.
Daily YT Electronics
2026-06-02
Channel: V. Hunter Adams (7830 subscribers)
This is a final project demo from Cornell's ECE 5760 (Advanced Microcontroller Design), a course that consistently produces some of the most ambitious FPGA work on YouTube. Three students, Utku Melemetci, Sam Keamy, and Peter He, tackle one of the harder problems in modern video: hardware-accelerating AV1 decoding.
AV1 is a royalty-free codec, the successor to Google's VP9 and an alternative to H.264 and HEVC, and it is notoriously expensive to decode in software. That's exactly why hardware acceleration matters: phones, TVs, and streaming devices all rely on dedicated silicon to play AV1 efficiently. Building a partial decoder on an FPGA is a serious undertaking. You have to grapple with entropy coding, transform blocks, motion compensation, and loop filtering. Then you have to figure out which pieces actually benefit from parallel hardware and which should stay on the soft-core CPU.
What makes the ECE 5760 project demos worth watching is that they're not polished marketing pieces. The students walk through their actual architecture: what they put in hardware, what stayed in software, where they hit memory bandwidth limits, and what trade-offs they made to fit on the DE1-SoC. For anyone interested in video codecs, hardware/software co-design, or just seeing what's achievable in a one-semester FPGA project, this is a great look.
Pair it with the linked project webpage for the full writeup including Verilog source.
Daily YT Engineering
2026-06-02
Channel: Dr. A. Vetrivel (54 subscribers)
Thin-walled pressure vessels are everywhere (propane tanks, boiler drums, pipeline segments, aerosol cans), and the math governing their failure is one of the cleaner derivations in strength of materials. This case-study lecture from an engineering instructor walks through how hoop stress (circumferential) and longitudinal stress develop in a cylinder under internal pressure. It also shows why the hoop stress is always exactly twice the longitudinal stress for the same geometry.
That 2:1 ratio is the reason cylindrical vessels almost always rupture along their length rather than popping their ends off, a detail you can verify on any failed propane tank or burst soda can. The video frames this through real failure cases, which is the right pedagogical move: the equations only stick once you've seen what they predict in the wreckage.
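The 2:1 ratio falls straight out of two force balances. Slice the cylinder lengthwise: pressure p acting on the projected area 2rL is resisted by two wall sections of area tL, so σ_hoop = pr/t. Slice it crosswise: p acting on the end area πr² is resisted by a ring of wall with area 2πrt, so σ_long = pr/(2t). Below is a small Python sketch of both formulas. The tank's pressure and dimensions are hypothetical, and the thin-wall approximation is only reasonable when r/t is roughly 10 or more.

```python
def thin_wall_stresses(p_pa: float, r_m: float, t_m: float) -> tuple[float, float]:
    """Return (hoop, longitudinal) stress in Pa for a thin-walled closed cylinder."""
    if r_m / t_m < 10:
        raise ValueError("thin-wall formulas assume r/t of roughly 10 or more")
    hoop = p_pa * r_m / t_m                # circumferential: p*r/t
    longitudinal = p_pa * r_m / (2 * t_m)  # axial: p*r/(2t)
    return hoop, longitudinal


if __name__ == "__main__":
    # Hypothetical tank: 1.5 MPa internal pressure, 150 mm radius, 3 mm wall.
    hoop, axial = thin_wall_stresses(1.5e6, 0.150, 0.003)
    print(f"hoop {hoop / 1e6:.1f} MPa, longitudinal {axial / 1e6:.1f} MPa")
```

Because the hoop stress is the larger of the two, it reaches the material's limit first. The resulting crack runs along the length, perpendicular to that stress.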
The production is bare-bones academic (chalkboard-style derivation, no animation budget). Still, the channel is clearly an instructor sharing classroom material rather than chasing views, and this is the kind of genuinely useful mechanics teaching that gets buried under disaster-porn thumbnails on YouTube. The other candidates this week were mostly clickbait shorts or hashtag spam; this one quietly teaches a concept you'll use.
Daily YT Maker
2026-06-02
Channel: JSB Home Solutions (1370 subscribers)
Most homeowners only see the final installed countertop: the polished slab sitting on their cabinets. What happens between the raw quarried stone and that finished surface is a surprisingly involved fabrication process, and this behind-the-scenes shop tour walks through it.
Stone fabrication sits at an interesting intersection of traditional craft and modern CNC machining. A typical shop pipeline starts with digital templating, often with laser measurement tools that capture exact cabinet dimensions down to fractions of a millimeter. It continues through slab layout and seam planning, CNC waterjet or bridge saw cutting, edge profiling, sink and cooktop cutouts, and multi-stage polishing. Each step has real engineering considerations: grain matching across seams, structural reinforcement around cutouts where the stone is weakest, and managing the enormous weight of slabs that can exceed 800 pounds.
For anyone planning a kitchen renovation, building cabinetry, or just curious about how a working fab shop operates, seeing the actual sequence demystifies a process that's usually hidden behind a vendor's quote. It's also useful for makers who work with sheet goods generally, since the templating and layout strategies translate to plywood, steel, and other rigid materials.
JSB Home Solutions is a small channel, so this is a relatively rare unvarnished look inside a real working shop rather than a sponsored manufacturer reel.
Daily YT Welding
2026-06-02
Channel: RBPerformance (37 subscribers)
A strut tower bar is one of those parts that looks simple but rewards careful fabrication. The bar ties the two front strut towers together, reducing chassis flex during hard cornering. That means the geometry has to be measured accurately from the car itself, not guessed from a drawing. Get the angles wrong and you'll either preload the suspension or end up with a bar that doesn't bolt up.
This build on a classic Volkswagen Golf MK1 walks through the process from raw stock to a finished, fitted part. Expect to see the fabricator transfer measurements off the chassis and cut and notch tube to match the angle between the towers. Next comes fitting end plates or tabs that match the strut top bolt pattern, then tacking and welding the assembly. Classic Golfs have notoriously tight engine bays, so clearance around the brake master cylinder and intake plumbing is a real constraint; watch how they route the bar.
RBPerformance is a tiny channel (37 subscribers), so the production is modest, but for anyone interested in one-off chassis fabrication for older European cars, this is exactly the kind of hands-on content that's hard to find. It's also a good study in how a humble-looking bracket actually requires careful planning.
