kovek · 2026-05-29T18:04:23 1780077863

What if the organization that wants to verify sends a request to an app on the user’s iPhone (or whatever device can do the same), and the user scans their face with Face ID to produce a file that is sent to the organization, which then sends that file to Apple to ask whether it represents the right person? I trust Apple, so that works for me.

kovek · 2026-05-26T17:21:16 1779816076

I’d say it’s possible to be creative while you’re sitting as well. I like to think that it’s all about staying active: reading, keeping a diary, calling a friend. All of that.

kovek · 2026-05-17T02:15:57 1778984157

LLMs are like a search engine that autocompletes. It's a tool.

kovek · 2026-05-15T21:41:33 1778881293

What negative consequences does being unelected have?

kovek · 2026-05-08T04:21:15 1778214075

Maybe you can't know 100% what every layer "thinks", but if you go through all the layers, you might see a cohesive "thinking" story. So, if there is any information you lose at layer N, you might learn some of it in layer N+1. The masking in the layers is not deterministic, so the model can't really lie consistently throughout the layers. It doesn't choose what information we get to inspect. There might be a game of whack-a-mole, but you might get a general sentiment. I think the more layers there are, the more the model itself can hide very nuanced lies (but by that time we'd have a better mind-reading model).

However, I haven't read about it yet. I'm really excited to look into it!

kovek · 2026-05-07T20:42:49 1778186569

I’ve read recently about natural systems in the book Antifragile. It’s interesting how those systems can become better.

kovek · 2026-05-06T04:59:48 1778043588

> The core idea is content-dependent selection. For each query, the model selects which parts of the sequence are worth attending to, and computes attention exactly over those positions.
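To make the quoted idea concrete, here is a minimal sketch of one possible reading of it: for each query, pick a few positions and run exact softmax attention over only those. The selection rule (top-k by dot-product score) and the name `sparse_attention` are assumptions for illustration; the quote does not say how the selection is actually done.

```python
import math


def sparse_attention(query, keys, values, k):
    """Attend exactly over the k positions selected for this query.

    Assumption: selection is top-k by dot-product score. Scoring every key
    like this is itself linear in sequence length, so a real system would
    need a cheaper selection step.
    """
    # Score every key against the query.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Content-dependent selection: keep the k highest-scoring positions.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Exact softmax, restricted to the selected positions.
    top = max(scores[i] for i in selected)
    weights = [math.exp(scores[i] - top) for i in selected]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]
    return sorted(selected), output


keys = [[1, 0], [0, 1], [1, 1], [-1, 0]]
values = [[1.0], [2.0], [3.0], [4.0]]
positions, output = sparse_attention([1, 0], keys, values, k=2)
print(positions, output)
```

Positions that are not selected contribute nothing to the output for that query, which is where the concern about code comes from: if every line of a file matters, a small `k` would drop relevant pairs.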

I don't know if this will help for things like understanding code, where the relevant parts can be the entire 1000-line file that we are analyzing, and where every token is relevant to understanding recursion, loops, function calls, etc.

This sounds like it would be great to do SSA before passing things along to a code model like Claude Code.

Let me know if I misunderstood.

alexsubq · 2026-05-08T03:57:06 1778212626

Yeah, tokens are excluded, only pairwise relationships between tokens. Coding is something we are looking at carefully!

kovek · 2026-04-29T06:44:16 1777445056

Does thinking about how to offload matter?

locknitpicker · 2026-04-29T07:04:30 1777446270

A discussion on how to avoid paying the price of running an expensive model is not about the expensive model. You can triage things running a cheap model with Ollama. Heck, throw in gpt4.1 which is free.

kovek · 2026-04-29T17:42:00 1777484520

I don’t think triaging is necessarily an easy task.

kovek · 2026-04-23T21:05:30 1776978330

What if the cache was backed up to cold storage? Instead of having to recompute everything.

vanviegen · 2026-04-24T08:49:58 1777020598

They probably already do that. But these caches can get pretty big (10s of GBs per session), so that adds up fast, even for cold storage.

kovek · 2026-04-24T18:21:34 1777054894

10s of GBs? (1,000,000 context * 1,000 vector size)^2 = 1,000,000,000,000,000,000… oh wow. I must be miscalculating.

What about only storing the conversation and then recomputing the embeddings in the cache? Does that cost a lot? Doing a lot of matrix multiplication does not cost dollars of compute, especially on specialized hardware, right?

Majromax · 2026-04-24T19:08:22 1777057702

Context length 1e6, vector length 1e3, and 1e2 model layers give a context cache of about 100e9 values. Costs will go up even more with a richer latent space and more model layers, and the western frontier outfits are reasonably likely to be maximizing both.
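The arithmetic can be checked with a short sketch. The element count uses only the three figures from this thread; the 2 bytes per value (16-bit floats) is an assumption, and the sketch ignores details such as storing keys and values separately.

```python
def cache_elements(context_len: int, vector_len: int, n_layers: int) -> int:
    """Number of cached values: one vector per token per layer."""
    return context_len * vector_len * n_layers


# Figures from the thread: 1e6 tokens, 1e3 vector length, 1e2 layers.
elements = cache_elements(1_000_000, 1_000, 100)

# Assumption: each value is stored as a 16-bit float (2 bytes).
BYTES_PER_ELEMENT = 2
size_gb = elements * BYTES_PER_ELEMENT / 1e9

print(f"{elements:.0e} values, about {size_gb:.0f} GB")
```

The count is a plain product, so it grows linearly with context length. Nothing is squared, because the cache holds per-token vectors rather than the pairwise attention scores.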

kovek · 2026-03-10T16:51:51 1773161511

Is this similar to sending 48656c6c6f2c20686f772061726520796f753f in the prompt? As done here: https://youtu.be/GiaNp0u_swU?si=m7-LZ7EYxJCw0k1-
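For readers who don't want to decode it by hand, the string above is hex-encoded ASCII. A minimal sketch using only the Python standard library:

```python
hex_prompt = "48656c6c6f2c20686f772061726520796f753f"

# Each pair of hex digits is one byte; the bytes are plain ASCII text.
decoded = bytes.fromhex(hex_prompt).decode("ascii")

print(decoded)  # Hello, how are you?
```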

dnhkng · 2026-03-10T17:46:54 1773164814

Yes, I was using Base64 to 'jailbreak' LLMs back in the day (so similar), and that's what led me to the hypothesis, and months of GPU use to find optimal layer duplication!
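As a rough illustration of what a Base64-wrapped prompt looks like, here is a minimal round trip with Python's standard `base64` module. The message text is a made-up example, not one of the prompts used in the experiments described here.

```python
import base64

# Hypothetical example message; any text works the same way.
message = "Hello, how are you?"

# Encode the UTF-8 bytes as Base64 text that could be pasted into a prompt.
encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")

# Decoding reverses the step and recovers the original text.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```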