Hacker News | kovek's comments

What if the organization that is trying to verify sends a request to an app on the user’s iPhone (or whatever device can do the same), and the user scans their face with FaceID to produce a file that is sent to the organization, which then sends that file to Apple to ask whether it represents the right person? I trust Apple, so that works for me.

I’d say it’s possible to have creativity when you’re sitting as well. I like to think that it’s all about staying active. Reading, keeping a diary, calling a friend. All of that.

LLMs are like a search engine that autocompletes. It's a tool.


What negative consequences does being unelected have?


Maybe you can't know 100% what every layer "thinks", but if you go through all the layers, you might see a cohesive "thinking" story. So if you lose any information at layer N, you might recover some of it at layer N+1. The masking in the layers is not deterministic, so the model can't really lie consistently throughout the layers. It doesn't choose what information we get to inspect. There might be a game of whack-a-mole, but you might get a general sentiment. I think the more layers there are, the more the model itself can hide very nuanced lies (but by that time we'd have a better mind-reading model).

However, I haven't read about it yet. I'm really excited to look into it!


I’ve read recently about natural systems in the book Antifragile. It’s interesting how those systems can become better.


> The core idea is content-dependent selection. For each query, the model selects which parts of the sequence are worth attending to, and computes attention exactly over those positions.
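A minimal sketch of what the quoted idea could look like, assuming the selection rule is simply "keep the k keys with the highest dot-product score". That rule, the function name `topk_attention`, and the toy vectors are all illustrative assumptions, not the method from the linked work:

```python
import math

def topk_attention(q, keys, values, k):
    """Attend exactly over the k keys that score highest against q.

    The top-k-by-dot-product rule is an assumed stand-in for whatever
    content-dependent selection the real model uses.
    """
    scores = [sum(a * b for a, b in zip(q, key)) for key in keys]
    # Content-dependent selection: keep only the k best-scoring positions.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Exact softmax attention, restricted to the selected positions.
    scale = math.sqrt(len(q))
    top = max(scores[i] / scale for i in selected)
    weights = {i: math.exp(scores[i] / scale - top) for i in selected}
    total = sum(weights.values())
    dim = len(values[0])
    out = [sum(weights[i] / total * values[i][d] for i in selected) for d in range(dim)]
    return selected, out

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
selected, out = topk_attention(q, keys, values, k=2)
```

With `k = len(keys)` this reduces to ordinary full attention, which is the case the code question below is worried about: if every token matters, nothing can be skipped.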

I don't know if this will help for things like understanding code, where all the relevant parts can be the 1000-line file that we are analyzing, and where every token is relevant to understanding recursion, loops, function calls, etc.

This sounds like it would be great to do SSA before passing things along to a code model like Claude Code.

Let me know if I misunderstood


Yeah, tokens are excluded, only pairwise relationships between tokens. Coding is something we are looking at carefully!


Does thinking about how to offload matter?


A discussion on how to avoid paying the price of running an expensive model is not about the expensive model. You can triage things by running a cheap model with Ollama. Heck, throw in GPT-4.1, which is free.


I don’t think triaging is necessarily an easy task


What if the cache was backed up to cold storage? Instead of having to recompute everything.


They probably already do that. But these caches can get pretty big (10s of GBs per session), so that adds up fast, even for cold storage.


10s of GBs? ( 1,000,000 context * 1,000 vector size ) ^ 2 = 1,000,000,000,000,000,000… oh wow.. I must be miscalculating

What about only storing the conversation and then recomputing the embeddings in the cache? Does that cost a lot? Doing a lot of matrix multiplication does not cost dollars of compute, especially on specialized hardware, right?


Context length 1e6, vector length 1e3, and 1e2 model layers for 100e9 context size. Costs will go up even more with a richer latent space and more model layers, and the western frontier outfits are reasonably likely to be maximizing both.
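The arithmetic above can be sketched as follows. The element count uses only the three numbers from the comment; the byte estimate additionally assumes 2 bytes per value (fp16) and two cached tensors per layer (keys and values), which are my assumptions and not figures from the thread. Note that the count is linear in context length, not squared.

```python
def cache_elements(context_len, vector_len, num_layers):
    """Number of cached values: one vector per token per layer."""
    return context_len * vector_len * num_layers

def cache_bytes(context_len, vector_len, num_layers,
                bytes_per_value=2, tensors_per_layer=2):
    """Rough size in bytes.

    bytes_per_value=2 (fp16) and tensors_per_layer=2 (keys and values)
    are assumptions for illustration; real models may differ.
    """
    elements = cache_elements(context_len, vector_len, num_layers)
    return elements * bytes_per_value * tensors_per_layer

elements = cache_elements(1_000_000, 1_000, 100)
size = cache_bytes(1_000_000, 1_000, 100)
print(f"{elements:.0e} values, about {size / 1e9:.0f} GB under these assumptions")
```

Shorter contexts scale the result down proportionally, so a session with 1e5 tokens would be a tenth of this under the same assumptions.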


Is this similar to sending 48656c6c6f2c20686f772061726520796f753f in the prompt? As done here: https://youtu.be/GiaNp0u_swU?si=m7-LZ7EYxJCw0k1-
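For reference, the string above is plain hex-encoded ASCII, and it can be decoded with the standard library. The variable names here are only illustrative:

```python
hex_prompt = "48656c6c6f2c20686f772061726520796f753f"

# Each pair of hex digits is one byte; the bytes are ASCII text.
decoded = bytes.fromhex(hex_prompt).decode("ascii")
print(decoded)  # Hello, how are you?

# Going the other way reproduces the original hex string.
reencoded = decoded.encode("ascii").hex()
```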


Yes, I was using Base64 to 'jailbreak' LLMs back in the day (so similar), and that's what led me to the hypothesis, and to months of GPU use to find the optimal layer duplication!

