
I've had a research-heavy career in computer science. A central problem with academic research is that it commonly ignores real-world constraints or, less commonly, imagines constraints that don't exist. Having gone deep in a few domains, I find the academic literature is usually disappointing[0] once you've read and understood the entirety of it.

The valuable thing about hands-on experience is that the real world doesn't let you ignore constraints. You get that feedback quickly if you are paying attention. This in turn allows you to build a more accurate mental model of the true nature of the problem you are solving and where the hidden limitations and leverage points are. A lot of academic literature tacitly works from a set of assumptions that don't map to any real-world environment.

Once you have that hands-on mental model, the flaws and limitations of much of what is in the academic literature become obvious from first principles. Most of the insights might be academically interesting in a theoretical sense, but they often aren't reducible to useful practice in real systems. Non-academic careers require implementations that actually work well.

[0] The computer science literature from the 1970s and earlier is much better in this regard than what came later. Many early papers were written by people who clearly had concrete experience in the trenches with the problems they were writing about. Those papers are both more readable and more applicable. This awareness of constraints is lacking from a lot of modern computer science papers.



I think it just goes back to what your goals are. I don't know for sure, but I imagine the research you're describing as disappointing was never meant to be directly applicable to a real-world problem. It's meant to explore and push the boundaries of our understanding in an idealized and theoretical sense. Over time the research that turned out to be important gets codified into textbooks, undergraduate courses, and software packages. If you're at the bleeding edge, yeah, it's gonna be tough to make sense of the landscape and apply it to your needs, but that's why people who can do it get the big bucks.

I mentioned nuclear physics because it's a wonderful union of theory and practice. The experimenters need theories to test, and the theorists need their ideas tested.

Contemporary AI is massively driven by research. There are a handful of influential papers from the past few years that have gone right into practice. Industry players have famously invested in their own academic divisions.


To give a concrete example, spatial indexing algorithms badly break cache replacement algorithms for intrinsic theoretical reasons. This has been known since at least the 1980s.

We have a literature full of spatial indexing algorithms that can't work in any real system because they implicitly assume cache replacement algorithms will work. This problem isn't even mentioned in modern academic papers. That is extremely low-value research. That's like doing physics research under the assumption that the fundamental laws of physics don't apply. It might be an intellectually interesting exercise but it isn't useful.

It isn't all like this. The spatial indexing literature is actively bad to an unusual extent. If you look at e.g. academic research on graph analytics algorithms, where I also worked, it is mostly just decades behind the non-academic state of the art. The literature won't mislead you but it also won't tell you where the frontier is.


I'm not really trying to white knight for academia. I know there is a lot of low-quality stuff (part of that is just reality: most of the work done in academia or industry won't stand the test of time). But I do kind of have to assume that if this field of work is continuing, there is a reason, and they haven't just been spinning their wheels for 40 years because they missed something simple and obvious.

This article[0] says "Data changes are usually much less frequent than queries, so incurring an initial cost of processing data into an index is a fair price to pay for instant searches afterwards."

Is that what you're referring to with cache replacement? There's no way to quickly update the index?
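
For anyone following along, the tradeoff in that quote (pay once to build the index, then answer queries cheaply) can be sketched roughly like this. This is a toy uniform-grid index, not the article's code and not an R-tree; `GridIndex`, `brute_force`, and `cell_size` are names I made up for illustration, and it says nothing about caching behavior.

```python
from collections import defaultdict


class GridIndex:
    """Toy uniform-grid spatial index (hypothetical): build once, query many times."""

    def __init__(self, points, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        # One-time build cost: bucket every point by the grid cell it falls in.
        for p in points:
            self.cells[self._cell(p)].append(p)

    def _cell(self, p):
        x, y = p
        return (int(x // self.cell_size), int(y // self.cell_size))

    def query(self, min_x, min_y, max_x, max_y):
        """Return all points inside the axis-aligned box (bounds inclusive)."""
        cx0, cy0 = self._cell((min_x, min_y))
        cx1, cy1 = self._cell((max_x, max_y))
        hits = []
        # Only visit cells that overlap the box, then filter exactly.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (x, y) in self.cells.get((cx, cy), ()):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.append((x, y))
        return hits


def brute_force(points, min_x, min_y, max_x, max_y):
    """Baseline with no index: scan every point on every query."""
    return [(x, y) for (x, y) in points
            if min_x <= x <= max_x and min_y <= y <= max_y]
```

Both functions return the same points; the index just avoids scanning everything per query. Inserting a new point into this toy grid is a cheap append, which is not necessarily true of the packed indexes the article discusses.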

Wikipedia says "The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts."

I'm gathering that there are applications which are OK with that limitation? Generally there is communication and awareness between industry and academia. If everyone really needs cache replacement for it to be useful, why have they not attempted to account for it? What is the cause of the total disconnect you describe?

[0] https://blog.mapbox.com/a-dive-into-spatial-search-algorithm...



