I take a lot of notes but don’t make use of them. I feel like I’m leaving insight on the table. It should be possible to remix thoughts and learnings to create something new, like taking frozen leftovers out of the fridge once in a while to make a stew. After all, creativity basically boils down to connecting distant ideas. This is not covering new ground: Zettelkasten, Obsidian, and Smart Notes have already tackled this “second brain” approach.
However, I do think the capacity for computers to not forget, have infinite attention, and pull information instantly is underleveraged. It’s possible that our ability to forget and ignore vast amounts of information is the exact strength that’s required for good creative thinking, but until that’s proven, crunching information automatically with a computer is a fun direction to explore. Some kind of middle ground seems optimal, since in some ways, we’re already androids. Our personal devices, connected to the internet, have become the newest and outermost members of our cortex.
Currently: a Python harness that can be used as an Obsidian plugin. https://github.com/jack-song/smart-link
Set up a quantitative testing harness to test link prediction in linked notes.
I’m currently using Obsidian to try out the “good old manual” approach to this problem.
Instead of interacting with live recommendations, a different workflow might make sense - a dedicated time to “review” linking suggestions. A basic script would suggest notes that could be linked, based on their content.
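As a rough illustration of what such a "basic script" could look like, here is a minimal sketch that scores every pair of notes by TF-IDF cosine similarity and returns the top pairs for a review session. It uses only the standard library; the function names (`tokenize`, `tfidf_vectors`, `suggest_links`) and the sample notes are made up for this example and are not taken from the smart-link repo.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(notes):
    """notes: dict of title -> body. Returns title -> {term: tf-idf weight}."""
    tokens = {title: tokenize(body) for title, body in notes.items()}
    n = len(notes)
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each term once per note
    vectors = {}
    for title, words in tokens.items():
        term_freq = Counter(words)
        vectors[title] = {
            # smoothed idf, so terms that appear in every note still count a little
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_links(notes, top_k=3):
    """Return the top_k (score, title_a, title_b) tuples, most similar first."""
    vectors = tfidf_vectors(notes)
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(notes), 2)
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Made-up notes, only to show the shape of the input and output.
notes = {
    "coffee": "espresso grind size and brew time",
    "brewing": "brew time and grind size for pour over coffee",
    "taxes": "file tax return before deadline",
}
suggestions = suggest_links(notes, top_k=1)
```

Here `suggestions` holds a single tuple pairing "brewing" with "coffee", since those are the only two notes that share any words. Comparing every pair is quadratic in the number of notes, which is fine for a batch review step but would need rethinking for live recommendations.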
Trying to learn some classic NLP to tackle this. Unfortunately, like with most data work, cleaning is half the job. My old notes suck. Trying to figure out a way around the garbage-in garbage-out effect. https://github.com/4OH4/doc-similarity (TF-IDF and GloVe) feels like it just won’t cut it in terms of suggesting meaningful links.
I built a simple local prototype that records notes and also acts as an autocomplete. Like naming a new file, it brings up previous notes with similar words. Problem: I’ve noticed that it’s difficult to write my note and evaluate the recommendations at the same time. It feels like a better return-on-effort is focusing on writing good notes (foreshadowing!). Another problem: the recommendations are bad and not helpful.
To make things more accessible, 2 years ago I started writing everything into a Google Sheets document. Easy to use, index, and search. Not bad, but became slow to edit and search. Main problem still unsolved - I rarely went back to review, connect, and mentally process ideas. Perhaps the solution is something interactive, that will draw some basic connections for you.
Working notes.
2022-04-04: Open sourced working version! https://github.com/jack-song/smart-link. Looks like setting up a Python server locally will probably be too much barrier-to-entry for most users, so deciding if I should pursue more technical work producing better predictions, or start doing the JS to make this an actual plugin.
2022-03-30: Use Obsidian Lab interface more creatively. Thinking of ways to open source the code so far.
2022-03-28: Hooked up to Obsidian Lab, filtering out existing links and common tag notes to produce more interesting pairs to review.
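The filtering step above can be sketched roughly as follows. This is an illustration, not the code from the repo: it pulls wikilinks and tags out of raw note text with two simple regular expressions (an assumption made for this sketch, standing in for a proper vault parser), and the names `links_of`, `tags_of`, and `interesting_pairs` are hypothetical.

```python
import re

# [[Note]], [[Note|alias]] and [[Note#heading]] all resolve to "Note".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# A tag is a '#' at the start of a word followed by a letter,
# so markdown headings like "# Title" are not picked up.
TAG = re.compile(r"(?<!\S)#([A-Za-z][\w/-]*)")


def links_of(text):
    """Titles of notes that this note links to."""
    return {match.strip() for match in WIKILINK.findall(text)}


def tags_of(text):
    """Tags used in this note, without the leading '#'."""
    return set(TAG.findall(text))


def interesting_pairs(notes, candidate_pairs):
    """Keep only pairs that are not already linked and share no tag."""
    links = {title: links_of(body) for title, body in notes.items()}
    tags = {title: tags_of(body) for title, body in notes.items()}
    kept = []
    for a, b in candidate_pairs:
        already_linked = b in links[a] or a in links[b]
        share_tag = bool(tags[a] & tags[b])
        if not already_linked and not share_tag:
            kept.append((a, b))
    return kept


# Made-up vault contents for illustration.
notes = {
    "A": "# Heading\nSee [[B|the other note]] #idea",
    "B": "Some text #project",
    "C": "Other text #idea",
    "D": "No links or tags here",
}
candidates = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
to_review = interesting_pairs(notes, candidates)
```

With these notes, A and B are dropped because A already links to B, A and C are dropped because both carry the `idea` tag, and the remaining pairs are kept for review.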
2022-02-25: Add and test Universal Sentence Encoder as another source of similarity scores. Surprisingly promising compared to GloVe and TF-IDF.
2022-02-24: Mark pairings that are already linked or have common tags.
2022-02-23: Faster notebook scaffolding that caches the language embedding model and uses TF-IDF to generate global similarity scores to compare with GloVe. More notes cleanup.
2022-02-21: Subbing in Obsidian Tools for GloVe processing rather than manual file reading.
2022-02-16: Found great existing discussions leading to Obsidian Tools and Obsidian Lab. Final test before moving to a learning algorithm: a simple sum rank score for 3 features using cosine similarity (TF-IDF, GloVe or Word2Vec, and Universal Sentence Encoder). These features have already been explored extensively for similarity. If that’s still a bit weak, I’ll dive into the scaffolding for building a learning setup.
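One plausible reading of a "sum rank score" is: rank the candidate pairs separately under each feature, then add up each pair's rank positions, so that pairs ranked highly by several features float to the top. The sketch below follows that interpretation, which is an assumption and may differ from what the notebook actually does. The function names and similarity values are invented for illustration, and it assumes every feature scores the same set of pairs.

```python
def rank_positions(scores):
    """Map each pair to its rank under one feature (0 = most similar)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered)}


def sum_rank(feature_scores):
    """feature_scores: feature name -> {pair: cosine similarity}.

    Returns pairs ordered by summed rank, best first. Ties keep the
    order in which pairs were first seen, since sorted() is stable.
    """
    totals = {}
    for scores in feature_scores.values():
        for pair, rank in rank_positions(scores).items():
            totals[pair] = totals.get(pair, 0) + rank
    return sorted(totals, key=totals.get)


# Made-up cosine similarities for three note pairs under three features.
feature_scores = {
    "tfidf": {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.5},
    "glove": {("a", "b"): 0.7, ("a", "c"): 0.8, ("b", "c"): 0.1},
    "use": {("a", "b"): 0.6, ("a", "c"): 0.5, ("b", "c"): 0.4},
}
ranking = sum_rank(feature_scores)
```

Summing ranks rather than raw similarities sidesteps the fact that the three features produce scores on different scales, at the cost of throwing away how large the gaps between pairs are.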
2022-02-15: One last try with GloVe after removing daily notes. Backup plan: start with a ready made link prediction solution. May next try to pull in more fitting datasets (WikiCS or WikiVitals+). Then scrape actual Obsidian Published notes. And finally try it on my own local notes (custom parser?).
2022-02-14: Took me over a week to realize that “link prediction” is the problem that I’m trying to solve - and that it’s well known in the space. Picking a good example / starting point to replicate.
2022-02-11: Manually testing some similarity and search mechanisms to try to intuit if the rankings would be “good enough” - or whether I would need to train something from scratch focused on linking.
2022-02-10: Trying some related Obsidian plugins, starting to build scraper for published blogs and linked notes as training. Trying out programmable Google search engine to see what good ranking might look like. Reading about semantic search.
2022-02-09: Comprehensive notes and thinking about the desired output of this project: ideas could be hypotheses, observations, or theories. Looking over thoughts from https://jzhao.xyz/, https://andymatuschak.org, and https://nicolevanderhoeven.com/. Next steps are to engage and ask questions directly to the community.
2022-02-07: Using Obsidian to manually try random notes (including the extension upgrade), using Mem for work, more research into “serendipity”, and seeing published notes to try to understand desired output better.
2022-02-06: Moving “log” style notes into sheets, where they belong. Especially for things like coffee notes. Understanding how to use tags better.
2022-02-03: Committing to not digitizing my daily scratch notes. The chance of that raw information being useful is outweighed by the cost of management and noise. Will use a dedicated process of translating and re-focusing daily notes into atomic and meaningful zettels.
2022-02-02: Cleaning up old notes! Especially trying to make progress breaking apart and rewriting my past spreadsheets (currently stored as a single mega note). Thinking about how daily notes / a journal really fit into the zettelkasten and smart notes philosophy.
2022-02-01: Learning how to take more effective notes - titles, structure, and linking.
2022-01-31: Reading about the OG https://zettelkasten.de/posts/overview/, skimming https://forum.zettelkasten.de/, and tweaking some of my Obsidian notes.
2022-01-30: Results still not great after some basic tweaks. Trying to break down and understand components of the problem better. Will do individual deep dives on: the data, data cleaning, the embeddings, similarity scores, and the power / purpose of linking. Will probably want to work backwards from the desired output. Use better filters on text, stopwords, document size. Sneaking suspicion that similarity scores may not be a good proxy for “linkable ideas.”
2022-01-27: Information density feels a bit too low to give meaningful results. False positives are already going to be very high for this kind of task, so we want to reduce them as much as possible to minimize “wasted time” for a user. Will try finding similarities between entire notes as documents. “Daily notes” are challenging because they might relate to different subjects. Also need to filter out templated text.
Maybe rather than ranking “dumb” embeddings, this can be turned into a real learning model, using public Obsidian Publish sites, hyperlinked blogs, and existing links in the user’s own vault as training data.
2022-01-26: Fixed filtering and indexing bugs in test with GloVe embeddings. Going over results.