Jekyll2024-02-01T08:03:32+00:00https://josherich.me/feed.xmlJosherich’s Bloghome sweet homeFirst Christianity in Korea2024-01-17T00:00:01+00:002024-01-17T00:00:01+00:00https://josherich.me/short/korean-christian<p>from Unfabling the East Chapter 4</p>
<p><img src="/images/unfabling-east.jpg" alt="korea-christianity" /></p>
<p>from Wikipedia</p>
<blockquote>
<p>In 1603, Yi Su-gwang, a Korean politician, returned from Beijing carrying several theological books written by Matteo Ricci, an Italian Jesuit missionary to China.[4] He began disseminating the information in the books, introducing Christianity to Korea.</p>
</blockquote>
<p>from ChatGPT</p>
<blockquote>
<p>When and how was Christianity first introduced to Korea?
Christianity was first introduced to Korea in the late 18th century. The initial contact with Christianity came through Western missionaries. The first known encounter was in 1784 when Yi Seung-hun, a Korean diplomat, returned from Beijing with religious books, including Catholic texts.</p>
</blockquote>
<p>from Perplexity</p>
<blockquote>
<p>When and how was Christianity first introduced to Korea?
Christianity was first introduced to Korea in the 17th and 18th centuries. The initial variety of Christianity in the peninsula, Nestorianism, spread to Korea in the Middle Ages by way of China via Middle Eastern adherents. In the 1700s, a group of young local scholars who had studied Jesuit missionary Matteo Ricci’s work introduced Catholicism to Korea. The first Western missionaries to enter Korea officially were Horace Grant Underwood and Henry Gerhard Appenzeller, both from the United States. The spread of Christianity in Korea was also influenced by the decline of Confucianism and the impact of Christian ideas on the Korean elite5.</p>
</blockquote>
<p>from you.com</p>
<blockquote>
<p>When and how was Christianity first introduced to Korea?
The first introduction of Christianity to Korea can be traced back to the early 17th century. In 1603, Yi Gwang-jeong, a Korean diplomat, returned from Beijing carrying several theological books written by Matteo Ricci, an Italian Jesuit missionary to China. Yi Gwang-jeong began disseminating the information in these books, thus introducing Christianity to Korea</p>
</blockquote>
<p>from Claude</p>
<blockquote>
<p>Christianity was first introduced to Korea in the late 18th century. Some key events and figures:</p>
</blockquote>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In 1784, a Korean diplomat named Suh Jun-gan came across Catholic writings in Beijing and took them back to Korea. He began propagating Catholic teachings, making him the first Korean Catholic.
In 1794, the first Catholic priest arrived - a Chinese priest named Zhou Wen-mo who ministered to Catholics in Korea in secret. Catholicism was seen as dangerous and was persecuted by Korean authorities.
In 1836, the first Western Catholic priest, Fr. Maubant from France, arrived in Korea to further facilitate the spread of Catholicism.
By the 1860s, there were about 23,000 Korean Catholics. However, persecution intensified, culminating in the deadly 1866 Byeong-in Persecution, where thousands of Korean Catholics were killed.
In 1884, the first Protestant missionary, Horace Underwood from the U.S., arrived in Korea under the banner of the Presbyterian Church. Other denominations soon followed.
</code></pre></div></div>
<blockquote>
<p>So in summary, Catholicism arrived first in the late 1800s, but it initially spread slowly and clandestinely due to persecution. Protestantism arrived slightly later in the 1880s, but began more public evangelization efforts. Together, they laid the groundwork for Christianity to gradually take root and grow in Korea over the next century.</p>
</blockquote>Real World LLM Obfuscation2023-10-01T00:00:01+00:002023-10-01T00:00:01+00:00https://josherich.me/llm/real-world-llm-obfuscate<p>I’ve heard about LLMs’ deobfuscation ability for a while. Like many previous claims of this kind, real-world performance can be far from what’s extrapolated from the demo (usually a social media post). I’m curious, and I’d love to see it flop.</p>
<h3 id="the-experiment">The Experiment</h3>
<p>Decode minified JavaScript code of a Chrome extension <a href="https://chrome.google.com/webstore/detail/chatgpt-summary-for-chrom/mikcekmbahpbehdpakenaknkkedeonhf">ChatGPT Summary for Chrome</a></p>
<p>The CRX source can be unpacked with <a href="https://crxviewer.com/">https://crxviewer.com/</a>.</p>
<h3 id="challenges">Challenges</h3>
<p>People with a good mental model of LLMs will find these quite obvious. The one thing that works consistently in these experiments is localized code snippets with hints. The biggest trouble, however, is that the model refuses to give the code verbatim; it likes to skip details and emit only comments, or function names with empty bodies. I suspect the bias comes from RLHF training.</p>
<ul>
<li>Obviously one-shot doesn’t work due to the context limit.</li>
  <li>A single response (including the first response and continuing responses) refuses to give all the details, which means simply saying “please continue” won’t work.</li>
  <li>Given the above, there is no high-level algorithm to guide the decoding.</li>
  <li>Most importantly, the naive prompt-response approach does not leverage signals from program (static) analysis.</li>
</ul>
<p>Planning sounds fancy, but I suspect it is simpler: perform a topological sort on the call graph to get a list of functions in the order they should be deobfuscated. An obvious heuristic is to start with globals and propagate decoded names each time a localized snippet is decoded.</p>
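<p>That ordering can be sketched in a few lines. Below is a minimal version (the call graph shape and function names are made up for illustration):</p>

```javascript
// Kahn's algorithm over a call graph: decode callees before callers, so
// recovered names can be propagated upward as each snippet is decoded.
function deobfuscationOrder(callGraph) {
  const pending = {}; // per function: number of callees not yet decoded
  const callers = {}; // reverse edges: callee -> list of its callers
  for (const [fn, callees] of Object.entries(callGraph)) {
    pending[fn] = callees.length;
    for (const callee of callees) {
      (callers[callee] = callers[callee] || []).push(fn);
    }
  }
  // start from leaves: functions that call nothing (globals, utilities)
  const queue = Object.keys(callGraph).filter((fn) => pending[fn] === 0);
  const order = [];
  while (queue.length > 0) {
    const fn = queue.shift();
    order.push(fn);
    for (const caller of callers[fn] || []) {
      if (--pending[caller] === 0) queue.push(caller);
    }
  }
  return order; // functions in a cycle (mutual recursion) are left out
}
```

<p>Functions involved in mutual recursion never reach zero pending callees and are simply left out of the order; a real tool would have to break such cycles explicitly.</p>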
<p>Not surprised people have tried <a href="https://arxiv.org/abs/2306.02546">this</a>.</p>
<h3 id="what-efficient-human-in-the-loop-looks-like">What Efficient Human in the Loop Looks Like</h3>
<p>Among the prompts in a single experiment, high-level planning is probably the most important contribution from me as the human in the loop. I have to decide which parts contain all the information the model needs to produce a complete response.</p>
<h3 id="what-about-reverse-engineering">What about Reverse Engineering?</h3>
<p>Among the four steps of reverse engineering, people find that the last one requires the most human-like “thinking”, and naturally try to automate it:</p>
<ul>
<li>data carving and the parsing of filesystems and files</li>
<li>disassembly</li>
<li>decompilation</li>
<li>annotation and commentary</li>
</ul>
<p>The <a href="https://medium.com/tenable-techblog/g-3po-a-protocol-droid-for-ghidra-4b46fa72f1ff">G-3PO</a> project includes a script that essentially uses the following prompt:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Below is some C code that Ghidra decompiled from a binary that I'm trying to
reverse engineer.
```
{c_code}
```
Please provide a detailed explanation of what this code does, in {style},
that might be useful to a reverse engineer. Explain your reasoning as much
as possible. {extra}
Finally, suggest suitable names for this function and its parameters.
</code></pre></div></div>
<h3 id="what-does-it-mean-to-open-source-code-in-the-long-run">What does it mean to “open source code” in the long run?</h3>
<p>Note I use the term “open source code”, which is completely different from “open source”.</p>
<p>While I’m optimistic about the knowledge sharing aspect, how this particular ability is adopted is probably dictated more by organizational limit. In recent years, source code breach has become so frequent and yet created little existential threat to those companies. This probably reveals how little code without human resource matters. It’s not even controversial nowadays to say most code in big companies are glued together and need an army to make it sort of work.</p>
<h3 id="appendix">Appendix</h3>
<p>summarize prompt</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Q:
Please provide a detailed explanation of what this obfucated code does, the code is from a Chrome Extension
A:
This obfuscated code appears to be implementing an AI chatbot functionality in a Chrome extension. Here is what I can tell about what it is doing:
- It is using a library called uuid to generate unique IDs for messages.
- It has functions for connecting to the OpenAI API to send messages and receive responses. It supports both using an API key and session tokens for authentication.
- It has code for storing user settings like API key and login type in chrome.storage.
- It opens a connection to the background page when the extension icon is clicked. It will send the contents of the current page to the background to summarize.
- The background page will call OpenAI to summarize the page content. It limits the text to 14000 characters before summarizing.
- The summarized text is sent back to the content script and displayed.
- It also has code to handle OpenAI errors and message feedback.
- There is functionality to open the options page on install and when clicked.
So in summary, this appears to allow summarizing web pages using OpenAI through a chat interface in the Chrome extension. It handles the authentication, API calls, background processing and communication between the different extension components needed to implement this functionality.
</code></pre></div></div>
<p>deobfuscate prompt</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deobfuscate the following code for a Chrome Extension
{code}
give the obfuscated source of the function {xxx}
please continue
please give a list of original function names and obfuscated names
</code></pre></div></div>
<table>
<thead>
<tr>
<th>obfuscated</th>
<th>deobfuscated</th>
</tr>
</thead>
<tbody>
<tr>
<td>780</td>
<td>SessionCache</td>
</tr>
<tr>
<td>918</td>
<td>cleanupMap</td>
</tr>
<tr>
<td>931</td>
<td>createPromise</td>
</tr>
<tr>
<td>l</td>
<td>uuidToStr</td>
</tr>
<tr>
<td>p</td>
<td>createMessagingPipeline</td>
</tr>
<tr>
<td>v</td>
<td>PROMPT_STORAGE_KEY</td>
</tr>
<tr>
<td>g</td>
<td>getSummaryPrompt</td>
</tr>
<tr>
<td>ze</td>
<td>summarizePage</td>
</tr>
<tr>
<td>Je</td>
<td>getAccessToken</td>
</tr>
<tr>
<td>Ke</td>
<td>chatGptRequest</td>
</tr>
</tbody>
</table>
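<p>Once such a mapping exists, propagating decoded names into the remaining snippets can be mechanized. A naive sketch (the helper below is hypothetical; a real tool should rename through an AST, since whole-word string replacement can still clobber string literals and property names):</p>

```javascript
// Naive name propagation: replace whole-word occurrences of obfuscated
// identifiers with their decoded names. AST-based renaming is safer.
function applyRenames(code, renames) {
  let out = code;
  for (const [obfuscated, readable] of Object.entries(renames)) {
    out = out.replace(new RegExp("\\b" + obfuscated + "\\b", "g"), readable);
  }
  return out;
}
```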
<h3 id="workerjs">worker.js</h3>
<div id="container-worker" style="width: 960px; height: 600px; border: 1px solid grey"></div>
<h3 id="scriptjs">script.js</h3>
<div id="container-script" style="width: 960px; height: 600px; border: 1px solid grey"></div>
<h3 id="settingjs">setting.js</h3>
<div id="container-setting" style="width: 960px; height: 600px; border: 1px solid grey"></div>
<script src="../../js/vs/loader.js"></script>
<script>
require.config({ paths: { vs: '../../js/vs' } });
  require(['vs/editor/editor.main'], function () {
    // the three diff editors differ only by file name, so build them in a loop
    ['worker', 'script', 'setting'].forEach(function (name) {
      var diffEditor = monaco.editor.createDiffEditor(document.getElementById('container-' + name));
      Promise.all([
        xhr('../../js/real-world-llm-obfuscate/' + name + '.js'),
        xhr('../../js/real-world-llm-obfuscate/' + name + '-deobf.js')
      ]).then(function (r) {
        diffEditor.setModel({
          original: monaco.editor.createModel(r[0].responseText, 'javascript'),
          modified: monaco.editor.createModel(r[1].responseText, 'javascript')
        });
      });
    });
  });
</script>
<script>
function xhr(url) {
    // The native Promise constructor takes a single executor; the original
    // code passed a second "cancel" callback, which is silently ignored,
    // so it has been dropped here.
    return new Promise(function (resolve, reject) {
      var req = new XMLHttpRequest();
      req.onreadystatechange = function () {
        if (req.readyState === 4) {
          if ((req.status >= 200 && req.status < 300) || req.status === 1223) {
            resolve(req);
          } else {
            reject(req);
          }
          req.onreadystatechange = function () {};
        }
      };
      req.open('GET', url, true);
      req.responseType = '';
      req.send(null);
    });
  }
</script>Running Prometheus promtool on Web with WASM2023-09-24T00:00:01+00:002023-09-24T00:00:01+00:00https://josherich.me/short/prometheus-promtool-wasm<p><a href="https://github.com/josherich/prometheus-wasm">prometheus-wasm</a></p>
<p>WASM is great but the tooling looks terrifying for no reason. I went into this quick hack worrying about being overwhelmed by the tooling. It turned out to be surprisingly easy.</p>
<p>The main hurdle is, of course, knowing the build process itself. I always appreciate a <code class="language-plaintext highlighter-rouge">--verbose</code> flag reporting what is going on underneath, which is important when a separate binary, <a href="https://github.com/prometheus/promu"><code class="language-plaintext highlighter-rouge">promu</code></a>, is used to build the Go project. There is even a public <a href="https://docs.google.com/document/d/1Ql-f_aThl-2eB5v3QdKV_zgBdetLLbdxxChpy-TnWSE/edit#heading=h.24x0bg1hyuak">RFC</a> doc for it.</p>
<p>The next part is removing unsupported features and the associated packages: fs.watcher in fsnotify, mmap, syscall.SIGUSR1 in go-metrics, and sockets in go-connections.</p>
<p>The final part is the wasm file and the glue code on both sides. None of this looks right:</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">go</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Go</span><span class="p">();</span>
<span class="nx">WebAssembly</span><span class="p">.</span><span class="nx">instantiateStreaming</span><span class="p">(</span><span class="nx">fetch</span><span class="p">(</span><span class="dl">"</span><span class="s2">promtool.wasm</span><span class="dl">"</span><span class="p">),</span> <span class="nx">go</span><span class="p">.</span><span class="nx">importObject</span><span class="p">).</span><span class="nx">then</span><span class="p">((</span><span class="nx">result</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">go</span><span class="p">.</span><span class="nx">run</span><span class="p">(</span><span class="nx">result</span><span class="p">.</span><span class="nx">instance</span><span class="p">);</span>
<span class="p">});</span>
<span class="c1">// it should really just be</span>
<span class="kd">const</span> <span class="nx">go</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Go</span><span class="p">(</span><span class="dl">"</span><span class="s2">promtool.wasm</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">go</span><span class="p">.</span><span class="nx">run</span><span class="p">();</span>
</code></pre></div></div>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">js</span><span class="o">.</span><span class="n">Global</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="s">"checkRules"</span><span class="p">,</span> <span class="n">CheckRulesWebWrapper</span><span class="p">())</span>
<span class="o"><-</span><span class="nb">make</span><span class="p">(</span><span class="k">chan</span> <span class="kt">bool</span><span class="p">)</span>
<span class="p">}</span>
<span class="c">// makes no sense for app code to do this</span>
</code></pre></div></div>
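<p>Purely as a sketch of what the nicer API could look like: a tiny wrapper over the real glue (the <code class="language-plaintext highlighter-rouge">GoApp</code> name is hypothetical and not part of wasm_exec.js, which must already have defined the global <code class="language-plaintext highlighter-rouge">Go</code> class):</p>

```javascript
// Hypothetical wrapper giving the one-liner the post wishes for:
//   new GoApp("promtool.wasm").run()
// Assumes wasm_exec.js has already defined the global `Go` class.
class GoApp {
  constructor(wasmUrl) {
    this.wasmUrl = wasmUrl;
    this.go = new Go(); // provided by wasm_exec.js
  }
  async run() {
    const result = await WebAssembly.instantiateStreaming(
      fetch(this.wasmUrl),
      this.go.importObject
    );
    return this.go.run(result.instance);
  }
}
```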
<section>
<textarea name="" id="rules" cols="30" rows="10"></textarea>
<button id="check_rules" onclick="checkRules()">promtool check rules</button>
</section>
<section>
<textarea name="" id="config" cols="30" rows="10"></textarea>
<button id="check_config" onclick="checkConfig()">promtool check config</button>
</section>
<script src="../../js/wasm_exec.js"></script>
<script>
const go = new Go();
WebAssembly.instantiateStreaming(fetch("../../js/promtool.wasm"), go.importObject).then((result) => {
go.run(result.instance);
const ruleExample = `groups:
- name: example
rules:
- record: code:prometheus_http_requests_total:sum
expr: sum by (code) (prometheus_http_requests_total)`;
const res = checkRules(ruleExample);
console.log(`Result of "promtool check rules"`, res);
document.getElementById("rules").value = ruleExample;
document.getElementById("check_rules").onclick = () => {
const rules = document.getElementById("rules").value;
const res = checkRules(rules);
window.alert(res);
};
const configExample = `# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label job=<job_name> to any timeseries scraped from this config.
- job_name: "prometheus"
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ["localhost:9090"]`;
document.getElementById("config").value = configExample;
document.getElementById("check_config").onclick = () => {
const config = document.getElementById("config").value;
const res = checkConfig(config, false, false, 'all');
window.alert(res);
};
});
</script>Software Doesn’t Work For the Happiest Path2023-09-18T00:00:01+00:002023-09-18T00:00:01+00:00https://josherich.me/short/kernel-panic-4k<p>Anyone with the slightest amount of software experience knows that it’s always better NOT to upgrade unless you absolutely have to. I paid the price for not strictly following this advice and naively believing it would be okay since I was supposedly on the happiest path. Here’s what happened:</p>
<p>Ten minutes after agreeing to upgrade Ubuntu 22.04 to 23.04, the monitor went completely black. I had to manually reboot the system, and it kept doing nothing. Not much can be inferred from a completely black screen and an unresponsive keyboard and mouse. My hunch was that the upgrade had been interrupted by the manual reboot, and that the black screen could be related to a 4k monitor driving issue; both turned out to be right. After connecting a 1080p monitor, the following text showed once booting the new 6.x kernel failed:</p>
<blockquote>
<p>Kernel panic - not syncing: VFS: Unable to mount root fs on unknown block(0,0)</p>
</blockquote>
<p>The rest was straightforward: after successfully booting from the old 5.x kernel, running <code class="language-plaintext highlighter-rouge">dpkg --configure -a</code> redid and finished the upgrade.</p>
<p>The same lesson just repeats over and over: software is nowadays written in a way that no robustness is guaranteed, even for the happiest path. I got the most popular motherboard, the most popular graphics card, the most popular Linux distro, the most popular monitor category. When combined, the reliability is embarrassing.</p>Puppeteer PDF Generation Performance and Size Over Time2023-08-02T00:00:01+00:002023-08-02T00:00:01+00:00https://josherich.me/short/puppeteer-perf-over-time<p><img src="/images/puppeteer-perf.png" alt="puppeteer perf history" /></p>
<p>The test environment is WSL 1.2.5 on a Windows laptop (i7-11850H, 32GB RAM).
I’m going to post the results from an M1 laptop soon, and they don’t look good.</p>Zelda ToTK Kudanisar Shrine Orb Recall2023-07-13T00:00:01+00:002023-07-13T00:00:01+00:00https://josherich.me/short/zelda-recall<p>I find it very confusing that the orb in Kudanisar shrine ends up being recalled, but there is no cue telling players that the orb’s recall duration is not 20s but infinite.</p>
<p>Another interesting design choice about recall is its near-infinite range. There’s also almost no limitation on the recall target.</p>
<p>The 20s limit seems to be the main factor preventing recall from breaking the game mechanics.</p>
<p><img src="/images/zelda-kudanisar-shrine-orb.jpg" alt="zelda" /></p>My Two Cents on the new Bard2023-07-13T00:00:01+00:002023-07-13T00:00:01+00:00https://josherich.me/short/two-cents-bard<ol>
<li>diffing text</li>
</ol>
<p>None of the four is in the correct answer!
<img src="/images/bard-2023-07-13-3.png" alt="diffing text" /></p>
<ol>
<li>(very) basic text processing</li>
</ol>
<p>All previous versions refuse to do the task, maybe due to some legal terms. All non-OpenAI models miss the “new list” part.
<img src="/images/bard-2023-07-13-1.jpg" alt="text processing" /></p>
<p>Not from Bard, but an interesting <a href="https://www.youtube.com/watch?v=-cAB5FG4bXI&t=993s">example</a> about lead acetate solution being cloudy
<img src="/images/bard-2023-07-13-2.png" alt="lead acetate cloudy" /></p>2 Millions WeakMap Keys2022-06-18T00:00:01+00:002022-06-18T00:00:01+00:00https://josherich.me/pl/million-times<p>I was told multiple times that infrastructure should be “upgraded frequently so that upgrading gets easier and easier”, and that’s what my colleagues did. They decided to bump the Meteor version after staying on a very old version for a long time. It did not go well.</p>
<p>The build got super slow. People seemed not to notice the slower dev build because it has always been slow: without cache it can take 2 hours and almost always exceeds the Node memory limit. The not-noticing part is funny, but the root cause is funnier, and my path to finding it was longer than it should have been. What’s most interesting is that nothing stops these tools and methods from being automated, yet neither Babel nor Meteor cares enough to do that.</p>
<ul>
<li>The tl;dr version: a WeakMap with more than 2 million keys is slow</li>
  <li>The tl;dr lesson: Babel and Meteor do not run performance regression tests</li>
  <li>Another tl;dr lesson, illustrated in the following revised <a href="https://xkcd.com/2347/">xkcd</a>: the dependency flaw is recursive</li>
</ul>
<p><img src="/images/millions-of-times/recursive-deps.png" alt="resursive-deps" /></p>
<h4 id="slow-meteor-building">Slow Meteor building</h4>
<p>Any developer who has upgraded libraries or frameworks probably won’t be surprised to hit a major performance downgrade. That’s why “not upgrading unless one has to” was believed, for good reasons, to be a best practice before the above claim about upgrading frequently. So that’s what you get: <a href="https://github.com/meteor/meteor/issues/11905">build times increased significantly since upgrade to 2.5.6</a>, where ‘significantly’ really means orders of magnitude.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Babel.compile..........................................603,897 ms (25639)
linker File#computeAssignedVariables...................488,136 ms (27564)
other ImportScanner#findImportedModuleIdentifiers.......43,497 ms (26261)
</code></pre></div></div>
<p>It’s not hard to infer what that means for the dev pipelines. The build is such a universal bottleneck that these pipelines become meaningless, because any node in the sequential jobs can time out and break the flow.</p>
<h4 id="babel-parser">Babel parser</h4>
<p>Surprisingly, it’s not hard to figure out the slow steps, thanks to Meteor’s profiling. I was first misled by the above thread, which suggested the cache for parser results might need a resize because, statistically, the order in which package files get processed (linked, in Meteor’s context) might change. As always, the truth is simpler than it appears. Two particular versions of the parser have a major performance hit. What’s more interesting is that the hit only happens when a WeakMap has more than about two million keys, which is very common for that use case in the parser.</p>
<p>What happened next was tedious; I ended up just replacing the parser code <code class="language-plaintext highlighter-rouge">tools/node_modules/@babel/parser/lib/index.js</code> with that of an older version, and the fast build was back.</p>
<h4 id="dive-deeper">Dive Deeper</h4>
<p>The performance <a href="https://github.com/babel/parser_performance">test</a> written by the Babel team turns out to be quite revealing. It’s also ironic that it is one tiny step away from being automated.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────────────────────┬────────────────────────────────┬──────────────────────────────┐
│ fixture │ babel_parser_7.16.12 │ babel_parser_7.17.12 │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/angular.js │ 10.29 ops/sec ±22.89% (97ms) │ 70.5 ops/sec ±10.83% (14ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/ember.debug.js │ 1.28 ops/sec ±134.41% (779ms) │ 25.55 ops/sec ±6.51% (39ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/babylon-dist.js │ 2.35 ops/sec ±129.79% (426ms) │ 122 ops/sec ±3.96% (8.193ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/jquery.js │ 2.98 ops/sec ±102.73% (335ms) │ 146 ops/sec ±8.2% (6.864ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/backbone.js │ 16.61 ops/sec ±96.49% (60ms) │ 703 ops/sec ±1.46% (1.422ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es5/react-with-addons.js │ 1.75 ops/sec ±138.36% (572ms) │ 73.69 ops/sec ±2.3% (14ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es6/angular-compiler.js │ 0.52 ops/sec ±158.51% (1921ms) │ 35.22 ops/sec ±12.04% (28ms) │
├──────────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ es6/material-ui-core.js │ 1.05 ops/sec ±173.55% (956ms) │ 43.82 ops/sec ±7.19% (23ms) │
└──────────────────────────┴────────────────────────────────┴──────────────────────────────┘
</code></pre></div></div>
<p>Another general way to dig out the root cause of a performance hit is a flame chart:</p>
<p><img src="/images/millions-of-times/7.16.8.png" alt="7.16.8" />
<img src="/images/millions-of-times/7.16.10.png" alt="7.16.10" /></p>
<p>It’s immediately obvious that <code class="language-plaintext highlighter-rouge">curPosition</code> is the single source of slowness. Its only job is to create a Position object that tells where an error happens. In a suggestion <a href="https://github.com/babel/babel/pull/14130#discussion_r785454366">comment</a>, a WeakMap is used to cache and expose the AST node offset instead of a plain object property. To be fair, it’s too much to ask to foresee the consequence, since the performance hit of WeakMap only shows up when the number of keys exceeds 2 million.</p>
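<p>To make the trade-off concrete, here is a heavily simplified sketch of the two shapes (identifiers are hypothetical, not Babel’s actual code): caching one position object per node in a WeakMap versus reading an index stored on the node itself. With millions of AST nodes, the first variant is the one that creates millions of WeakMap keys:</p>

```javascript
// Hypothetical, heavily simplified sketch -- not Babel's actual code.
// Variant 1: cache one position object per AST node in a WeakMap.
const positionCache = new WeakMap();
function getPositionViaWeakMap(node) {
  let pos = positionCache.get(node);
  if (pos === undefined) {
    pos = { index: node.start };
    positionCache.set(node, pos); // one WeakMap entry per node
  }
  return pos;
}

// Variant 2 (the direction of the eventual fix): keep the index on the
// node itself, so no WeakMap grows along with the AST.
function getPositionViaProperty(node) {
  return { index: node.index };
}
```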
<p>Eventually the <a href="https://github.com/babel/babel/pull/14174/commits/40475e0dea53ce7ef064df30cea7b559a3349f30#diff-6a9848ed0c6fa07e549e2c093dc65a0390484d710088036c465b925fa0e7f4a4">fix</a> was made and was concluded with this claim:</p>
<blockquote>
<p>There may exist some performance argument for switching over to .index as well, but from what I’ve seen so far it doesn’t seem to be too substantial.</p>
</blockquote>
<p>The thread also tells us Prettier and one of the Netlify projects got this performance hit.</p>
<h4 id="2-millions-keys">2 Million Keys</h4>
<p>Of course the first thing I did was ask the internet, specifically the V8 dev discussions. The most relevant <a href="https://bugs.chromium.org/p/v8/issues/detail?id=4086">thread</a> was again concluded with this statement:</p>
<blockquote>
<p>The initially observed behavior seems to be fixed</p>
</blockquote>
<p>No, it’s not.</p>
<p>Running the exact same test suggested by the OP gives you this graph</p>
<p><img src="/images/millions-of-times/weakmap-perf.png" alt="weakmap-perf" /></p>
<p>The <code class="language-plaintext highlighter-rouge">set</code> op gets 10 times slower at almost exactly 2 million keys and then keeps getting linearly slower, with spikes at 2.7 million and 5.5 million. Most surprisingly, the test stalls at about 12 million keys and never finishes, no matter how much more memory is assigned.</p>
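<p>For reference, the general shape of such a micro-benchmark (a sketch, not the OP’s exact script; reproducing the graph requires pushing <code class="language-plaintext highlighter-rouge">totalKeys</code> well past the 2 million mark):</p>

```javascript
// Time WeakMap#set in batches as the key count grows; plotting `timings`
// is what produces a graph like the one above. Keys are retained in an
// array so entries are never garbage-collected mid-run.
function benchWeakMapSet(totalKeys, batchSize) {
  const map = new WeakMap();
  const keys = [];
  const timings = [];
  for (let n = 0; n < totalKeys; n += batchSize) {
    const start = Date.now();
    for (let i = 0; i < batchSize; i++) {
      const key = {};
      keys.push(key);
      map.set(key, i);
    }
    timings.push({ keys: n + batchSize, ms: Date.now() - start });
  }
  return timings;
}
```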
<h4 id="next">Next</h4>
<p>I would love to spend more time digging into V8 to figure out the spikes and the halting.</p>
<p>On the other hand, the experience summarizes very well what has gone wrong with software dev. Assuming the spikes come from a compromise made in V8, it flows through the cracks of numerous tests and processes, all the way to ruining the physical world.</p>On Year Review and Blogging2022-01-02T00:00:01+00:002022-01-02T00:00:01+00:00https://josherich.me/reading/on-year-review<p>I’ve been enjoying reading year reviews from my blog RSS feed and douban feed. Since 2021 was quite a boring year for me, I’d rather write a review about reviews.</p>
<p>The most common thing people talk about in year reviews is plans and execution. It is common for SaaS companies to talk about business execution in public posts, but it’s even more fun to read about how personal projects or businesses play out. My favorite on this topic is people explaining the thought process behind their signature work <a href="https://jvns.ca/blog/2021/12/31/2021--year-in-review/">https://jvns.ca/blog/2021/12/31/2021--year-in-review/</a>:</p>
<blockquote>
<p>The explanation looks something like “the reason many people struggle with TOPIC is because they don’t understand X, here’s what you need to know”</p>
</blockquote>
<p>These secret recipes are usually quite easy to spot: they look over-simplified and biased because it’s what we use to remind ourselves of the most important part of our thinking process. It’s a recent popular statement that you can’t really rely on being creative to repeatedly publish good content, rather people figure out practices and frameworks that serve as the limit and container for content at the same time. Despite the fact that these practices are tailored to its purpose, one should still find them easy to transform. The Bartok and Coltrane example <a href="https://youtu.be/QCwqnjxqfmY?t=934">https://youtu.be/QCwqnjxqfmY?t=934</a> says it very well: a problem/constraint is proposed for the author to solve, a design for performers/authors.</p>
<p>Another category of year review is a summary of one’s work <a href="https://www.scattered-thoughts.net/writing/2021/">https://www.scattered-thoughts.net/writing/2021/</a>. Although the listing can be too dry to read in detail, it can be a perfect index page for revisiting the blog. It’s not a convention for bloggers to write reviews of their past posts and work, but I find these posts far more effective than About pages. It’s sad that the blog, which was invented quite recently and has had plenty of opportunities to evolve with the internet, still lacks the basic conveniences of the ebook or even the physical book: bookmarks, good search, indexes and glossaries. If there are things we need to pick up to make blogs great again, making the content more accessible is one of them.</p>
<p>There are also more personal posts <a href="https://yufree.cn/cn/2021/12/31/34/">https://yufree.cn/cn/2021/12/31/34/</a>, <a href="http://bowarrowstreet.blogspot.com/2021/12/blog-post.html">http://bowarrowstreet.blogspot.com/2021/12/blog-post.html</a> where people talk about moving, traveling, cooking and career choices. These are the most unique topics, ones that can’t really be found anywhere else. I always appreciate that the authors are willing to share their personal lives in a way that I usually only get from close friends when we visit each other in person, which has become even rarer since the pandemic. If blogs were invented today by big tech, these would definitely be exclusive paid content. There’s a small chance that Substack keeps growing and everyone gets on board; if so, it’ll be sad when people start to put these personal experiences behind a $1 pay wall.</p>
<h3 id="booksreading-review">Books/Reading review</h3>
<p>Sadly but naturally, book reviews are still the best source for randomly picking a book to read, unless one believes ML has achieved superhuman intelligence and relevancy matters more than quality. On the other hand, I’m surprised how badly douban’s effort on its book community ended up. They basically abandoned the recommendation algorithm (<a href="https://book.douban.com/recommended?icn=index-nav">https://book.douban.com/recommended?icn=index-nav</a>) and a few social media features (<a href="https://book.douban.com/updates?icn=index-nav">https://book.douban.com/updates?icn=index-nav</a>) in the book section. The rating system was also rigged by paid reviews once publishers figured out how huge an influence reviews have on sales numbers.</p>
<p>Tyler’s book reviews <a href="https://marginalrevolution.com/marginalrevolution/2021/12/what-ive-been-reading-210.html?utm_source=rss&utm_medium=rss&utm_campaign=what-ive-been-reading-210">https://marginalrevolution.com/marginalrevolution/2021/12/what-ive-been-reading-210.html</a> are quite good; for Chinese books, I got a few good ones from douban <a href="https://m.douban.com/note/822990754/">https://m.douban.com/note/822990754/</a>, <a href="https://www.douban.com/note/823105912/?dt_dapp=1&utm_source=pocket_mylist">https://www.douban.com/note/823105912/</a>, <a href="https://www.douban.com/note/822755021/?_i=1177663QBK6wW1,1177933QBK6wW1">https://www.douban.com/note/822755021/</a></p>
<p>Douban’s user base and activity started to shrink probably five years ago. It’s a shame that fewer and fewer people write year reviews there, because its book and movie integration and unique culture are perfect for encouraging people to write year reviews about books and movies.</p>
<h3 id="video">Video</h3>
<p>In 2021 I started to watch a lot more YouTube videos on various topics that I hadn’t consumed in video form before: cars (Alex on Autos, Engineering Explained), electronics (Applied Science, GreatScott!), OS dev (Andreas Kling), farm business (Gold Shaw Farm) and even chip making (Sam Zeloof). It’s amazing that nowadays you can search quite technical terms on YouTube and get either personal projects with fewer than 1k views or popular short explainers.</p>
<p>Although year reviews are not really a thing for YouTubers, a few still showed up in my subscription feed, and they are all quite interesting <a href="https://www.youtube.com/watch?v=Okqrbwf4tjA">https://www.youtube.com/watch?v=Okqrbwf4tjA</a>, <a href="https://www.youtube.com/watch?v=p0demxxnon0">https://www.youtube.com/watch?v=p0demxxnon0</a></p>
<p>Aside from the parts that are exclusive to YouTube, or more specifically to making videos, video year reviews don’t offer much that has to be presented visually. I’d rather read a blog post with plenty of data and analysis about the practice of making videos or fighting with platforms.</p>
<h3 id="reflecting-predictions">Reflecting Predictions</h3>
<p><a href="https://www.youtube.com/watch?v=46f-YuSyAvU">https://www.youtube.com/watch?v=46f-YuSyAvU</a>, <a href="https://twitter.com/VitalikButerin/status/1477402749994156036">https://twitter.com/VitalikButerin/status/1477402749994156036</a></p>
<p>It’s not a popular idea to make predictions about the near future (i.e. the coming year) and reflect on them, probably because the feedback loop is too long and, in general, prediction is super hard <a href="https://sites.google.com/site/steveyegge2/ten-predictions">https://sites.google.com/site/steveyegge2/ten-predictions</a>. The good news is that predictions offer even more insight when proved wrong. The difficult part about reflecting on them is assessing them properly, after removing black swans and information that wasn’t available when they were made. Hence it’s probably worth stating more assumptions than appears necessary, just for the convenience of reflecting.</p>
<h3 id="blog">Blog</h3>
<p>For people who are very conscious of social media’s toxicity, blogs are still a good alternative where good-quality content can be shared and found, although obviously an order of magnitude below social media in terms of quantity and popularity. Fortunately, for certain topics such as computer science, programming, economics and electrical engineering, blogging is still sort of at its peak, mainly because people in these areas share a nerd culture and still appreciate a few not-so-obvious advantages of blogging <a href="https://danluu.com/programming-blogs/">https://danluu.com/programming-blogs/</a>.</p>
<p>Blogs are in many ways better than institutional media; their two major issues are noise and discovery, which social media is good at. I find anything that combines the elements of the two interesting in different ways. Tyler’s assorted links <a href="https://marginalrevolution.com/marginalrevolution/2021/12/most-popular-mr-posts-of-the-year-2.html">https://marginalrevolution.com/marginalrevolution/2021/12/most-popular-mr-posts-of-the-year-2.html</a> don’t hesitate to present these relatively raw characteristics. Although the comment section doesn’t appear too informative to me, it does seem to provide meaningful feedback to Tyler. While MR is perfect in its own domain, it’s not really reproducible, given how rare it is for a domain expert to commit to writing blog posts on such a regular basis.</p>
<p>While platforms like Substack and Patreon offer a lightweight solution to noise and discovery, their corporate nature has a huge long-term influence on the kind of content creators choose to publish. To minimize that influence, I believe the payment and discovery mechanisms must be completely independent components.</p>
<h3 id="monetization-patreon-and-substack">Monetization, Patreon and Substack</h3>
<p>I realize there is now a spectrum of how far monetization can go. Next to fully open and free, Sam Harris <a href="https://www.samharris.org/">https://www.samharris.org/</a> only puts an extra step in front of otherwise free content ⇒ Patreon allows creators to set quite a few different tiers ⇒ Substack seems to only allow monthly and annual subscriptions with a fixed charge ⇒ institutional media with a pay wall.</p>
<p>Looking at Substack’s top paid publications in technology <a href="https://substack.com/discover/category/technology/paid">https://substack.com/discover/category/technology/paid</a>, specifically the category of posts that sit behind the pay wall, makes me feel it’s no different from institutional media, where the pay wall is designed to maximize revenue and nothing more. I’m certainly not against pay walls or any mechanism that allows authors to monetize their work; it’s the choice of what to charge for and how to get the audience to pay that matters most. If creators had full control over how to monetize, I doubt they would put a constraint on the granularity of paying for content.</p>
<p>However, creator platforms’ decisions are definitely highly influenced by the recent success of monthly subscriptions, probably under the disguise of “support”, judging by how Patreon argues that per-creation payment is almost a legacy feature:</p>
<blockquote>
<p>Instead of thinking of your paid posts as things that your patrons have “purchased” try to think about your paid posts as content you’ve created– and your patrons have membership that unlocks access to your content. Patrons value that content at different levels of support.</p>
</blockquote>GitHub Enters Everyday Life2019-12-25T00:00:01+00:002019-12-25T00:00:01+00:00https://josherich.me/reading/github-more-than-code<p>Whether by chance or by a flash of inspiration, GitHub has increasingly come to represent a way of speaking out, presenting, and getting feedback. At the tool level, it is not essentially different from the classic BBS: a non-live medium for presenting and co-editing text. Behind the text presentation, its most important features, Git version control and Pull Requests, can be seen as a kind of parliamentary process. Whether <a href="https://github.com/drop-ice/dear-github-2.0">open letters</a>, <a href="https://github.com/996icu/996.ICU">petitions</a>, or <a href="https://github.com/sindresorhus/awesome">encyclopedic lists</a>, each embodies some intersection of a group. This intersection also fits the fragmented life of modern people: through the watch feature, participants don’t need to check in every day like BBS users did; they can follow a project and its specific issues continuously.</p>
<p>Just like early internet users, participants in such projects have a clearly libertarian leaning. In my view, though, this actually limits participants’ imagination. Neither in the types of issues nor in the modes of operation should GitHub be confined to petitions and appeals.</p>
<p>In every declining field, we can perhaps find a better projection of it here: magazines, news, publishing, writing, marketing, legal aid, community issues…</p>
<p><a href="https://github.com/drop-ice/dear-github-2.0">Open letter to ICE</a></p>
<p><a href="https://github.com/timqian/chinese-independent-blogs">Chinese independent blogs</a></p>
<p><a href="https://github.com/996icu/996.ICU">996 ICU</a></p>
<p><a href="https://github.com/formulahendry/955.WLB">work life balance</a></p>
<p><a href="https://github.com/Jinnrry/getAwayBSG">Escape Beijing, Shanghai, Guangzhou</a></p>
<p><a href="https://github.com/HuaweiJoke/Huawei-Joke">Huawei jokes</a></p>
<p><a href="https://github.com/evil-huawei/evil-huawei">evil huawei</a></p>
<p><a href="https://github.com/facert/beijing_house_knowledge">Buying a home in Beijing</a></p>
<p><a href="https://github.com/renjie-feng-trash/fengrenjie">Feng Renjie</a></p>
<p><a href="https://github.com/CNwoman-bot/evil-man">Gender violence</a></p>
<p><a href="https://github.com/JadaGates/ShadowsocksBio/blob/master/readme.md">Shadowsocks history notes</a></p>
<p><a href="https://github.com/HDYA/constitution-of-fudan-university/pull/1/files">Fudan University charter</a></p>
<p><a href="https://github.com/Pratitya/wuhan2020-timeline">Epidemic and public opinion: Wuhan COVID-19 TIMELINE</a></p>