Jonathan Lam

Core Developer @ Hudson River Trading



Brain dump 2

On 7/7/2021, 8:56:42 PM



On software engineering

My thoughts on this are not very collected, but I thought I'd put this in print anyways since I feel strongly about it.

I'm currently a month and a half into an internship at MathWorks. It's been a slow start, slower than last year at Cigna. In both internships, I ended up in web development roles (no doubt because that's where most of my background lies). Last year, I was using a large web framework to develop some internal dashboards. This year, my work is an R&D project, and we're using a different framework (which I was not familiar with) as well as a much larger codebase in general.

For the first few weeks, I was facing a major problem: I didn't understand enough about the codebase, and I didn't even know where to start. The problem with asking a question like "How do I do X" is that I didn't even know how to choose an X to ask about. If you then ask "How do you do Y?", where Y is a more general task than X, then you might not get an answer. So it almost feels like being stuck in this weird place, where asking and not asking don't really get you very far. So the best solution is to hack away at this huge codebase until something miraculously works.

Another problem is that I'm working on two teams: I was assigned to one team, but the project that I'm developing integrates another team's solution. So there's the uneasiness of asking and scheduling meetings with the other team, especially since they don't have time set out specifically for my project. So I have to be careful to only ask when there wouldn't be any other way to solve something. This makes it difficult because then it's hard to narrow down the problem.

It seems like an easy solution: just ask! But it's much harder to put into practice than I had originally thought. And it might be because I'm used to asking questions like someone would on Stack Overflow, where you should have researched the question thoroughly before asking and include all the context. But of course, we can't share our proprietary code on the public web, and even if we could, the context would be too overwhelming for the tasks that I'm working on. Another reason why I might be stuck like this longer than necessary is that I generally think that I'm good at reverse engineering. But I've never worked on something of this scale, and I think I've sometimes exceeded my mental RAM and gotten demotivated.

One thing that's been very helpful is to have daily office hours with my mentor, when I've been able to openly ask. It does away with the awkwardness of scheduling meetings and feeling like you're intruding on someone else's time, because the time is already set aside for you. I think I do have to get more comfortable scheduling meetings, but for now this is fine. The original problem of not being able to ask a good question still persists, however: the question may be too broad so that you can't ask it directly and get meaningful results, and you can't break it down because you simply don't understand enough about the infrastructure to ask a better question.

I've already talked several times to my manager, my "buddy," and my team lead about this, and their insight is really helpful (and what I wanted to share).

One piece of advice from my manager is to set a simple time limit as a rule. If you've spent more than Z minutes on a task and are unable to move forward, ask. It gets rid of the subjective barrier of when to ask, and it shows that you've put some effort into the problem. I'm generally bad with following rules like this for myself, but I'll try to get better at it.

When talking with my mentor and my team lead, I often say that I don't understand enough about the infrastructure to be able to implement any of my project. I'm sure this is some form or another of impostor syndrome, but it's very real. I hear the others on my team (all full-time; I'm the only intern) speaking fluently and with total understanding about their projects and asking questions about other teammates' tasks, while I still have no idea what their tasks mean. Maybe it's because I've never worked alongside another intern, so I don't know what others feel like when they're working on a team. I feel like I don't understand enough of the infrastructure to be able to implement anything on my own, or to be able to choose the best or correct way to do a particular task.

This is where my mentor tells me that that is not what I need to do. I don't need to understand everyone's code. He said something along the lines of (paraphrased from memory): "the software developers that I really admire are the ones who can jump into a huge codebase with a task in mind and quickly point to a few lines of code and think: 'That's the code I need to change.' And they understand and change only the few pieces of code they need to change. On the other hand, other programmers in a startup might brag that they can write code super quickly and be super efficient -- but they don't have to fit in with the rest of the system. That's easy when you wrote most of the code. But in MATLAB, you have to integrate with the millions of lines of code that already exist in the codebase, which is a lot harder."

Unfortunately, I'm one of the latter programmers, the one who usually writes the entire codebase that he works in. I haven't dealt much with large team projects, and I think I greatly underestimated what it means to work on a team and really not understand the code outside of your own and closely related pieces. I used to think that software programming was hard because you had to learn a lot of technologies and be able to put them all together, but now I realize that it's because, on top of that, you have to learn other people's intents when you look through existing codebases. There won't be nice documentation everywhere you go; not everything will be written like a public-facing library with helpful resources.

As obvious as it may seem to others, it was a revelation for me that software engineering is more about understanding your team than understanding the technology. My previous notion of software engineering really shows how little I know about working as part of a big team.


New thoughts on Python and Javascript

I've pretty much always held animosity towards Python, from the very beginning. At first, it was probably because I was too used to the C-style languages, and it felt too weird for me. At some point later on, it had to do with the fact that there were too many magic names (e.g., __getitem__ and all of the other double-underscore-delimited special methods) defining or redefining syntax so that I didn't know what was going on under the hood -- again, maybe because of my Java rigidness, I wasn't used to Python or C++ operator overloading. Too much felt like magic under my fingers, and I like knowing what's going on: simple procedure calls and object-oriented programming were my style. Later on, in college, data science and Python or MATLAB were almost synonymous -- all of the ML courses involved Python or MATLAB, and all of the Python projects I did were for data science purposes. This increased my animus towards Python, what with package dependency annoyances and the terrible coding style I observed in my fellow engineers' work. Spaghetti code galore.

Another thing that has peeved me is that Node.js generally beats Python in numeric benchmarks, so Python isn't even faster than Javascript. To me, Javascript feels far superior: NPM is great[1], there isn't a weird divide like the Python 2/3 split[2], and (not least) the syntax is simpler and more intuitive to a programmer familiar with the C family of languages. On that last point, I don't like many of the syntactic deviations Python made from C, such as writing out several operators (the ternary and boolean operators) as words rather than symbols, assignments not being expressions, etc.[3]

My opinions over the last few weeks have been changing, as I've used Python for a pet project (the configuration client for my VEIKK driver), and have been using Javascript at scale for my internship. It seems that the more I use a programming language for work or school, the more I loathe it, and the more I use one for a personal project, the more I like it. This is the case here too.

On Python (and its ease of use)

Simply put, Python has great libraries. TensorFlow, SciPy, NumPy, pandas, Matplotlib. These are the go-to libraries for machine learning and data analysis. The documentation tends to be pretty good as well. Support for this kind of work is relatively poor in other languages (I'm glad to see TensorFlow.js's rise).

In my case, what I needed to do was interact with several Linux userspace APIs. I needed to deal with systemd to run a system service and handle logging, dbus for IPC between the mapping daemon and the configuration tool, some GUI tools (GTK/Qt and Xlib) to help the user set the screen mapping more easily, udev to detect when VEIKK devices are plugged in or removed, evdev to listen to events from the driver, uinput to create virtual devices to spawn events, and a YAML parsing library to handle serialization for the configuration file. Any single one of these would be painful using the C interfaces (which most of them are natively written in). Luckily, each one of these Linux APIs has at least one well-maintained Python wrapper library, and I was able to get each one up and working fairly quickly.

Having had some time to experiment with Python, I also learned that there are some cool features. The class metaprogramming (metaclass) concept is a cool idea and useful (e.g., in YAML and DBus, when classes correspond to a certain YAML type or a DBus interface), and I now understand better how static methods and class methods work and why they might be useful. Dynamic dispatch in Python is as straightforward as it can be, and multiple inheritance has not caused problems for me. Optional typing (much like TypeScript) is also highly appreciated, and has made debugging much less painful.

My takeaway is that, for most projects where you need to bring in a lot of libraries and don't care about absolute performance, and you want semi-functional programming with an optionally-typed ecosystem, Python is a great tool. It really felt like coding in Javascript.

On Javascript (and its inconsistencies)

People tend to complain a lot about some of the weird behavior in Javascript, but that usually deals with strange combinations of truthiness and equality of empty dictionaries and arrays. One of these scenarios came up for me recently; I don't remember the exact details, but it wasn't important. I won't talk about these because I don't think you should be checking equality of empty dicts and arrays anyways -- as far as I can tell, those kinds of scenarios rarely matter because you would never do those operations in production code.

What I do want to complain about is Javascript's classes. I don't like them. When I began using Javascript, the class keyword didn't exist, and all classes were constructor functions. There are no real "classes" in Javascript, nor does it have classical inheritance -- objects have a "prototype" that you can use to mimic inheritance. It feels weird, and there is no official support for multiple inheritance[4]. It took me a long time to even remotely understand this syntax, since the prototype magic all seems to be done implicitly. The act of "subclassing" by example is not intuitive to me at all.
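To make the prototype mechanics concrete, here is a minimal sketch of the same "subclass" written in the older constructor-function style and with the newer class keyword. The names Animal, Dog, and Cat are made up for illustration; the point is only that both styles end up building the same kind of prototype chain.

```javascript
// Older style: a "class" is just a constructor function.
function Animal(name) {
  this.name = name;
}
// Methods live on the prototype and are shared by every instance.
Animal.prototype.describe = function () {
  return this.name + " is an animal";
};

function Dog(name) {
  Animal.call(this, name); // run the parent constructor on the new object
}
// "Subclass" by making Dog.prototype delegate to Animal.prototype.
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;
Dog.prototype.speak = function () {
  return this.name + " barks";
};

// Newer style: the class keyword sets up the same prototype links.
class Cat extends Animal {
  speak() {
    return this.name + " meows";
  }
}

const dog = new Dog("Rex");
const cat = new Cat("Tom");
```

Here dog.describe() and cat.describe() both find describe by walking up to Animal.prototype, and typeof Cat is still "function", which is roughly what is meant by the class keyword being a layer over prototypes.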

One example is about how you (shallow) clone an object, including its prototype methods -- this was something I needed to do today at work. An example is given in this Stack Overflow answer, which points to Lasse Reichstein Nielsen's generic beget (clone) function, which is described in another Stack Overflow answer.

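// Create a new object whose prototype is o (nothing is copied).
Object.beget = function (o) {
  var F = function () {}; // empty constructor
  F.prototype = o;        // objects built by F delegate to o
  return new F();
};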

(After this, all of o's enumerable properties[5] still have to be copied over to the new object if it needs its own copies -- beget only links the new object to o through the prototype chain and copies nothing itself.) I don't find this copying intuitive at all.
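For comparison, here is one way the whole shallow clone could be sketched with built-ins available in newer Javascript. This is not the code from the linked answers; shallowClone and Point are hypothetical names, and the sketch assumes that only own enumerable properties need to be copied.

```javascript
// Hypothetical helper: shallow-clone an object so that the copy shares the
// original's prototype and has its own copies of the enumerable own properties.
function shallowClone(o) {
  // Share o's prototype, so prototype methods are available on the clone.
  const copy = Object.create(Object.getPrototypeOf(o));
  // Copy o's own enumerable properties (nested objects are shared, not copied).
  return Object.assign(copy, o);
}

// A small constructor-style "class" to try it on.
function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype.norm = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};

const p = new Point(3, 4);
const q = shallowClone(p);
q.x = 6;
q.y = 8;
```

After this, q.norm() gives 10 while p.norm() still gives 5, so the clone has its own fields but shares the prototype methods. Non-enumerable properties, getters, and setters are not preserved by this sketch.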

Another qualm I have about Javascript (without type annotations like with TypeScript) is that it really isn't typed. Because there are no real classes, it's hard to determine what type an object is -- all objects really look like regular dicts. While this is nice when you know what you're doing, this becomes extremely frustrating if you're trying to reverse engineer other code, as I was for my MathWorks role. In Python, because there are true classes, you have some degree of introspection -- various methods to get the type of an object are discussed here.
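As a rough illustration of the problem, the sketch below shows how little typeof reveals and a best-effort workaround that reads the constructor name off the prototype. Tablet and describeType are made-up names, and the workaround only helps when the code under inspection actually uses constructors or classes.

```javascript
class Tablet {}
const plain = { model: "A50" };
const tablet = new Tablet();

// typeof cannot tell these apart: every entry here is "object".
const typeofResults = [typeof plain, typeof tablet, typeof [], typeof null];

// Hypothetical helper: a best-effort type name taken from the prototype chain.
function describeType(value) {
  if (value === null) return "null";
  if (typeof value !== "object") return typeof value;
  const proto = Object.getPrototypeOf(value);
  if (proto === null) return "Object (no prototype)";
  return proto.constructor ? proto.constructor.name : "unknown";
}
```

With these definitions, describeType(tablet) returns "Tablet" and describeType([]) returns "Array", but every object literal still reports "Object", which is the dict-like case that makes reverse engineering hard.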

To be fair, I am not the most up-to-date with Javascript. Like Python, it is a rapidly updating language, and there may be new metaprogramming and improved OOP features that I'm not aware of. But, at least in the Javascript of a few years ago, pure OOP is not as great as in a language with native class support.


Literate programming

I just think this is a cool idea. It was an idea introduced by Donald Knuth, in which code is interspersed with explanation in plain English. I feel that this corresponds strongly with my idea of "implicit documentation", except here the documentation is more explicit (but perhaps less explicit than a regular documentation page). This document can be run, and the text will be ignored like comments.

In Bird-style you have to leave a blank line before the code.

> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n-1)

And you have to leave a blank line after the code as well.

Org mode supports literate programming, as does Haskell natively with the .lhs extension. The code is indented and the explanation is at the top-level, which seems to me like the explanation is more important than the code.

Compare and contrast this with notebooks (think Python's Jupyter notebooks or MATLAB Live Scripts). There we also see comments describing the code, but the non-code content is usually charts or other data analysis, while in literate programming we are primarily describing the code's functionality.

An interesting way to see it is that we are "programming intent" in regular English, and only annotating it with runnable code that implements our intent. This reverses the paradigm of "write code to do a task and then describe what it achieves afterwards" that characterizes implicit documentation.


"What To Do When the Trisector Comes"

I find this paper and its title funny, especially after having learned the meaning behind it in our Algebra class this semester.

We learned that there are three classical impossible geometric constructions using only a compass and a straightedge: "doubling a cube," "squaring a circle," and trisecting an angle.

To motivate you to read the essay, I'll simply quote the first paragraph:

A trisector is a person who has, he thinks, succeeded in dividing any angle into three equal parts using straightedge and compass alone. He comes when he sends you his trisection in the mail and asks your opinion, or (worse) calls you to discuss his work, or (worse still) shows up in person. You may think that the problem of how to deal with trisectors is not an important one; I intend to show that it is.

For context, I encountered this paper after seeing this Academia Stack Exchange question, also about cranks. Despite the whimsical title, we do live in a world with conspiracy theorists and cranks, so it might be more practical than you think (if you're not convinced enough to read it).


Jeff Dean

He just seems like an awesome character. One of the earliest Googlers and co-inventor of MapReduce. Now a Chuck Norris of the Internet. I won't reproduce the "Jeff Dean facts" here, but there are many good ones (including true stories) in the following links:


Footnotes

1. NPM's package.json is nicely integrated with NPM/yarn, whereas with pip you have to manually manage dependencies -- yuck. And setuptools is a mess, with three different official file formats (setup.py, setup.cfg, and pyproject.toml) without a clearly recommended one.

2. Seriously, Python, what is up with this?

3. A funny anecdote about criticizing Python: Randall Munroe, the author of xkcd, gave a talk at the Cooper Union in 2019, which I attended. He mentioned once dissing Python in another talk because he thought that there was excessive hype for Python and no one was really criticizing it. A member of the audience asked the question: "If you had to change something about Python, what would you change?", to which Munroe was left scrambling for an answer. The audience member was Guido. On a similarly unrelated note, it seems Munroe has been denied access to Python conferences in the past, so maybe his Python antagonism is related.

4. One library that supports MI in JS is the Dojo framework, but simply by the virtue of implementing MI in the language it is likely slower than in languages that support it natively, like Python.

5. Enumerable and non-enumerable properties have also long confused me in Javascript.


© Copyright 2023 Jonathan Lam