Correct me, I'm wrong

Naive rantings

Machine learning terminology is a mess

June 13, 2021 — ~nfg

Here, I nitpick and whine about things that are really quite alright. This is catharsis, not commentary.

From the glossary of a Google ML course:

Embedding:

A categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space.

Ignoring the fact that this description is a little forbidding to someone not learned in vector maths, this just sounds like an Encoding, no?
It’s just a different representation of the same data. We aren’t Embedding anything into anything else really, just reformatting the same stuff.
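
To make the gripe concrete, here is a minimal sketch of the two representations side by side. Everything in it is made up for illustration: the `VOCAB` list, the numbers in `EMBEDDING_TABLE` (a real model would learn those during training), and the function names. It only assumes the usual picture of an embedding as a lookup table of dense vectors.

```python
# Hypothetical vocabulary and embedding table. The numbers are invented;
# a real model would learn them during training.
VOCAB = ["cat", "dog", "fish"]
EMBEDDING_TABLE = [
    [0.9, 0.1],   # cat
    [0.8, 0.3],   # dog
    [-0.5, 0.7],  # fish
]


def one_hot(word):
    """Encode a word as a vector with one slot per vocabulary entry."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def embed(word):
    """Look up the dense, lower-dimensional vector for a word."""
    return EMBEDDING_TABLE[VOCAB.index(word)]


def embed_via_matmul(word):
    """Same result as embed(), written as one_hot(word) times the table."""
    hot = one_hot(word)
    dims = len(EMBEDDING_TABLE[0])
    return [
        sum(hot[i] * EMBEDDING_TABLE[i][d] for i in range(len(VOCAB)))
        for d in range(dims)
    ]


print(one_hot("dog"))           # 3 slots, one per vocabulary word
print(embed("dog"))             # 2 slots, dense values
print(embed_via_matmul("dog"))  # matches embed("dog")
```

Mechanically, the lookup is the one-hot vector multiplied by a table, which is why it reads like "just another encoding" to me. As far as I can tell, the only real difference is that the table's values are learned rather than assigned by hand.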


Query: (also known as Context)

The information a system uses to make a recommendation.

Huh? I thought a query was an action, like “I queried so and so with this question”.
Here it’s used to describe plain old data. “Also known as Context.” Yeah, context is better, but wow, is “Context” an overused word in tech. The number of times I’ve had to deal with some amorphous, abstracted “Context” while coding is too many to count.
(I really dislike the “Context” abstraction, it can mean anything! What’s in memory? What’s on disk? What async calls are waiting? What other processes are running? What sockets are open? AAARGH!)


Oh, and don’t get me started on “training”, “testing” and “validation” data.

Again, this is all whining for the sake of whining, and it really doesn’t bother me enough to write more than a paragraph about, but machine-learning vernacular could really use a good refactoring.

tags: terminology, machine-learning, whine-posting