Understanding Vectors in AI: From Coordinates to Semantic Search
We are going to start this series by talking about vectors. Vector representations of data and vector search are a huge part of what you will be implementing when building AI applications.
While the vector is a concept common in physics, its use in software engineering is slightly different, although it rests on the same principles.
The Physics Perspective vs. Coordinates
In physics, you might have heard of vectors when discussing force and acceleration. A vector is defined as a quantity that has a magnitude and a direction.
In two-dimensional space, a vector might look like the image below. The direction is represented by the blue angle (about 37 degrees from the x-axis), and the length (magnitude) is 5.
This vector might represent a force or a magnetic field in physics. However, in our case, it is going to represent the meaning of some text.
There is another way to describe this arrow: with coordinates. Instead of defining the arrow by its angle and length, we can give the point where its tip lands, for example 4 on the x-axis and 3 on the y-axis, written as (4, 3).

This gives us a point in two-dimensional space. The x-axis is one dimension, and the y-axis is another dimension. These two numbers define exactly where this vector sits on the graph.
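Both descriptions carry the same information, and you can convert between them with basic trigonometry. A minimal Python sketch (the two helper names are illustrative, not from any library) shows that the point (4, 3) corresponds to a length of 5 and an angle of roughly 36.87 degrees:

```python
import math

def to_coordinates(magnitude: float, angle_degrees: float) -> tuple[float, float]:
    """Convert a 2D vector given as (length, angle from the x-axis) into (x, y)."""
    angle = math.radians(angle_degrees)
    return (magnitude * math.cos(angle), magnitude * math.sin(angle))

def to_magnitude_and_angle(x: float, y: float) -> tuple[float, float]:
    """Convert (x, y) coordinates back into (length, angle in degrees)."""
    return (math.hypot(x, y), math.degrees(math.atan2(y, x)))

magnitude, angle = to_magnitude_and_angle(4, 3)
print(magnitude, round(angle, 2))  # 5.0 36.87

# Going the other way recovers the original coordinates (up to float rounding).
x, y = to_coordinates(magnitude, angle)
print(round(x, 6), round(y, 6))  # 4.0 3.0
```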
Scaling Up Dimensions
While the previous examples only have two dimensions (x and y), you can actually have vectors with as many dimensions as you like.
For instance, we can add a z-axis to represent a point in three-dimensional space.

You can go further than this. Vectors in machine learning typically have hundreds, and sometimes thousands, of dimensions. You cannot really represent them on a nice graph because it is impossible to draw a graph with 512 different axes. However, the concept remains exactly the same, just scaled up.
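Adding axes does not change the arithmetic. For example, the length of a vector generalizes from the 2D Pythagorean theorem to any number of dimensions. This Python sketch (the `magnitude` helper is our own, not a library function) applies the same formula to 2, 3, and 512 dimensions:

```python
import math
import random

def magnitude(vector: list[float]) -> float:
    """Length of a vector with any number of dimensions: sqrt(x1^2 + x2^2 + ... + xn^2)."""
    return math.sqrt(sum(component * component for component in vector))

print(magnitude([4, 3]))     # 2D: 5.0
print(magnitude([2, 3, 6]))  # 3D: 7.0

# A 512-dimensional vector filled with random values, for illustration only.
big = [random.uniform(-1, 1) for _ in range(512)]
print(len(big), magnitude(big))
```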
Vectors as Code
Here is an example of a vector with 512 dimensions:
[
0.0520851798, -0.0713075, 0.0256588403, -0.0477592833, -0.0728425,
-0.0319209248, -0.0966349244, -0.0711679608, -0.0731913671, 0.0186292604,
0.0227632821, -0.109054431, -0.0819827, 0.00105040334, 0.0334559195,
// ... continuing for 512 items
]
Since you are a software engineer, you will probably recognize this format immediately. It is basically an array of floating-point numbers. When we deal with vectors in our code, this is really what we are dealing with.
The numbers in this array all lie between -1 and +1. One common reason is that many embedding models normalize each vector to a length (magnitude) of 1, and when the whole vector has length 1, no single component can exceed 1 in absolute value.
From a programmer's perspective, a vector is simply an array. The array above is 512 items in length. A 2D vector is an array of two items, and a 3D vector is an array of three items.
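Scaling a vector to a length of 1 is one common way to end up with every component between -1 and +1. A minimal Python sketch of that step, using a made-up three-dimensional vector rather than real model output (the `normalize` helper is hypothetical, not a library function):

```python
import math

def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to length 1 while keeping its direction."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vector]

# A hypothetical raw vector whose values fall outside the -1..+1 range (length 13).
raw = [3.0, -12.0, 4.0]
unit = normalize(raw)

print([round(x, 4) for x in unit])                    # [0.2308, -0.9231, 0.3077]
print(round(math.sqrt(sum(x * x for x in unit)), 10))  # 1.0
print(all(-1 <= x <= 1 for x in unit))                 # True
```

Because only the length changes, the normalized vector still points in the same direction as the original.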
Why Are Vectors Useful in AI?
You might be wondering why this array format is useful. If you can represent data as a vector, you can perform calculations on those vectors. You can also feed them into neural networks to perform complex tasks.
Vectors are an extremely useful way to represent, store, and manipulate data in machine learning. Once your data is in vector form, you can use basic mathematics, such as the dot product or cosine similarity, to measure how similar two vectors are to each other.
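Cosine similarity is one widely used measure: it compares the directions of two vectors and ignores their lengths. A from-first-principles Python sketch (real projects usually reach for a numerical library instead):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = perpendicular, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([4, 3], [8, 6]))    # same direction, different length: 1.0
print(cosine_similarity([1, 0], [0, 1]))    # perpendicular: 0.0
print(cosine_similarity([4, 3], [-4, -3]))  # opposite directions: -1.0
```

When both vectors are already normalized to length 1, the denominator is 1 and cosine similarity reduces to the plain dot product.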
It is this specific technique of comparing vector similarities that underpins semantic search.
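At its simplest, semantic search turns a query into a vector, compares it against the stored vectors, and returns the closest matches. The sketch below uses tiny hand-made 3D vectors as stand-ins for real embeddings, which would come from an embedding model and have hundreds of dimensions; the titles, values, and `search` helper are all hypothetical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical, hand-made vectors standing in for real text embeddings.
documents = {
    "How to bake sourdough bread": [0.9, 0.1, 0.0],
    "Best flour for pizza dough": [0.8, 0.3, 0.1],
    "Fixing a flat bicycle tire": [0.0, 0.2, 0.9],
}

def search(query_vector: list[float], docs: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k document titles ranked by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]

# A hypothetical query vector for something like "making bread at home".
query = [0.85, 0.2, 0.05]
print(search(query, documents))  # ['How to bake sourdough bread', 'Best flour for pizza dough']
```

This linear scan compares the query against every stored vector; vector databases use indexing to avoid that cost at scale, but the underlying comparison is the same.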
Recap
- Definition: In physics, vectors have magnitude and direction. In AI, we often view them as coordinates or points in space.
- Dimensions: While we can easily visualize 2D and 3D vectors, machine learning vectors often contain hundreds or thousands of dimensions.
- Code Representation: To a software engineer, a vector is simply an array of floating-point numbers (often normalized to a length of 1, which keeps every value between -1 and 1).
- Utility: Converting data into vectors allows us to perform mathematical calculations to determine similarity, which is the foundation of semantic search.