Skip to main content

Text Functions

The vector function package contains functions for creating and comparing vectors.

IMPORT vector.*; -- imports all functions
IMPORT vector.toJson; -- imports single function

Reference

Function NameDescription
OnnxEmbed(String text, String modelPath)Embeds the given text using an ONNX model specified by the model path. This function handles text tokenization, tensor creation, and model inference, returning a FlinkVectorType representing the embedded text vector. Example: onnxEmbed("Sample text for embedding.", "/path/to/model.onnx") processes the text through the ONNX model.
CosineSimilarity(FlinkVectorType vectorA, FlinkVectorType vectorB)Calculates the cosine similarity between two vectors. This is a common operation in many vector space models in machine learning, especially useful in text analytics to find the similarity between documents. Example: cosineSimilarity(vector1, vector2) returns the cosine similarity score.
CosineDistance(FlinkVectorType vectorA, FlinkVectorType vectorB)Computes the cosine distance between two vectors, which is 1 - cosineSimilarity. This function is used to measure how different two vectors are. Example: cosineDistance(vector1, vector2) gives a measure of the distance between two vectors.
EuclideanDistance(FlinkVectorType vectorA, FlinkVectorType vectorB)Calculates the Euclidean distance between two vectors, a direct measure of the distance in the vector space. Example: euclideanDistance(vector1, vector2) computes the distance, useful in clustering and other machine learning algorithms that rely on distance calculations.
AsciiTextTestEmbed(String text)Generates a simple embedding of the text based on ASCII values. This simplistic approach can sometimes serve for quick tests or baseline embeddings in scenarios where text length or uniqueness is a factor. Example: asciiTextTestEmbed("Hello") generates a vector based on ASCII values of the text.
Center(vector)Used as an aggregate function to compute the center (mean vector) of a group of vectors during aggregation operations in Flink. This function is typically used in clustering or when calculating the centroid of data points.
DoubleToVector(double array)Converts a vector to a double array