Semantic Address-to-Postcode Retrieval with Ollama and Qdrant
This is the second problem from the previous post. Not only is the address data in the wrong format, some of it is missing!
In tax-related documents, the most important piece of data is the postcode, which for some reason some people left out. So I decided to tackle this problem with semantic search. First, I created a vector store in Qdrant using the Thailand data from ThepExcel.

Again, I used n8n to create the data points and insert them into Qdrant. This step needs an embedding model. At first I used Google text-embedding-004 because it is fast and free, but I later found out that it doesn't support Thai. So I switched to bge-m3:567m on Ollama instead.
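
For readers who prefer code to n8n nodes, here is a minimal Python sketch of the same step. It is not the actual workflow. It assumes Ollama is running at its default local address and exposes the /api/embeddings endpoint. The row fields (tambon, amphoe, province, postcode) and all helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "bge-m3:567m"


def build_embedding_request(text, model=MODEL):
    """Body for Ollama's embeddings endpoint: a model name and the text to embed."""
    return {"model": model, "prompt": text}


def embed(text):
    """Send one text to Ollama and return its embedding as a list of floats."""
    body = json.dumps(build_embedding_request(text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def row_to_text(row):
    """Flatten one data row into the text that gets embedded (hypothetical field names)."""
    return " ".join(str(row[key]) for key in ("tambon", "amphoe", "province", "postcode"))


def build_point(point_id, vector, row):
    """One Qdrant point: an id, the vector, and the original row as payload."""
    return {"id": point_id, "vector": vector, "payload": row}
```

Calling embed(row_to_text(row)) for every row and passing the result to build_point gives the list of points to upsert into the collection.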

I used cosine as the distance metric and 1024 as the vector size, which matches the output dimension of bge-m3.
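
To make those two settings concrete, here is a small Python sketch, assuming the collection is created through Qdrant's REST API (PUT /collections/{collection_name}) rather than the n8n node. The function names are my own. cosine_similarity only illustrates what the metric measures and is not how Qdrant computes it internally.

```python
import math


def collection_config(size=1024):
    """Request body for creating a Qdrant collection with cosine distance."""
    return {"vectors": {"size": size, "distance": "Cosine"}}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here for zero vectors
    return dot / (norm_a * norm_b)
```

Cosine similarity ignores vector length and only compares direction, so two addresses with similar meaning score close to 1 even if one is written out longer than the other.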
Then in n8n I arranged the nodes like this:

I used a chunk size of 100 with 0 overlap because each record is a single row averaging around 70-80 tokens, so I didn't need a chunk size larger than 100.
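
The effect of those settings can be sketched in a few lines of Python. This is a simplified stand-in for the n8n text splitter, not its real implementation, and it assumes the row has already been split into tokens by some tokenizer. chunk_tokens is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size=100, overlap=0):
    """Split a token list into chunks of at most chunk_size, sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the row
        start += step
    return chunks
```

With these settings a row of about 80 tokens fits into a single chunk, so every row becomes exactly one point in the vector store and overlap never comes into play.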

Finally, I used the vector store to convert addresses into postcodes: the raw address is the query, and the text output is converted into a JSON object so that n8n can write the data to a CSV file.
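
The last conversion step could look roughly like this in Python. It is only a sketch under my own assumptions: that the text coming back from the retrieval step contains the matched record, and that the postcode is the first standalone 5-digit number in it (Thai postcodes have 5 digits). The function and column names are hypothetical.

```python
import csv
import io
import re

# Exactly five digits, not part of a longer number. Lookarounds are used
# instead of \b because Thai letters next to digits do not form a word boundary.
POSTCODE_PATTERN = re.compile(r"(?<!\d)\d{5}(?!\d)")


def extract_postcode(text):
    """Return the first standalone 5-digit number in the text, or None."""
    match = POSTCODE_PATTERN.search(text)
    return match.group(0) if match else None


def to_record(address, retrieved_text):
    """Build the JSON-like object for one address."""
    return {"address": address, "postcode": extract_postcode(retrieved_text)}


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["address", "postcode"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Rows where no postcode is found end up with an empty postcode column, which makes them easy to filter and check by hand afterwards.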
