
Expert’s view: data science to allocate HS codes

By Camille Felappi

Data science has proven to be businesses’ best ally in many fields. But do you know how data science works? 

We spoke with Simo Jaanus, Data Scientist at Eurora. He explained what data science is, what challenges Data Scientists can face, and how Eurora uses data science to offer more reliable services. He said that “Given the enormous volumes of data being produced today, data science is a crucial component of many sectors, like cross-border compliance.” 

Cross-border compliance is precisely the field where Eurora pioneered the use of data science to take compliance to the next level. Hundreds of experts, scientists, analysts, and researchers developed the most advanced machine learning in cross-border trade compliance. 

Simo is part of the Eurora Data Science team, where he holds a hybrid role between Data Scientist and Machine Learning Engineer. He graduated from the University of Tartu, one of the world’s top 1.2% universities, with a master’s degree in Software Engineering after successfully defending his thesis, “Building a classification model for HS code prediction from product images”. His research will contribute to improving Eurora’s AI engine, which was built by our experts.

Here is what he shared with us. 

  

What is data science?

Data science is an area of study that combines domain expertise, programming skills, and an understanding of mathematics and statistics. Its aim is to extract meaningful insights from data. Data Scientists apply machine learning algorithms to a variety of data types: numbers, text, photos, video, and audio. From these, they build artificial intelligence (AI) systems that can carry out activities that often require human intelligence. These systems generate insights that analysts and business users can translate into tangible business value.

“Given the enormous volumes of data being produced today, data science is a crucial component of many sectors, like cross-border compliance,” said Simo. Data Scientists deal with many questions on a daily basis: how should we use this information? How can we use it to our advantage? What practical uses can we find from it?  

As data science has become more and more popular, businesses have begun to use it to expand their operations and improve customer satisfaction. In the cross-border compliance field, Eurora pioneered the use of data science, which is at the heart of its solutions.

 

Why is Eurora using data science?

Eurora uses data science to automatically allocate HS codes. We assign HS codes using AI (Artificial Intelligence) with text-based product descriptions as input data. The Harmonized System (HS) is an international coding system for classifying traded goods, and an HS code identifies a product within it. It is the backbone of cross-border trade because customs authorities use it to identify products and assess the applicable taxes. Given the importance of HS codes, there is no room for approximations or mistakes when allocating them.

That is why using data science in trade compliance is much more reliable. Kristi Helekivi, Eurora’s Head of Data Science, explained in a previous interview why using AI is the most reliable way to allocate HS codes. Humans can easily make errors, and it is impossible for a single person to know by heart the entire HS nomenclature, which contains around 5,300 HS codes. With machine learning, however, you can not only provide more accurate data but also handle larger volumes than humans could. “A machine has a wider horizon than one person. With AI, all the knowledge coming from different people is in one machine together.”
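To make the idea of text-based HS code allocation concrete, here is a minimal sketch of a description classifier. It is not Eurora’s engine: it swaps in a small multinomial naive Bayes model written with the Python standard library, and the training descriptions, the class name `NaiveBayesHSClassifier`, and the HS codes attached to each example are all illustrative assumptions, not authoritative classifications.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (product description, six-digit HS code).
# The codes are placeholders for illustration, not verified classifications.
TRAINING = [
    ("cotton knitted t-shirt short sleeve", "610910"),
    ("mens cotton t-shirt crew neck", "610910"),
    ("leather handbag with shoulder strap", "420221"),
    ("womens leather handbag black", "420221"),
    ("smartphone 128gb dual sim", "851713"),
    ("android smartphone 5g unlocked", "851713"),
]


def tokenize(text):
    """Lowercase the description and split it on whitespace."""
    return text.lower().split()


class NaiveBayesHSClassifier:
    """Toy multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # examples seen per HS code
        self.word_counts = defaultdict(Counter)  # token counts per HS code
        self.vocab = set()

    def fit(self, examples):
        for text, code in examples:
            self.class_counts[code] += 1
            for token in tokenize(text):
                self.word_counts[code][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            # Log prior plus the log likelihood of each known token.
            score = math.log(count / total)
            denom = sum(self.word_counts[code].values()) + self.alpha * len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[code][token] + self.alpha) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


clf = NaiveBayesHSClassifier().fit(TRAINING)
print(clf.predict("blue cotton t-shirt"))
```

A production system would be trained on validated data covering the whole nomenclature; the sketch only shows how a description can be mapped to the most probable code among those it has seen.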

Simo added that businesses can benefit a lot from data if they know how to use it. One of the tasks of Data Scientists is precisely to identify such opportunities and bring them to the business side. Data science also helps Eurora differentiate itself from its competitors, as it allows the company to keep up with market demand and give clients what they want.

Data science brings together domain knowledge, programming skills and mathematics to reduce manual labor.

Simo Jaanus 

  

What are the challenges of data science in the compliance field?

Although using data science is a real asset in regulatory compliance, it has its share of struggles, as Simo detailed. When you start a data science project, the first step is to find the data you need to start working. Data Scientists can therefore face challenges from the very beginning. When Eurora started developing its engine, there was no validated data available that could be used for eCommerce. Nonetheless, Eurora managed to collect high-quality data. “We built a team and system that is able to validate new data quickly and efficiently.” 

Not surprisingly, data science and machine learning are developing rapidly. New findings and solutions are coming out every month. As a consequence, keeping up with these developments is time-consuming and building usable system architectures around them takes time. Not only the huge number of new findings but also the variety of problem-solving strategies can be overwhelming, as Simo explained. 

Finding the right course of action and solutions takes a lot of knowledge, time, and experiments.

Simo Jaanus 
 

As Eurora is a pioneer in its field, the Data Science team faces other challenges regarding KPIs and metrics. “We are still searching for the best way to track progress in an effective, simple, and understandable way as we are building something that has never been done automatically.”  

  

How does Eurora overcome data science challenges?

As Simo explained, allocating HS codes using AI is the most reliable way to provide accurate results. Nonetheless, it can still be challenging because it relies on the input that clients provide. Clients need to give the engine accurate product descriptions so it can determine the correct HS code. To avoid situations where the engine cannot allocate an HS code because the description is incomplete, the Data Science team went a step further and is implementing root cause detection.

“Our machine is automatically detecting the root cause that instantly indicates if any necessary information has not been filled in properly, so that customers could adjust their product description right away. The team has implemented product, material, and detail detection for descriptions. From these results, we can determine if the product, material, or detail is necessary for the final root cause prediction or if the HS code can be allocated without it. As we just implemented this functionality, we are internally evaluating the outcomes and only offering feedback to consumers upon requests. The objective for the near future is to promptly identify the root cause and even suggest additional words that may be utilized to assign HS codes correctly.”
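The root cause idea described here can be sketched as a simple completeness check. This is a rule-based illustration under stated assumptions, not Eurora’s implementation: the keyword sets, the `REQUIRED` rule table, and the function names `detect` and `root_causes` are all hypothetical, and a real system would presumably use learned detectors rather than fixed word lists.

```python
# Hypothetical vocabularies for the three kinds of information to detect.
PRODUCTS = {"t-shirt", "handbag", "boots", "smartphone"}
MATERIALS = {"cotton", "leather", "polyester", "rubber"}
DETAILS = {"knitted", "woven", "mens", "womens"}

# Hypothetical rules: which extra fields each product needs before a code
# can be allocated. An empty set means the product name alone is enough.
REQUIRED = {
    "t-shirt": {"material", "detail"},
    "handbag": {"material"},
    "boots": {"material"},
    "smartphone": set(),
}


def detect(description):
    """Return the product, material, and detail words found in a description."""
    tokens = set(description.lower().split())
    return {
        "product": sorted(tokens & PRODUCTS),
        "material": sorted(tokens & MATERIALS),
        "detail": sorted(tokens & DETAILS),
    }


def root_causes(description):
    """List the fields whose absence blocks allocation; empty means complete."""
    found = detect(description)
    if not found["product"]:
        # Without a recognizable product, nothing else can be evaluated.
        return ["product"]
    product = found["product"][0]
    return [
        field
        for field in ("material", "detail")
        if field in REQUIRED[product] and not found[field]
    ]


print(root_causes("cotton t-shirt"))
print(root_causes("smartphone 128gb"))
```

Returning the missing fields rather than a plain failure is what lets a customer adjust the description right away, which is the behavior the paragraph describes.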

Furthermore, the team is currently investigating new types of input data for allocating HS codes. They found that images could be a very useful source of information, a finding confirmed by Simo’s research. In his master’s thesis, “Building a classification model for HS code prediction from product images”, he set out to discover the most effective method for predicting HS codes from images. He found very promising results: “With pictures, we do not need details to assign HS codes, as most of the information is seen visually. To find the best solutions, I experimented with various machine learning artificial neural networks for my thesis, taking accuracy and inference speed into consideration. The best performing model was able to produce findings that were very comparable to those of human experts, and even outperformed two of the three.”

Even though these results are promising, the team keeps doing applied research and investigating the best way to implement their findings in business. “It can be challenging to determine a product’s size from photos. Also, numerous accurate codes cannot be assigned only based on the appearance of an image.” That is why they are currently looking into using images as a complement to text-based product descriptions.

For the future, it also paves the path to give out the most accurate prediction using both text-based product descriptions and images.

Simo Jaanus 
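One common way to combine a text-based prediction with an image-based one is late fusion, where each model produces a probability per HS code and the two are averaged with a weight. The sketch below illustrates that general technique only; the source does not say how Eurora plans to combine the two inputs. The function names, the default `text_weight` of 0.7, the probabilities, and the HS codes are all assumptions made up for the example.

```python
def fuse(text_probs, image_probs, text_weight=0.7):
    """Weighted average of two per-code probability dicts, renormalized."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    codes = set(text_probs) | set(image_probs)
    fused = {
        code: text_weight * text_probs.get(code, 0.0)
        + (1.0 - text_weight) * image_probs.get(code, 0.0)
        for code in codes
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("both models assigned zero probability to every code")
    return {code: value / total for code, value in fused.items()}


def best_code(text_probs, image_probs, text_weight=0.7):
    """Return the HS code with the highest fused probability."""
    fused = fuse(text_probs, image_probs, text_weight)
    return max(fused, key=fused.get)


# Hypothetical case: the text "leather bag" cannot tell a handbag from a
# suitcase, while the photo makes a handbag far more likely.
text_probs = {"420221": 0.5, "420211": 0.5}
image_probs = {"420221": 0.8, "420211": 0.2}

print(best_code(text_probs, image_probs))
```

Weighting the text model more heavily reflects the point made above that many codes cannot be assigned from appearance alone, so the image acts as a tie-breaker rather than the primary signal.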

  

How does Eurora plan to further develop data science in the future?

The Data Science team will keep investigating models. They plan to experiment with embedded models or more powerful and faster models. Computer vision is constantly evolving, and new models are suggested regularly. “Being a Data Scientist also involves innovation and research. We have a lot of interesting ideas to improve Eurora’s AI engine. New customers come and use our services every day, so we can gather useful insights to improve the system,” said Simo. 

Besides HS code allocation, the team plans to use data science to improve Eurora’s activities in other areas. This could include using data science to improve Eurora’s webpage and user experience, or translation and language detection.  

If all goes according to plan, there will be billions of rows of data available for us to analyze and train models on.

Simo Jaanus 

Simo is convinced that data science will help Eurora achieve sky-high goals. “I do believe in what we do here in Eurora and understand that it is something significant and important. As a result, there are indications that Eurora could become the next unicorn, decacorn, or perhaps hectocorn. I hope data science can assist the business heading in that direction.”

 

Data science helps not only with HS code allocation but also with Duty & Tax Calculation, IOSS, Restrictions Screening, and Customs Clearance services. Would you like to automate your compliance processes with secure AI-based solutions? Get in touch with us!