Do All Your Progressive Ideas Need Machine Learning Modelling?

By Alex Willson On Nov 26, 2021

Read this post to learn when your company can easily solve a business task without ML, and when ML cannot help you.

In our daily work, we meet customers who want to use machine learning because they think ML is magic; the media have sometimes painted an inaccurate picture of its abilities. Sometimes the task they want is so simple that it doesn't need ML at all, and sometimes it is so hard that it is impossible to do (at least for now). So, in this article I give some examples of what ML shouldn't do, what it can't do (yet), and what it can do.

What shouldn’t ML do?

Sometimes the task is easy, or there’s an obvious pattern in the data, so we can write a few lines of code to automate it without ML. In such cases, rule-based automation is better than ML: it costs less (in time and resources) and can be around 99% accurate.

As an ML expert, I was once asked by a client to build a machine learning model to classify documents he received every day. A simple visualization of these documents showed that a list of keywords distinguished the categories, and he confirmed that these keywords would keep appearing in future documents. So I wrote a small script that searches each document for these keywords and returns the category it belongs to. He just wanted to automate the process and thought ML was the only way to do it.
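To make the idea concrete, here is a minimal sketch of such a keyword script. It is not the client's actual script: the category names and keyword lists are hypothetical placeholders, and the scoring rule (pick the category with the most keyword hits) is an assumption about how such a script might work.

```python
# Hypothetical keyword lists: the post does not name the client's categories
# or keywords, so these are placeholders for illustration only.
CATEGORY_KEYWORDS = {
    "invoice": ["invoice", "amount due", "payment terms"],
    "contract": ["agreement", "party", "termination"],
    "resume": ["work experience", "education", "skills"],
}


def classify_document(text: str, default: str = "unknown") -> str:
    """Return the category whose keywords occur most often in `text`.

    Matching is case-insensitive substring counting. If no keyword occurs,
    `default` is returned; on a tie, the category listed first wins.
    """
    lowered = text.lower()
    best_category, best_hits = default, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(lowered.count(keyword) for keyword in keywords)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category


print(classify_document("Invoice #42: amount due within 30 days"))  # → invoice
```

No labeled data or training is needed, and when a document is misclassified, the fix is to edit a keyword list rather than to retrain a model.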

Suppose we had built a machine learning model for this task instead. The first step would be to annotate a sample of the documents before training a classifier, which costs time and money. And in the end, because any ML model makes some wrong predictions, the few lines I wrote would still outperform it.

A colleague of mine was asked to segment pools and green areas in satellite images. He segmented them by color, using static rule-based thresholds, without any ML model. This was ideal for our case, and the job was finished quickly.
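A static color-threshold segmentation like this can be sketched in a few lines. This is an illustration only, not my colleague's code: the image representation (rows of RGB tuples) and every threshold value below are hypothetical and would need tuning on real imagery.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255

BACKGROUND, POOL, GREEN_AREA = 0, 1, 2


def label_pixel(pixel: Pixel) -> int:
    """Assign a class to one pixel using fixed, hand-picked color rules."""
    r, g, b = pixel
    # Pools: bright, strongly blue pixels (hypothetical thresholds).
    if b >= 150 and b > r + 50 and b > g:
        return POOL
    # Vegetation: green clearly dominates both red and blue (hypothetical).
    if g >= 80 and g > r + 20 and g > b + 20:
        return GREEN_AREA
    return BACKGROUND


def segment(image: List[List[Pixel]]) -> List[List[int]]:
    """Return a label mask with the same shape as `image`."""
    return [[label_pixel(pixel) for pixel in row] for row in image]


tiny_image = [
    [(40, 160, 220), (60, 140, 50)],    # pool-blue, grass-green
    [(120, 120, 120), (250, 250, 250)],  # gray roof, white surface
]
print(segment(tiny_image))  # → [[1, 2], [0, 0]]
```

With NumPy arrays, the same rules become vectorized boolean masks, which is how one would normally apply them to full-size images. The weakness is also visible here: a pool in shadow or a brownish lawn falls outside the fixed thresholds.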

A friend of mine told me he did sentence segmentation with a rule-based parser that outperformed his company’s machine learning model. Pragmatic Segmenter [1] is an example of a library that does the same thing without any machine learning model.
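To show what "rule-based" means here, the sketch below splits text at sentence-ending punctuation followed by whitespace and a capital letter or digit, unless the word before the break is a known abbreviation. It is a toy example, not Pragmatic Segmenter's actual rules or API (that library is written in Ruby and ships far more rules); the abbreviation list is a hypothetical placeholder.

```python
import re
from typing import List

# Hypothetical, deliberately short abbreviation list; real rule-based
# segmenters use much larger, language-specific rule sets.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "mr.", "mrs.", "ms.", "vs."}

# Candidate boundary: ".", "!" or "?", then whitespace, then an uppercase
# letter or digit (checked with a lookahead, so it is not consumed).
BOUNDARY = re.compile(r"([.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> List[str]:
    sentences: List[str] = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.end(1)]
        # Rule: a period after a known abbreviation is not a sentence end.
        if candidate.split()[-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He said hi! Was it e.g. Monday? Yes."))
# → ['Dr. Smith arrived.', 'He said hi!', 'Was it e.g. Monday?', 'Yes.']
```

Every decision the splitter makes can be traced to a specific rule, which is what makes such systems easy to debug and extend.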

In the document classification case, it was obvious that we didn’t need machine learning at all. But in other cases, such as sentence segmentation and pool segmentation, it’s not clear whether a rule-based approach or ML is better. Many researchers have tried both and compared them. For example, my friend Ray and his colleagues used both a rule-based method and machine learning to classify high-resolution optical satellite images into morphological categories (e.g., ground, water) and compared the results [2].

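So the right choice depends on the scenario. For example, the rule-based approach my colleague used had trouble with diversity: fixed color thresholds break down when a pool or a green area looks different from the colors the thresholds were tuned for.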

Note that I’m talking here about traditional rule-based systems with hand-crafted rules. This is distinct from rule-based ML, which identifies useful rules automatically but needs data to learn them.

What can’t ML do?

One day, a customer asked us to build a model that does everything a customer service representative does (answering questions, responding to complaints, ensuring that customers are satisfied with services, etc.). We can build a model to route complaints to the right team, and we can build a model to analyze customer reviews. But I think it’s hard to build a model that responds to complaints well, and no model (so far) can do all of these tasks.

What can ML do?

There are many examples of what ML can do now, such as translating from one language to another, detecting faces in images, and driving autonomous cars. A useful rule of thumb comes from Andrew Ng’s course: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using ML either now or in the near future.” But that’s only possible when there is a large corpus of training data. No data, no learning.

Finally, it’s not always clear at first glance whether ML can help with a project, so engineers do technical due diligence to make sure it’s feasible. Quite often, asking an expert for advice is the best way to reach a sound conclusion.

More References

[1] Pragmatic Segmenter, a rule-based sentence boundary detection library. https://github.com/diasks2/pragmatic_segmenter

[2] Raiyani, K.; Gonçalves, T.; Rato, L.; Salgueiro, P.; Marques da Silva, J.R. Sentinel-2 Image Scene Classification: A Comparison between Sen2Cor and a Machine Learning Approach. Remote Sens. 2021, 13, 300. https://doi.org/10.3390/rs13020300