CatBoost: Yandex’s machine learning algorithm is available free of charge

Russia’s Internet giant Yandex has released CatBoost, an open-source machine learning library

Aug 03, 2017
The algorithm has already been adopted by the European Organization for Nuclear Research (CERN) to analyze data from the Large Hadron Collider, the world’s most sophisticated experimental facility.

Machine learning helps make decisions by analyzing data and can be used in many different areas, including music choice and facial recognition. Yandex, one of Russia's leading tech companies, has made its advanced machine learning algorithm, CatBoost, available free of charge for developers around the globe.

"This is the first Russian machine learning technology that's open source," said Mikhail Bilenko, Yandex’s head of machine intelligence and research.

What do cats have to do with this?

CatBoost is no ordinary 'cat.' The name stands for "categorical boosting": the algorithm works not only with numbers but also with many other "categories" of data, such as audio, text and imagery, as well as historical data.

"CatBoost is based on gradient boosting, a machine learning technology that works very well with data from different sources," said Anna-Veronika Dorogush, head of machine learning systems development at Yandex.
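For readers curious what gradient boosting means in practice, here is a minimal sketch in plain Python. It is not CatBoost's implementation: it assumes a single numeric feature, squared-error loss and one-split "stump" models, and the data values and function names are hypothetical, chosen only for illustration. The idea it shows is that each new small model is fitted to the errors left by the models before it.

```python
def fit_stump(xs, residuals):
    """Pick the single threshold split that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the remaining errors."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical toy data: low values for small x, high values for large x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(xs, ys)

mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

On this toy data the boosted model's training error ends up below that of simply predicting the average, which is the whole point of adding models round by round.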

For example, the algorithm is well suited to weather forecasting, which requires analyzing a combination of historical data, weather models and meteorological data. Yandex is already using CatBoost in its weather forecasting service to improve accuracy.

Contribution to machine learning

According to Yandex, the algorithm has proved effective in a range of industries, including banking and manufacturing. CatBoost helped one client improve the quality of its steel.

"Most machine learning algorithms work only with numerical data, such as height, weight or temperature," Dorogush explained. Other data, such as types of clouds or buildings, has to be "translated" into numbers before developers can use it. But information is sometimes lost in the process, which affects the final result.
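To make the "translation" problem concrete, here is a small sketch in plain Python of two common ways to turn category labels into numbers. Neither is CatBoost's own method, which the article does not describe; the cloud-type values and function names are hypothetical. The first approach assigns each label an integer, which quietly implies an order and a distance between categories that the real data does not have. The second, one-hot encoding, avoids that at the cost of one column per category.

```python
# Hypothetical cloud-type observations, for illustration only.
cloud_types = ["cirrus", "cumulus", "stratus", "cumulus"]


def ordinal_encode(values):
    """Map each distinct label to an integer, in alphabetical order."""
    codes = {name: i for i, name in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def one_hot_encode(values):
    """Map each label to a 0/1 vector with one position per distinct label."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ordinal = ordinal_encode(cloud_types)
one_hot = one_hot_encode(cloud_types)

print(ordinal)  # [0, 1, 2, 1]
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

With the integer codes, a model could treat "stratus" (2) as twice "cumulus" (1), or as further from "cirrus" (0) than "cumulus" is, even though the labels carry no such meaning.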

"We made CatBoost open source to give scientists around the world a simple and accurate instrument," Bilenko said. "That’s our contribution to the development of machine learning."