The voiceprint business war is in its prime

Author: Dongzi Time: 2020/09/25 Views: 5978

Author | Wang Jinwang

"The current 'voiceprint recognition' market is somewhat similar to the 'face recognition' market in 2014, and industry demand has begun to explode."

Dr. Zhang Weibin told Lei Feng.com.

Intelligent voice technology has been popular in China for many years. More than ten years ago, a domestic intelligent voice technology company had already gone public.

However, it was only in the past two years that voiceprint recognition technology was gradually "activated" in the public security and financial fields. On October 9, 2018, the People's Bank of China issued the "Technical Specifications for the Security Application of Mobile Finance Based on Voiceprint Recognition". At the same time, the public security organs also began to deploy systems using voiceprint recognition to combat new crimes...

From associate professor at South China University of Technology to chief scientist at Shengyang Technology, Dr. Zhang Weibin, one of the early scholars of intelligent speech technology in China, has stayed in this race for more than 10 years.

Three years ago, he and the founding team of Shengyang Technology entered the business of voiceprint recognition. From academic research to the voiceprint business war, a great deal changed in those years.

1

Exploring: Leaving academia to work on home appliances

In 2016, an intelligent voice technology company, Shengyang Technology, was established in Shenzhen.

At that time, there were iFlytek in Hefei, Sibichi in Suzhou, Yunzhisheng in Beijing, and Sound Intelligence Technology, founded around the same time... Intelligent voice technology was already flourishing across the country.

How to choose an appropriate direction to enter this high-profile technology field amid the general trend was an issue that the Shengyang Technology start-up team needed to seriously consider at that time.

If you look at the backgrounds of the core founding members of Shengyang Technology, you will find that CEO Li Yatong, CTO Chen Dongpeng, and chief scientist Zhang Weibin are all veterans with many years in biometrics and intelligent voice. Such a team was never going to decide the company's direction through armchair discussion; the best decisions come from hands-on practice.

Shengyang Technology was founded just as China's smart hardware boom was beginning. Smart voice technology brought more possibilities to smart hardware and even to traditional home appliances.

It was also at this time that Shengyang Technology landed its first revenue-generating project: developing an intelligent voice control module for a range hood made by a leading domestic appliance manufacturer.

The project was not technically difficult, yet it ran into all kinds of problems once it became real engineering work. Zhang Weibin told Lei Feng.com that the biggest problem at the time was false triggering. To solve it, he joked, he spent a long stretch of the project shut in a "small dark room" doing closed-door R&D and tuning.

It was also during this project that Zhang Weibin deeply realized:

When it comes to deploying a technology, earlier university research may cover less than 10% of the work; more than 90% of the problems have to be solved during engineering.

Zhang Weibin felt this even more keenly in many later projects.

Come by the wind, and go by the wind.

This project did not leave much impression on Zhang Weibin and his team.

Perhaps Zhang Weibin, Li Yatong, and Chen Dongpeng, all of whom have technical backgrounds, did not find in it the sense of purpose they had hoped for when they started the company.

2

Opportunity: a cross-border project

In the first half of 2017, by chance, Li Yatong came into contact with the National Civil Service Insurance and Savings Company of Indonesia.

At that time, the company encountered a problem:

Retirees had to go to the relevant local agencies every month for on-site certification in order to receive their pensions. The check served two purposes: to prove that the person was still alive and eligible for a pension, and to verify that the person collecting it was the pensioner.
Yet this routine procedure brought a great deal of inconvenience to local elderly people who should have been enjoying their old age in peace. In this island country of more than 200 million people, it was common to see people in their 70s and 80s crowded in front of a bank, some even in wheelchairs, queuing for certification and pensions.

By 2017, biometric technology was already changing some people's lifestyles. Fingerprint recognition had been commercially available on mobile phones for many years, and facial recognition appeared on the iPhone X, unveiled by Apple in September of that year.

Could the authentication problem in pension collection be solved with online biometrics?

This was an issue that the Indonesian local government was considering at the time.

Subsequently, the local government in Indonesia began to try out biometric technologies such as face recognition, fingerprint recognition, and voiceprint recognition. The voiceprint recognition technology was purchased from Shengyang Technology, which had only just been established.

I remember very clearly that when two of our colleagues went to Indonesia to collect data, they saw that even in Jakarta (the capital of Indonesia), transportation was very inconvenient. Although the traditional on-site verification method prevented fraudulent claims and insurance fraud to some extent, it brought a great deal of inconvenience to these elderly people.

Some of them are 70 or 80 years old, and some are even in wheelchairs...

After they came back, they told me from the bottom of their hearts: Even if our project does not make money, we must do it well.

It was this original driving force that made Zhang Weibin and his team work hard on this project.

After actually getting into this project, Zhang Weibin and his team felt that there were three real problems encountered in the implementation of voiceprint recognition technology:

First, noise. This problem is not unique to the project and is unavoidable for voice technology in any scenario, but it still affects recognition accuracy;

Second, voiceprint comparison on very short utterances. In this project, residents authenticate by reading aloud into their mobile phones 12 Indonesian digits that appear at random on the phone's screen, and the system then verifies the content and the voiceprint at the same time. Reading 12 digits takes only about three or four seconds, so the system has to decide from such a short utterance whether the speaker is the enrolled person;

Third, cross-channel matching. During on-site registration, residents use professional microphones with better sound pickup, while for routine authentication they use the microphones of ordinary telephones. The sampling rate of traditional landline phones is limited, and voice data sampled at 4 kHz contrasts sharply with the enrollment audio: the original recording is high-quality, but during verification the poor transmission channel degrades the voiceprint information, which is also a challenge.
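The second challenge above pairs a random-digit prompt with a voiceprint check. The following is a minimal Python sketch of that two-part decision, under clearly stated assumptions: the speech recognizer and the voiceprint embedding model are not shown, so `transcript` and the two embedding vectors are hypothetical inputs, and cosine similarity with a `0.7` threshold is an illustrative choice, not a detail of Shengyang Technology's system.

```python
import math
import secrets


def make_digit_prompt(length=12):
    """Return a random digit string for the user to read aloud."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_attempt(prompt, transcript, enrolled_embedding, attempt_embedding,
                   threshold=0.7):
    """Accept only if the spoken digits match the prompt AND the voice matches.

    `transcript` is assumed to come from a speech recognizer, and the
    embeddings from a voiceprint model; both are hypothetical here.
    """
    content_ok = transcript == prompt
    score = cosine_similarity(enrolled_embedding, attempt_embedding)
    return content_ok and score >= threshold
```

Because the prompt is random, a replayed recording of an earlier session would fail the content check even if the voice matched, which is presumably why content and voiceprint are verified together.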

After five months of hard work, Zhang Weibin and his team achieved a measured voiceprint recognition accuracy of 99.7% by applying self-developed AI algorithm models in front-end signal processing and voiceprint feature extraction.

Finally, in May 2018, the system was officially launched and began serving 2.5 million Indonesian retirees, who now only need a mobile app to complete their monthly certification at home.

It can be said that the Indonesian social security annual review project earned Shengyang Technology its first pot of gold, and it also convinced the team that voiceprint and other voice technologies can do something truly meaningful for society.

Zhang Weibin told Lei Feng.com that the company's photo wall still holds many photos from that time, and that everyone feels "this project is very meaningful."

Shengyang Technology collects elderly voiceprints on site

Because of this, in June 2018, when Li Yatong, Chen Dongpeng, and Zhang Weibin reviewed their early explorations in voiceprint recognition, speech recognition, speech signal processing, and other directions to decide the company's future, they were of one mind: voiceprint recognition would be the main strategic direction for the next stage.

3

Directions: Where is the starting point?

During that review, beyond the reasons above, the founding team of Shengyang Technology also carefully worked through the business logic of voice technology:

We know that voice is used for communication, so it contains a lot of information, including emotional content, age, language and other information, but the most important thing is actually the person's identity information.

In the same sentence, what the engineer says may be "suggestion", and what the CEO says may be "decision".

We felt at that time that voice was the most distinctive and most commonly used way for humans to communicate, and that it would also be one of the important modes of human-computer interaction in the future. Human speech carries rich information such as identity, age, gender, emotions, and intentions. To connect, organize, manage, and apply so much information, the first step is to identify the speaker's "identity", which is why we decisively chose voiceprint recognition technology as our entry point.

The field of intelligent voice already had several sizable companies favored by the market. Did Shengyang Technology still have a chance?

Zhang Weibin told Lei Feng.com that the strengths of traditional voice companies in fact lie in speech recognition and natural language processing. As far as voiceprint recognition technology is concerned, Shengyang Technology is leading.

This is borne out by two results: in the 2019 Global Voiceprint Recognition Competition, they ranked second in the world and first in the Asia-Pacific region, and they won the voiceprint recognition project of the Industrial and Commercial Bank of China (hereinafter: ICBC), which everyone jokingly calls the "No. 1 bank in the universe".

4

ICBC exam: the real battle

Although the title of "No. 1 Bank in the Universe" is somewhat joking, ICBC's strength, especially its technical strength, cannot be underestimated.

According to relevant statistics, in 2019 ICBC invested 16.374 billion yuan in science and technology and had 34,800 scientific and technological personnel. For comparison, as of the end of 2019, the technology giant Tencent had only about 40,000 scientific researchers.

It is ICBC, which can truly be called a technology-based financial company, that began preparing to introduce biometric technology in 2017, one of which is voiceprint recognition.

Shengyang Technology was one of the first dozen teams to bid for this project. What they perhaps did not know then was that the POC testing alone would last nearly three years.

"At that time, the ICBC Zhuhai Software Development Center organized the testing for this project. In that industrial park alone, ICBC has more than 7,000 R&D personnel. This shows ICBC's own technical strength, and they know how to choose the technology they want," Zhang Weibin said with feeling, recalling his first contact with ICBC.

How to evaluate the voiceprint recognition technology of different manufacturers on the market?

For this tender, ICBC ran several rounds of POC testing over nearly three years. The tests covered dozens of indicators, mainly in stability, recognition speed, and recognition accuracy.

There is another interesting story during the first round of project testing.

In 2017, the newly established Shengyang Technology was still a small team of only a dozen people. For the early POC test of the ICBC project, they sent just one engineer to bring their "FinVoice intelligent voice authentication system" to ICBC for on-site testing. Other manufacturers, by comparison, sent small teams of seven or eight people.

Unexpectedly, this left a good first impression on the person in charge at ICBC: sending only one person suggested that the team's technology was relatively mature and stable.

Zhang Weibin smiled bitterly afterwards, "Actually, it was because we really couldn't send more people."

Whether it is one person or ten people, during the actual project testing process, what really competes is the actual technical ability.

The last round of POC testing included one requirement: the participating teams had to find known target speakers in a massive voice library.

By this round, only three of the original dozen teams remained, and they could be said to represent the top domestic strength in this segment.

Even so, the task was not as simple as it seemed. It tested the overall strength of the technical team.

Zhang Weibin told Lei Feng.com, "Our system finished the run in about an hour or two, and there was another company whose system ran for several days and still didn't finish."

When Zhang Weibin said this, he was full of pride for his team. As for the results, it is self-evident.
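The task described above, finding known speakers in a large voice library, is a 1:N search. As a rough illustration only, the Python sketch below ranks library entries by cosine similarity to a probe voiceprint. The embeddings, the `gallery` dictionary, and the function names are all hypothetical; nothing here reflects how the competing systems were built. A plain linear scan like this one grows with library size, which hints at why run times in such a test can differ so widely.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_matches(probe, gallery, k=3):
    """Return the k best (score, speaker_id) pairs, highest score first.

    `gallery` maps a hypothetical speaker id to its voiceprint embedding.
    The score is the cosine similarity between probe and gallery entry.
    """
    probe_n = normalize(probe)
    scored = (
        (sum(p * g for p, g in zip(probe_n, normalize(emb))), speaker_id)
        for speaker_id, emb in gallery.items()
    )
    return heapq.nlargest(k, scored)
```

For example, with `gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}`, calling `top_matches([1.0, 0.1], gallery, k=2)` ranks speaker `"a"` first and `"c"` second.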

Just when the team was excited about winning the ICBC tender, the "nightmare" had just begun.

5

Metamorphosis: The next "nightmare"

Having passed the rigorous POC test of ICBC, the "No. 1 bank in the universe", could the team finally "sit back and relax"?

Obviously not.

From the completion of the POC test in December 2019 to the actual launch in June 2020, Shengyang Technology experienced another "nightmare".

The requirements in the POC test were only a "miniature version" of the requirements for the production launch. Over those 6 months, Shengyang Technology's "FinVoice Intelligent Voice Authentication System" underwent another radical overhaul in concurrency, recognition accuracy, and recognition stability.

In the three or four months before going online, basically all the results obtained during the POC testing process were retested in ICBC's production environment.

After that, the "FinVoice intelligent voice authentication system", capable of "second-level response on a voice database of tens of millions of speakers", went live at ICBC.

It is reported that in June 2020 the system went live for credit card voiceprint anti-fraud in the first batch of four ICBC branches: Beijing, Hubei, Sichuan, and Shaanxi. Within just one week of going online, it blocked dozens of fraud attempts and prevented economic losses of hundreds of thousands of yuan.

6

A bigger battlefield

This promising team in the field of intelligent voice recently completed a financing round of nearly 100 million yuan led by Guangyuan Investment, with participation from Qianhai Fund of Funds, China Merchants Qihang Capital, Shuimu Capital, and Hong Kong X Technology Fund.

It is not hard to see that several of the investors in Shengyang Technology have close ties with large domestic banks, insurers, and financial and fintech companies. With this round, Shengyang Technology gained not only capital but also business synergy.

Why are investors so optimistic about a startup in the voiceprint recognition segment?

In fact, voiceprint is just an entry point for Shengyang Technology in the field of intelligent voice:

On the one hand, large enterprise (B-side) customers such as banks are hard to win as a supplier. But once a vendor is inside their supplier system, the customer's future voice projects beyond voiceprint will go first to vendors already in that system, and these are projects Shengyang Technology is capable of delivering;
On the other hand, what Shengyang Technology aims to build is not voiceprint as a single-point technology. More importantly, voiceprint supplies a key attribute of voice data, the speaker's identity, around which the multi-dimensional value of the whole body of voice data can be integrated. The roadmap runs from plain "identity recognition", to strengthening "risk control", to combining multiple voice technologies to build users' "voice portraits", "intelligent marketing", and so on. This further use of the value of voice data is where the industry expects more.

Obviously, Shengyang Technology is targeting a larger battlefield.

In this battlefield, Shengyang Technology will inevitably encounter several current unicorn companies in the future, and this is the charm of this world full of changes.
