Hiya promises to use AI to thwart robocalls

  • Hope Reese
9/12/2022

But is it just helping to make them more advanced?

Seeing that unknown number pop up on your cell phone or listening to that voicemail that informs you that you “qualify for a lower car payment” is more than a nuisance: it puts consumers at serious and constant risk of fraud. And these calls have only been on the rise. More than 4 billion robocalls reached US phones in 2020, according to the FCC, and scam calls cost Americans about $14 billion.

A Seattle, Washington-based startup called Hiya says its AI is here to help. The company claims that its app, which uses machine learning, can catch 20% more of these illegal and undesired spam calls than what’s on the market, by analyzing more than 13 billion calls per month. Hiya Protect is being used by wireless carriers, phone makers, and app makers in services such as AT&T ActiveArmor, Samsung Smart Call, and the Hiya app. And its new “adaptive AI” is being used to learn from, and block, nefarious calling behavior in real time.

PNW.AI caught up with David Kay, the senior director of product development at Hiya, to learn how the AI works to prevent junk calls and how it’s changing the game for spammers.

Interview has been edited and condensed for clarity.

Alex Algard founded whitepages.com in 1997—and years later, spun Hiya out of it. What was his intent here?

There was a need for an online directory service, and Whitepages included background checks, fraud analysis and more while being targeted towards e-commerce. While at Whitepages, Alex recognized that there was a growing spam and robocall problem, and that existing technology around trust and identity in the mobile space was very primitive (in some ways it still is). Hiya started as an initiative around trust, identity, and mobile at Whitepages, but he decided to spin it off into its own company as a direct-to-consumer application.

Hiya launched in 2017, following similar apps like RoboKiller and Truecaller. How is it different?

The main difference between Hiya and apps like Truecaller and RoboKiller is that we focus our solution on partnerships with carriers and mobile OEMs and integrate directly in their networks, whereas RoboKiller and Truecaller are sort of over-the-top apps. Hiya focuses its business on working with carriers and OEMs to help them solve their consumer spam problem.

The advantage of that is every consumer for a carrier gets the benefit of spam protection. It’s not only those who have the initiative or even know about the existence of a spam protection application. Every call that goes through one of our partners’ networks, we have the opportunity to put a warning on. That will show up on an Android phone or an Apple phone, even if that consumer has not downloaded it.

What is the benefit to working with OEMs and carriers?

It allows us to take advantage of data from the network. We’re able to get network signals, see how those calls look on the network, how they came onto the carrier’s network, and aspects of the signaling layer that happens before the audio part of the call is connected. There’s communication back and forth between the originator of the call and the destination of the call. Some network signals are indicative, or have patterns that are indicative, of a caller’s behavior. So that’s an advantage of a solution like Hiya’s over the application solutions.

In a recent survey you conducted, 62% of respondents say they received an impersonation call, trying to perpetrate fraud. How have these scams gotten more sophisticated?

There is an expanding power in telecommunication tools. There are software platforms that make calls and buy numbers, lease numbers, and release numbers. There’s sophistication around just the tools that are available for scammers. There is evidence that these scammers are running these scams like a business: they are trying to maximize their success. We see evidence of scammers doing A/B tests, or using different prompts in different regions, testing which are more successful, [as well as] regional targeting. Some of that is understanding, through data they’ve collected, where the more susceptible populations are living.

How is Hiya’s algorithm different from traditional analytics?

Historically, analytics companies or apps that do spam detection look at attributes of calling numbers to see whether a calling number has patterns. Things like: what are the volume characteristics? How many calls are being created? What is the ratio of outgoing calls to incoming calls for these numbers? What is the geographical diversity of their calls?

Our machine learning has gotten better at recognizing patterns as scammers use thousands and thousands of numbers. We call it a snowshoe approach, where a scamming business may lease 10,000 numbers and work through those numbers. None of those numbers gets much traffic, so it’s very difficult for these historical machine-learning-based systems to see those traffic patterns.
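To make the snowshoe problem concrete, here is a minimal Python sketch of the per-number attributes described above (outgoing volume, outgoing-to-incoming ratio, geographic diversity). It is an illustration under stated assumptions, not Hiya's method: the `Call` record, the `callee_region` field, the sample numbers, and the `THRESHOLD` value are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str          # originating number
    callee: str          # destination number
    callee_region: str   # hypothetical coarse region of the destination


def number_features(calls):
    """Per-number features: outgoing volume, out/in ratio, regional diversity."""
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    regions = defaultdict(set)
    for c in calls:
        outgoing[c.caller] += 1
        incoming[c.callee] += 1
        regions[c.caller].add(c.callee_region)
    return {
        n: {
            "outgoing": outgoing[n],
            # +1 avoids division by zero for numbers that never receive calls
            "out_in_ratio": outgoing[n] / (incoming.get(n, 0) + 1),
            "regions": len(regions[n]),
        }
        for n in outgoing
    }


# One "loud" number places 600 calls; a snowshoe campaign spreads
# the same 600 calls across 200 leased numbers (3 calls each).
loud = [Call("+15550000001", f"+1555100{i:04d}", f"R{i % 20}") for i in range(600)]
snowshoe = [
    Call(f"+1555200{i % 200:04d}", f"+1555300{i:04d}", f"R{i % 20}")
    for i in range(600)
]

features = number_features(loud + snowshoe)
THRESHOLD = 100  # hypothetical per-number volume threshold
flagged = [n for n, f in features.items() if f["outgoing"] > THRESHOLD]
```

In this toy data only the single loud number crosses the volume threshold. Each snowshoe number places just three calls, so a rule keyed on individual numbers never sees the campaign as a whole.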

What kind of data does Hiya use in its machine learning algorithm?

We include three types of data. First, we use event data and analysis: for every call that comes in, we can see its attributes and look for patterns in those calls. But we also have crowdsourced data. Hiya’s network has over 200 million users and 18 billion calls a month. Many of those users have the ability to report whether a call is spam or not. If we label a call as spam when it was really a call they wanted, they’re able to tell Hiya it was a mistake. So we get between 20 and 30 million bits of consumer information back every week. It’s a large set of data that’s important for training our models, because it’s ground truth as to whether these calls were spam or not.

For the third type of data: since we work in the networks, we can’t really look into the audio channel, and we’re not going to be listening to the beginning of calls to look for robocalls. But Hiya is still interested in getting that content and understanding robocalls. So we use honeypot frameworks, a technical term used in fraud detection for setting up something that attracts bad actors so you can learn about them. We have hundreds of thousands of phone numbers that are just waiting for calls from spammers, which gives us a lot of insight.

We use that information to train models on what their characteristics are, and it also lets us see which campaigns are very active. Campaigns are typically generated through many umbrella organizations, but they’re generated by small sets of individuals who are orchestrating all of that. These honeypot frameworks tell us which campaigns are occurring so we can target their patterns.
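One way to picture how honeypot calls could reveal active campaigns is to group the recordings they capture by a normalized text key and count the groups. The sketch below is only an illustration: the interview does not say how Hiya identifies campaigns, and the `fingerprint` and `active_campaigns` helpers, the sample transcripts, and the honeypot numbers are all hypothetical.

```python
import re
from collections import Counter


def fingerprint(transcript: str) -> str:
    """Normalize a transcript so near-identical recordings share one key."""
    text = transcript.lower()
    text = re.sub(r"\d+", "#", text)         # mask digits such as menu options
    text = re.sub(r"[^a-z# ]+", " ", text)   # drop punctuation
    return " ".join(text.split())            # collapse whitespace


def active_campaigns(honeypot_hits, top=3):
    """Rank campaigns by how many honeypot calls share a fingerprint."""
    counts = Counter(fingerprint(text) for _, text in honeypot_hits)
    return counts.most_common(top)


# (honeypot number, transcript of the captured call); all values are made up
hits = [
    ("+15557770001", "You qualify for a lower car payment. Press 1 now!"),
    ("+15557770002", "You qualify for a lower car payment, press 2 now."),
    ("+15557770003", "Your package could not be delivered."),
]

ranked = active_campaigns(hits)
```

The two car-payment recordings differ in punctuation and menu digit but collapse to the same key, so they count as one campaign with two hits. A production system would need far more robust matching than this exact-key comparison.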

Are calls simply marked, or are some blocked?

The FCC has certain instructions and guidelines for the industry on when calls can be blocked. Those guidelines are really focused on the perpetration of fraud, such as calls trying to extract money. We work very closely with our partners to decide what category of calls they want to block, but ultimately, our partners’ mission, and our mission, is not to label calls as spam. It’s to improve the experience for their customers by getting those calls off the network. There’s a small category of calls that will be blocked before they get to the end consumer, though, because they’re especially egregious.

Do telecoms already have spam protection? What does Hiya add to that?

Typically, carriers have systems in place to protect themselves against telco-targeted fraud—fraud against the carrier where scammers might be calling on premium lines or trying to eke money out of the telecommunications companies through some of the fees that are charged across the telecom industry. But they typically have not developed their own sophistication in this consumer-targeted spam. Those carriers would look to an analytics company and sort of do a dip, an API call into that analytics company with every call that goes through. The more sophisticated carrier solutions will do that. They’ll look at every call, they’ll get an assessment in real-time, as to whether the analytics company like Hiya thinks this call has a risk of being a spam or a fraud call. Their skill set is not necessarily the best in machine learning—they’re big network companies. And so they look to companies like Hiya to help.

We can’t always attach a label or an assessment to a phone number because these callers are moving around among different phone numbers. So we need to look at their tactics more than phone numbers. We have continuous learning in our machine learning platform that watches for changes in those tactics. Those tactics might be easy to grasp onto—like the time of day that callers are calling among other things—but it’s also tactics related to where they’re generating their calls from. We can see some information about where the calls are getting on the telecom networks, the originating carrier of the call, and some aspects of how they look in the network.

What about legitimate callers?

We can’t falsely label good calls. We survey both consumers and enterprises, and 40% of our businesses really prefer the voice call for customer care, closing sales, and scheduling appointments. And the other side of our business is more around identity. Hiya Protect is oriented towards our carriers, protecting their consumers from spam. And Hiya Connect is business oriented, helping enterprises work within this situation by branding their names on calls.

If a number is legitimately being used and then we see an anomaly, it may be because that call is not actually being made by the owner of the number but instead by an impersonator who is spoofing the number. We can detect that it looks different from the normal usage. And the scammers may change their tactics over the course of the day to be generating their calls from this platform. Typical machine learning systems have data scientists who look at historical data, train machine learning models, put them into production, and then use those to determine whether a given call is spam. That leaves a lot of drift away from what’s actually happening in real time. So adaptive AI is real-time learning that helps us keep up on an hourly basis with the tactics that we’re seeing.
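The gap between a model trained on historical data and one that keeps learning can be sketched with a very small online estimator. This is not Hiya's adaptive AI, whose internals the interview does not describe. It is a hypothetical exponentially weighted spam rate per "tactic" key, and the class name, the keys, and the `alpha` and `prior` values are assumptions chosen for illustration.

```python
from collections import defaultdict


class AdaptiveTacticScore:
    """Exponentially weighted spam rate per tactic key, updated per labeled call."""

    def __init__(self, alpha=0.2, prior=0.05):
        self.alpha = alpha                        # weight given to the newest label
        self.rates = defaultdict(lambda: prior)   # unseen tactics start at the prior

    def score(self, tactic):
        return self.rates[tactic]

    def update(self, tactic, is_spam):
        old = self.rates[tactic]
        self.rates[tactic] = (1 - self.alpha) * old + self.alpha * float(is_spam)


model = AdaptiveTacticScore()
morning = ("carrier_a", "09h")     # hypothetical tactic: origin carrier + hour
afternoon = ("carrier_b", "14h")   # the tactic scammers switch to later

# Labeled spam reports arrive for the morning tactic.
for _ in range(20):
    model.update(morning, True)

# A model frozen at this point still sees the new tactic at the prior.
frozen_afternoon = model.score(afternoon)

# With continuous updates, reports on the new tactic raise its score quickly.
for _ in range(20):
    model.update(afternoon, True)
adapted_afternoon = model.score(afternoon)
```

Here `frozen_afternoon` stays at the prior of 0.05, while `adapted_afternoon` climbs above 0.9 after twenty spam reports. That is the drift problem in miniature: a snapshot keeps scoring yesterday's tactics, and only continued updates catch the new one.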

Are scammers also using AI?

I can only speculate there, but from the level of sophistication we see, I would say that they are. I’m not sure if it’s necessary for them to make a huge investment in that yet, because of some of the weaknesses in the telecom system. I think they’re still able to be successful without going that far. But it is a huge business. There are estimates around the dollars lost: 10 to 20 billion in the US, 3 billion globally. Those are big numbers. And we expect that scammers will continue to invest and become more sophisticated.