Shaping unstructured data : Heather McKenzie

Heather McKenzie examines how the buyside is using alternative data to make better decisions.

For investment managers, consistently generating strong performance has been a challenge since the global financial crisis. Many firms are using the same data sets, making it difficult to identify sources of alpha that have not already been found.

This has led to a growing interest in unstructured, or alternative, data. Social media, speeches, news stories, television broadcasts, press releases, presentations, websites, internet of things sensors, proprietary databases and government sources all generate enormous amounts of data that hedge funds in particular are picking through to find a competitive edge.

Social media has become a significant channel not only for social interactions, but also for political communications, company and financial news and commercial marketing. According to social media management company Hootsuite, 59% of users regularly get their news from Twitter and 187 million people use the platform daily. Facebook has 2.74 billion monthly active users and reaches 59% of the world’s social networking population.

According to analysts Greenwich Associates, the early adopters of alternative data were the most sophisticated quantitative hedge funds, which had the expertise and resources to take in the often unstructured data and incorporate it into their investment models. It did not take long, however, for more traditional asset managers to look to alternative data as a source of investment insights. Greenwich Associates research showed that 50% of institutional investors planned to increase their usage of alternative data during 2019.

Among the various types of alternative data available, web-scraped data, which is harvested from public websites, is the most popular. Software programmes access targeted websites and collect and store the scraped information on a periodic basis. In some cases, vendors will use public APIs as a way to access the data within those pages directly without visiting the actual website.

The types of data ‘scraped’ include job listings (a company increasing its hiring is likely to be growing), company ratings and online retail data. As Greenwich Associates points out, with around 4 billion web pages and 1.2 million terabytes of data on the internet, there is a mountain of information that can be valuable to investors.
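To illustrate the job-listings signal described above, here is a minimal sketch in Python. It assumes the pages have already been fetched and stored as dated HTML snapshots, and that each listing is marked with a `job-listing` CSS class. Both the class name and the sample snapshots are hypothetical, and the sketch is not a description of any vendor's pipeline.

```python
from html.parser import HTMLParser


class JobListingCounter(HTMLParser):
    """Counts tags whose class attribute includes 'job-listing' (a hypothetical marker)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "job-listing" in classes:
            self.count += 1


def count_listings(html: str) -> int:
    """Return the number of job listings found in one stored page snapshot."""
    parser = JobListingCounter()
    parser.feed(html)
    return parser.count


def hiring_trend(snapshots: dict[str, str]) -> list[tuple[str, int]]:
    """Return (ISO date, listing count) pairs in chronological order."""
    return [(date, count_listings(html)) for date, html in sorted(snapshots.items())]


# Hypothetical snapshots of one company's careers page, keyed by scrape date.
snapshots = {
    "2021-02-01": (
        '<ul><li class="job-listing">Analyst</li>'
        '<li class="job-listing">Engineer</li>'
        '<li class="job-listing featured">Trader</li></ul>'
    ),
    "2021-01-01": '<ul><li class="job-listing">Analyst</li></ul>',
}

trend = hiring_trend(snapshots)
print(trend)
```

A rising count across snapshots would be read as a possible sign of growth. In practice the raw count would need cleaning, for example removing duplicate or reposted listings, before it could feed an investment model.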

Data company Refinitiv estimates that 80% of data is unstructured. However, this data needs to be turned into structured content before it can be used for investment research.

Refinitiv Labs uses artificial intelligence, machine learning, natural language processing and textual data to bring meaning to unstructured data. One example is using unstructured text and machine learning to assess a company’s credit risk and default probability.

Refinitiv’s model uses StreetEvents conference call transcripts, company filings, the Reuters news feed and selected broker research. Each of the document types is dealt with differently because the language used is different, depending on whether a lawyer, journalist or sell-side analyst created it. Text from the document types is transformed into a profile where companies are ranked, based on the percentage probability of default over the next 12 months. Refinitiv says its blend of traditionally structured fundamental data, unstructured data, AI and alternative data sets reflects how the investment industry is beginning to work with these resources.
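The general idea of scoring each document type with its own vocabulary and then ranking companies by an estimated default probability can be sketched as follows. This is a toy illustration only and not Refinitiv's model: the term weights, the intercept and the sample texts are all hypothetical, and a simple weighted word count with a logistic function stands in for the machine learning a real system would use.

```python
import math
import re

# Hypothetical term weights, one lexicon per document type, because lawyers,
# journalists and analysts use different language. Positive weights raise risk.
LEXICONS = {
    "filing": {"covenant": 1.5, "impairment": 1.0, "refinancing": 0.8},
    "news": {"downgrade": 1.5, "layoffs": 0.8, "record": -0.5},
    "transcript": {"headwinds": 0.7, "liquidity": 1.0, "growth": -0.6},
}
BASELINE = -3.0  # hypothetical intercept: low default risk when no terms match


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_score(doc_type: str, text: str) -> float:
    """Sum the weights of matched terms using the lexicon for this document type."""
    lexicon = LEXICONS[doc_type]
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def default_probability(documents: list[tuple[str, str]]) -> float:
    """Map the combined document scores to a value between 0 and 1."""
    z = BASELINE + sum(document_score(doc_type, text) for doc_type, text in documents)
    return 1.0 / (1.0 + math.exp(-z))


def rank_companies(corpus: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """Return (company, probability) pairs, riskiest first."""
    scored = {name: default_probability(docs) for name, docs in corpus.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical companies and texts.
corpus = {
    "Beta Co": [
        ("transcript", "We see strong growth ahead."),
        ("news", "Beta posts record sales."),
    ],
    "Alpha Co": [
        ("filing", "Covenant breach led to an impairment charge."),
        ("news", "Agency issues downgrade after layoffs."),
    ],
}

ranking = rank_companies(corpus)
for company, probability in ranking:
    print(f"{company}: {probability:.1%}")
```

Keeping a separate lexicon per document type reflects the point above that the same word can carry different weight in a filing, a news story or an earnings call.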

Unearthing value
Being able to contextualise unstructured data is the key to unlocking its value, says Sam Sundera, head of future business at infrastructure and post-trade services company SIX. “SIX spent a lot of time looking at unstructured data and there are many alternative data providers out there, but none of the data sets is stitched together. SIX has a single data model and database, where everything is linked so we thought hard about how we could integrate unstructured data into that.”

SIX has worked on a proof of concept (POC) with a large bank, augmenting its private banking data that has been uploaded onto Google Cloud with SIX’s security and entity level data. The POC has since been extended to other institutions. “None of the alternative data sets that banks have are very valuable if they sit in silos on their own. They have to run in models that have meaning,” adds Sundera.

Earlier this year SIX took a majority stake in Canadian company Orenda Software Solutions, which operates an AI platform specialising in environmental, social and governance (ESG) and alternative data sets.

Sundera says Orenda’s sophisticated AI platform can capture and analyse a range of unstructured data, including Twitter data, news announcements and press releases. The platform provides insights and quantifies public perceptions. For example, Orenda’s platform was able to track a significant fall in public sentiment following allegations of slavery at a clothing manufacturer long before a fall in the company’s share price.
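One simple way to flag a fall in public sentiment of the kind described above is to compare a rolling average of daily sentiment scores against an earlier baseline. The sketch below assumes daily scores between -1 and 1 have already been produced by some sentiment model. The window, the threshold and the sample scores are hypothetical, and this is not a description of how Orenda's platform works.

```python
from statistics import mean


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Return the mean of each consecutive run of `window` values."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def first_sentiment_drop(scores: list[float], window: int = 3, threshold: float = 0.3):
    """Return the index of the first day on which the rolling mean has fallen
    more than `threshold` below the first rolling mean, or None if it never does."""
    smoothed = rolling_mean(scores, window)
    if not smoothed:
        return None
    baseline = smoothed[0]
    for offset, value in enumerate(smoothed):
        if baseline - value > threshold:
            # Convert the position in the smoothed series back to a day index.
            return offset + window - 1
    return None


# Hypothetical daily sentiment scores for one company (-1 negative, 1 positive).
daily_sentiment = [0.6, 0.5, 0.6, 0.55, 0.1, -0.2, -0.4, -0.3]

drop_day = first_sentiment_drop(daily_sentiment)
print(drop_day)
```

Smoothing over a few days avoids reacting to a single noisy reading. An analyst would then compare the flagged date with the share price history to see whether sentiment moved first.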

Russell Dinnage, head of the capital markets intelligence practice at GreySpark Partners, also believes contextualisation is an important element in extracting value from unstructured data. “Unstructured data is just another source of information for many firms. Whether information comes from the LSE or from satellite data of truck movements at Chinese factories, it is all data to hedge funds. Buyside firms are trying to get to the point where this data is a streamlined component of their overall trading system that presents them with information that is out there that might affect their trading decision.”

Sustainability
The burgeoning ESG investment market has provided another platform for analysing social media data in Refinitiv’s MarketPsych ESG Analytics. Based on natural language processing (NLP), the platform locates, filters and scores ESG content pertaining to specific companies as well as cities, regions and countries. It covers more than 2 million articles and posts each day in 12 languages. The engine excludes corporate press releases, websites and regulatory filings.

Richard Peterson, CEO, MarketPsych, said at the launch of the platform: “Through the lens of this data, our clients can explore how media perceptions and corporate behaviour impact business performance over time. For example, we have found that the share prices of companies with higher Workplace Sentiment scores significantly outperform their peers, and it appears that happier employees generate more value for shareholders.”

While there is a plethora of unstructured data available to financial firms, Rezwan Shafique, head of consulting UK at Delta Capita, says many organisations, even large ones, still struggle to make use of all the internal data they hold on a particular client. “Many institutions are trying to get to grips with this,” he says. “People understand the value of unstructured data and how to consume it and the technologies exist but doing it is proving to be more challenging than it ought to be.”

One reason it is challenging, he says, is the regulated nature of the financial industry. While a firm such as Cambridge Analytica demonstrated how unstructured data could be exploited, financial institutions are under a regulatory obligation to explain what they are doing when they harvest data. “Increasing regulation, particularly of hedge funds, is making it more difficult to extrapolate insights from unstructured data en masse,” he says.

Successful sellside firms, he adds, are not just looking at hard data and graphs, but also at how to make business easier to do. “Unstructured data is great, but it will be powerful only if you know the questions to ask, to make it actionable intelligence. Sellside firms want to take all of the data they have on a client and use it to create a much more powerful relationship. But it is a struggle.”

©Markets Media Europe 2021
