Data publishers love structure. Highly granular, fielded data lets their customers locate very specific things instantly, for example all investment banks in Boston that invested in green energy start-ups. Fielded data allows quick apples-to-apples comparisons, tells you more about your prospective customers, and makes data analysis far easier.
Unstructured data is, well, a mess. To find anything of value in it, you need either to structure it (via an industrial-grade tool like Khemeia) or to search and process it to find the diamonds of data buried in the landslide of information. Until now, that searching and sifting has been arduous and time-consuming, so “on-the-fly” searching of large datasets has not been a viable alternative to the parametric searching of a structured database. All the activity around semantic search, however, may be changing the landscape.
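To make the contrast concrete, here is a minimal Python sketch of a parametric search over fielded records next to a naive keyword search over free text. The firm names, field names, and sample sentences are all invented for illustration; they do not come from any real database or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Firm:
    """A hypothetical fielded record; the field names are assumptions."""
    name: str
    firm_type: str
    city: str
    sectors: frozenset


# Invented sample data, for illustration only.
FIRMS = [
    Firm("Alpha Capital", "investment bank", "Boston", frozenset({"green energy", "biotech"})),
    Firm("Beta Partners", "investment bank", "Boston", frozenset({"retail"})),
    Firm("Gamma Ventures", "venture fund", "Boston", frozenset({"green energy"})),
    Firm("Delta Securities", "investment bank", "Chicago", frozenset({"green energy"})),
]

DOCUMENTS = [
    "Alpha Capital, a Boston investment bank, led a round for a green energy start-up.",
    "Green energy stocks rallied while a Boston investment bank downgraded retailers.",
    "Delta Securities of Chicago backed a solar venture.",
]


def parametric_search(firms, firm_type, city, sector):
    """Filter on exact field values: every hit satisfies every criterion."""
    return [
        f.name
        for f in firms
        if f.firm_type == firm_type and f.city == city and sector in f.sectors
    ]


def keyword_search(documents, terms):
    """Return documents that contain every term, ignoring case.

    Matching words says nothing about how those words relate to each other,
    so the results still have to be read and sifted.
    """
    lowered = [t.lower() for t in terms]
    return [d for d in documents if all(t in d.lower() for t in lowered)]


structured_hits = parametric_search(FIRMS, "investment bank", "Boston", "green energy")
text_hits = keyword_search(DOCUMENTS, ["Boston", "investment bank", "green energy"])
```

With this sample data, the parametric search returns only the one firm that meets all three criteria. The keyword search also returns the second sentence, which mentions all three phrases but describes no such investment. That is the sifting problem in miniature.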
Enlyton, for instance, offers an approach to search that bridges the gap between structured and unstructured data. It can:
- Associate related news from articles/newsfeeds, blog/Facebook posts, tweets, etc., with a structured record. This gives the user the ability to find the “right” company record quickly and to see all the latest news mentioning the firm, its executives, or its products in reverse chronological order without having to actually capture, field, tag, and store the news information.
- Find a needle in a haystack better than most search tools, making it ideal for building structured databases of company executives from scratch.
- Monitor the Internet for mentions related to a company record, person, or event. In other words, Google Alerts on steroids.
- Work in real time: defining search criteria, defining target data sets, and calculating results all happen on the fly.
- In some cases, obviate the need to set up a robust, formal structure — a time-consuming and complicated process, especially when dealing with growing and changing data sets.
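As a rough illustration of the first capability in the list, the sketch below attaches free-text news items to a structured company record at query time and returns them newest first, without fielding or tagging the news beforehand. It is not Enlyton's method: it uses plain substring matching rather than semantic search, and the record layout, names, and news items are hypothetical.

```python
from datetime import date

# Hypothetical structured record; the keys are assumptions for this sketch.
COMPANY = {
    "name": "Acme Widgets",
    "executives": ["Jane Doe"],
    "products": ["WidgetPro"],
}

# Invented, unfielded news items: just a date and raw text.
NEWS = [
    (date(2024, 1, 5), "Acme Widgets opens a new plant."),
    (date(2024, 3, 2), "Jane Doe speaks at an industry conference."),
    (date(2024, 2, 10), "Rival firm cuts prices."),
    (date(2024, 2, 20), "Review: WidgetPro beats expectations."),
]


def related_news(record, items):
    """Return items mentioning the firm, its executives, or its products.

    Matching is a case-insensitive substring test, a stand-in for a real
    semantic matcher. Results are sorted in reverse chronological order.
    """
    terms = [record["name"], *record["executives"], *record["products"]]
    terms = [t.lower() for t in terms]
    hits = [(day, text) for day, text in items if any(t in text.lower() for t in terms)]
    return sorted(hits, key=lambda item: item[0], reverse=True)


latest = related_news(COMPANY, NEWS)
```

The record stays small and stable while the news stays raw. The link between the two is computed when the user asks for it, which is what lets the publisher skip capturing and storing the news in a formal structure.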
None of this adds up to the death of structured data. But the lower cost of working with unstructured data, and the challenge of extracting meaning from ever-growing sets of “big data,” may mean that searchable unstructured data will play an increasingly prominent role in how information services are designed.