
Almost all organisations store their data in databases or spreadsheets, applying some sort of structure to keep it organised and, in this way, readily accessible. It is no doubt easier to gain information from such structured data using data mining tools. However, relying only on structured data to cope with real-world issues is not enough.


IDC Health Insights is a research-based advisory and consulting firm that uses structured data stored in Electronic Health Records (EHRs) and Computerized Physician Order Entry (CPOE) systems during diagnosis and treatment programs. With the help of structured data, medical practitioners can advise what the disease is and what the treatment should be, but they are not able to identify why such disorders occur in the first place. The missing key information in this scenario lies within the communication, notes and narratives present in unstructured format (Fledman et al. 2012).


“Structured information can give you answers to questions that you already know to ask, but what unstructured information can tell you is the answer to questions you didn’t even know you needed to worry about. It lets you know what you don’t know”, clarifies Scott Spangler, a chief technical expert in text mining and software development at IBM Almaden Research Center and co-author of ‘Mining the Talk: Unlocking the Business Value in Unstructured Information’ (TAKMI 1997).

There are unlimited sources of potential data generation (discussed later in this paper), and the data so generated is not necessarily in a structured format. The problem is therefore identified as how to make use of unstructured data, and why it is necessary to do so.







A research project is carried out in order to answer some prevailing questions centred on a topic. The aim of this paper is to answer the following questions:


3.2.1          What value is there in unstructured data?

3.2.2          What is Text Analytics, and can it be used to make use of unstructured data?








Data is being created every moment. Whether someone is writing an email, conversing over the telephone, creating a blog, recording an event or citing an article, more and more data is being built up. To put a number on it, IBM experts estimate that 2.5 quintillion bytes of data are created every day, highlighting that 62 billion emails are sent daily and that over 100 million blogs exist, with a new one created every second (Mastering new challenges in text analytics 2010). Here the question arises: are these indicators positive? Grimes (2008) notes that it is possible to obtain information from ambiguous resources. This huge amount of data is referred to as Big Data, of which almost 85% is in unstructured text format. Unfortunately, without the data being structured, it is possible neither to predict the speed and volume of data growth nor to identify the type of information it contains (Making Sense of Unstructured Data 2011).


Seton Healthcare Family, a healthcare facility in Texas, and IBM’s Watson technology conducted a pilot project to detect and predict cases of Congestive Heart Failure (CHF) for personalized longitudinal management of high-risk patients. When the results came out, they amazed both organisations: the unstructured data proved more precise and more complete than the structured data (Fledman et al. 2012).


Organisations are monitoring and listening to what a user might have thought of as a random comment on Facebook or a simple tweet on Twitter, and on the basis of those comments and tweets they are creating new action plans and business strategies. Grimes (2013) believes that such social media monitoring has improved functional areas such as customer service, reputation management and crisis early warning, and that its analysis has made it easier to discover patterns, summarise distinctive opinions and determine trends. Moreover, blogs, reviews and ratings are a data type called unstructured user-generated content (UGC) that can be utilised for activities such as travel recommendation via landmark and tourist attraction mining. Jiang, Liu, Xiao & Yu (2012) mention that UGC served as contextual information when a tourism recommendation service was built by mining landmarks from geotagged photos on photo-sharing websites such as Flickr.


To make use of such a great volume of data, it is necessary to identify what in the text is important and what is not. However, as data growth accelerates towards the zettabyte scale, it is becoming impossible to manage data with conventional database management tools. Most industries have realised the need to manage this enormous yet valuable body of unstructured text data and have begun using the tools and technology of text analytics to improve Customer Experience Management (CEM) and gain insights. In fact, a 2012 IBM study of 1,700 CEOs revealed that 73% had decided to invest heavily in customer insights in 2013 (Big Data and Text Analytics: Solving the Missing “Big Language” Link n.d.). Grimes (2008) believes that unstructured sources coupled with text analytics can be used to create comprehensive business-intelligence programs, a crucial component of enterprise competitiveness.


With so much value yet such difficulty in organising this massive amount of unstructured data, the data has become both an asset and a liability (Making Sense of Unstructured Data 2011).




Text analytics has been a tech catchphrase for some years, but according to research performed by Anderson in 2011, apart from its early adopters, i.e. the defense, intelligence, finance and pharmaceutical industries, the technology is still being discovered. Basole, Suess and Rouse (2013) note that text analytics holds tremendous potential but has unfortunately not yet seen extensive use.


Text analytics is a dynamic technology designed to manage Big Data. It has the ability to classify unstructured text data. Lamont (2014) describes a situation where a website user is looking for a product or answer to a question but is unable to do so due to some unidentified issues. With the help of text analytics, an information system “can determine whether the issue is a navigation problem, a content problem, a support problem or a product problem,” states Seth Earley, CEO of Earley Associates, a consulting firm that provides services related to content management.
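The kind of issue categorisation Earley describes can be illustrated with a deliberately minimal sketch. The categories below follow his example (navigation, content, support, product), but the keyword lists and the `classify_issue` function are hypothetical illustrations, not part of any vendor's actual system; real text analytics tools use far richer linguistic and statistical models.

```python
# Minimal keyword-based issue classifier, in the spirit of the example above.
# Categories follow Earley's list; the keywords are illustrative assumptions.

ISSUE_KEYWORDS = {
    "navigation problem": ["can't find", "menu", "link", "search"],
    "content problem": ["unclear", "outdated", "missing information", "wrong"],
    "support problem": ["no reply", "waiting", "agent", "ticket"],
    "product problem": ["broken", "defect", "crash", "stopped working"],
}

def classify_issue(comment: str) -> str:
    """Return the issue category whose keywords occur most often in the comment."""
    text = comment.lower()
    scores = {
        category: sum(text.count(kw) for kw in keywords)
        for category, keywords in ISSUE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify_issue("The product crashed and stopped working after the update."))
# product problem
```

Even this toy version shows the basic move: unstructured free text in, a structured category label out, which can then be stored and analysed like any other database field.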


A definition offered by Kent (2014) is that “Text analytics is the process of deriving high quality information from text by turning text into data through the use of machines and software”. In this view, text analytics is a process that converts random text into meaningful data: it pulls the real information out of text that was categorised as unstructured and converts it into sensible structured data.


Lamont (2011) describes text analytics as a process that extracts information from huge volumes of documents. Documents are classified via linguistic and statistical techniques to conceptualise them and create relationships among them. She also mentions that linguistic techniques are necessary to find all the alternative meanings a word could have, while statistical techniques are needed to calculate the frequency and proximity of words. Although similar to the traditional view, her description brings to light the importance of linguistic and statistical techniques.


A much wider view is given by the text analytics expert Grimes (2014): “Text analytics goes beyond counting words to extract meaning and gives some context to that meaning. It often can give you the why of something that has occurred, where data analytics can tell you what has occurred”. Compared with the traditional definition, Grimes’ definition of text analytics is driven more towards finding insights in data, not just the meanings of data. The focus of text analytics is centred not only on what was created but also on revealing why it was created in the first place. Grimes says that from making e-discoveries to monitoring social media, providing the best customer experience or delivering technical support, the sky is the limit for text analytics.


A similar view on text analytics is expressed by ClearForest Corp, which provides tools for data analysis and visualization. Its technology extracts quality information from unstructured data to refine meaning from text and create structured data; moreover, it not only refines meaning but also gives context to that meaning (ClearForest Corp n.d.).


IBM Business Analytics (2010) defines text analytics as “a method for extracting usable knowledge from unstructured text data through identification of core concepts and trends and then using this knowledge to support decision-making.” This definition highlights an important factor: text analytics now also helps decision-making based on the processed unstructured data.


Although there is no single best definition of text analytics, Lamont (2013) views it as a versatile technology, and Bulik (2008) agrees that it is recognised as a facilitator that interprets data to create sensible outcomes.




Before going into detail on text analytics processing, it is necessary to understand the concept of text mining. Witten (Text Mining, n.d.) describes text mining as a system that analyses huge amounts of natural language text, detecting lexical or linguistic usage patterns in order to mine valuable information. According to Grimes, “Text mining is a tool that lets companies directly understand stakeholder views – the voice of the customer – about their products and services, and also those of their competitors” (Hosford, 2008). In this light, text analytics is the application of text mining techniques.

Businesses can gain valuable insights from text-based content, but extracting these insights is very challenging. Text mining deals with natural language text, which is inconsistent by nature, and the challenge grows even higher when slang, ironic expressions or language specific to a certain group or trade is used. Text analytics makes the job easier by assigning numerical values to the words and phrases in unstructured data, which are then linked with structured data stored in a database. This makes analysis of content-specific values such as sentiment, emotion, intensity and relevance much more effective (Top Data Management Terms to Know n.d.).
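The step of assigning numerical values to words and linking them to structured records can be sketched very simply. The lexicon, customer records and scoring function below are invented for illustration; production systems use large sentiment lexicons or trained models rather than a handful of hand-picked words.

```python
# Illustrative sketch: score unstructured comment text with a tiny sentiment
# lexicon, then attach the score to structured customer records.
# Lexicon and records are hypothetical examples.

SENTIMENT_LEXICON = {"great": 1, "love": 1, "helpful": 1,
                     "slow": -1, "broken": -1, "terrible": -2}

def sentiment_score(text: str) -> int:
    """Sum lexicon values for each word in the lowercased, de-punctuated text."""
    return sum(SENTIMENT_LEXICON.get(w.strip(".,!?"), 0)
               for w in text.lower().split())

# Structured records (e.g. rows in a database) enriched with a numerical
# value derived from the unstructured comment field.
records = [
    {"customer_id": 101, "comment": "Great service, very helpful staff!"},
    {"customer_id": 102, "comment": "Terrible experience, the site is slow."},
]
for r in records:
    r["sentiment"] = sentiment_score(r["comment"])

print([(r["customer_id"], r["sentiment"]) for r in records])
# [(101, 2), (102, -3)]
```

Once the free text has been reduced to a number attached to a customer ID, ordinary structured-data tools (SQL aggregation, dashboards, trend charts) can take over, which is exactly the linkage the paragraph above describes.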


Moreover, by transforming unstructured data, businesses gain major benefits in numerous areas. Keeping an eye on brand reputation has never been easy; with the help of text analytics solutions, organisations no longer miss anything when monitoring and analysing their brand reputation in the market. Unstructured data from surveys, claims records, blogs, social media, online forums, support calls and more is transformed and linked with structured data in such a way that organisations can predict customer purchase behaviour. This further helps identify existing and potential issues with the products and/or services a business offers. Businesses can use text analytics solutions for overall customer experience management, including improving customer service, understanding and summarising the results of surveys, reviews and feedback, and retaining customers. Beyond customer experience management, text analytics helps organisations develop strategies to improve upselling and/or cross-selling by creating the best possible offers for their customers, as well as to identify and prevent losses from fraudulent claims (Understand the Voice of the Customer).



The technology behind the processing of text analytics as mentioned by Kent (2014) involves:


“Word sense disambiguation is the ability to determine the meaning of a word with multiple definitions, often from context; an example is the word ‘well’.


Part-of-speech tagging is assigning to and labelling each word with a part of speech, such as noun, verb, adjective and adverb, using software.


Word stemming produces the semantic root of a word by applying heuristics, for example flying becomes fly. A word stemmer will look for both forms in text.


Lemmatization is similar to word stemming but uses a more sophisticated and rigorous procedure that will find word stems with greater accuracy.


Name entity extraction looks for and labels things such as people, places, organizations, dates, forms of money, and companies, in text.


Relationship extraction identifies how things are related. Two examples would be ‘Bakers make cakes’ and ‘John lives in London’.”
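Two of the techniques Kent lists, word stemming and lemmatization, can be contrasted with a toy sketch. Real systems use far more sophisticated methods (for instance the Porter stemming algorithm or dictionary-backed lemmatizers); the suffix rules and lookup table below are simplified assumptions for illustration only.

```python
# Toy contrast between heuristic stemming and dictionary-based lemmatization.
# Suffix rules and lemma table are illustrative, not a real morphology model.

def stem(word: str) -> str:
    """Crude heuristic stemmer: strip a few common English suffixes."""
    for suffix in ("ying", "ing", "ies", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Lemmatization relies on vocabulary knowledge rather than suffix rules,
# so it can handle irregular forms that a stemmer misses.
LEMMA_TABLE = {"flying": "fly", "flew": "fly", "better": "good", "mice": "mouse"}

def lemmatize(word: str) -> str:
    return LEMMA_TABLE.get(word, stem(word))

print(stem("flying"), lemmatize("flew"))
# fly fly
```

The example mirrors Kent's point: stemming reduces "flying" to "fly" by rule, while only the lemmatizer's vocabulary knowledge can map the irregular form "flew" to the same root, which is why lemmatization finds word stems with greater accuracy.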




Text analytics works with text and text only. According to José Carlos González, founder of Daedalus, a Spanish company specialising in the automatic extraction of meaning from text, many industries are interested in text analytics solutions, and opportunities are constantly emerging from almost everywhere for tools with new capabilities at lower cost. The main challenge is to provide text analytics solutions to companies quickly and inexpensively (Grimes 2015).


Sid Banerjee, CEO of Clarabridge, a firm that develops text mining software delivering end-to-end solutions for transforming customer feedback into marketing, service and product improvements, notes that the text analytics market is growing along with the number of text analytics vendors, who are now integrating text analytics into their product mix. Social Customer Relationship Management (CRM) and social media marketing vendors, companies that previously focused only on social communication and marketing automation, have in recent years shown more interest in adding text analytics capabilities to their products (Grimes 2014).

Kent (2014) mentions in her paper that Bank of America’s Senior Vice President, Allen Thompson, is using text analytics to listen to the voice of customers in order to retain existing clients. His highly specialised team examines every form of customer response, from emails, phone conversations and letters to surveys, and uses them to visualise customers’ behaviour patterns for future reference. He believes the application can generate a leading indicator that will be helpful in text analytics.

Tom Anderson of Anderson Analytics and OdinText sees the present state of text analytics as a win-win. Anderson Analytics and OdinText have been providing customer service professionals with a text analytics software platform for generating insight into customer behaviour and patterns. He believes customers have now become much more aware of such products than in the past, which is a good sign, saving energy and time. Previously, vendors had to start from the basics: explaining what text analytics was, why it mattered to its users, and what it could and could not do, as customers arrived with impractical and irrelevant expectations of the technology. Today his customers know what their real needs are and are eager to learn how text analytics solutions can be customised within their software to achieve better results (Grimes 2014).

UNC Health Care is one of the best examples of an organisation that makes use of unstructured data through text analytics tools. It collects unstructured data from doctors’ diagnosis observations, patient registration forms, discharge records and notes from doctor–patient phone conversations. The use of text analytics tools with natural language processing helps UNC Health Care design courses of care for high-risk patients ().


Another example is the US government, which has been reviewing massive clusters of unstructured data from NASA’s past records. Pilot reports and mechanics’ records are studied to detect impending problems with flights, and social media is scanned for evidence of terror campaigns and biological threats (Kamauff 2014).


LeClairRyan, a law firm offering corporate and litigation services, has extended its services to e-discovery collection, review and production. The firm uses Relativity, an e-discovery software solution for reviewing and managing electronic as well as paper documents. A recent upgrade to Relativity incorporated text analytics into its platform; thanks to its document-clustering capability, identifying relevant documents and potential areas of risk has become much easier (Lamont 2011).


IBM uses LanguageWare and IBM Content Analytics (ICA), which are highly flexible and extensible and known for capturing the knowledge of business domain experts in syntax and semantic rules. Both solutions provide Information Extraction that can be fully tailored to perform logical reasoning over natural unstructured data, and offer Entity & Relationship Recognition for classifying words and phrases into business categories (Discover the Hidden Value in Unstructured Information).




Dow Chemical Company and Union Carbide Corp. merged in 2001, and with that merger they had to merge their document management systems as well. Union Carbide had 35,000 research and development reports to integrate with Dow Chemical’s existing volume of data. To manage this newly acquired data, Dow Chemical partnered with ClearForest Corp., which provided an advanced text analytics solution for document integration. The solution’s highlights were document classification and the identification of key indicators such as chemical components, core products, and associated firms and individuals. The results were outstanding: Dow Chemical enriched its system with 80 years’ worth of Union Carbide research, spent approximately $3 million less than expected, reduced data errors by 10% to 15%, and cut the entire document integration processing time in half (Fan, Wallace, Rich & Zhang 2006).


Another supporting example is the identification of fraudulent claims in insurance companies. Insurance Networking News revealed that the mechanisms conventionally used to investigate insurance claims gave a false positive rate of 95 percent: of the claims flagged as suspicious, only 5 percent were identified as actual fraud, while the remaining 95 percent were legitimate. The same claims were then investigated using predictive modeling without text mining, which reduced the false positive rate to 73 percent. When the claims were investigated once more using predictive modeling with text mining, the false positive rate fell further to 46 percent, meaning that more than half of the flagged claims were genuine fraud, roughly double the share found without text mining (King & Malida 2014).
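The figures above can be worked through as plain arithmetic: the false positive rate is the share of flagged claims that turn out not to be fraud, so its complement is the share of flagged claims that are genuine fraud. The percentages are taken directly from the paragraph; the function is just bookkeeping, not a fraud model.

```python
# Complement of the false positive rate = share of flagged claims that are
# genuine fraud. Percentages come straight from the Insurance Networking News
# figures cited above.

def actual_fraud_pct(false_positive_pct: int) -> int:
    """Percentage of flagged claims that are genuine fraud."""
    return 100 - false_positive_pct

baseline  = actual_fraud_pct(95)  # conventional investigation
modeling  = actual_fraud_pct(73)  # predictive modeling alone
with_text = actual_fraud_pct(46)  # predictive modeling + text mining

print(baseline, modeling, with_text)
# 5 27 54
```

The progression from 5% to 27% to 54% makes the contribution of the text mining step explicit: adding unstructured text to the predictive model doubles the share of genuine fraud among flagged claims compared with structured data alone.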








