Digital Knowledge – Overcoming the Challenge

May 30, 2008

Capturing, sharing and retaining knowledge is a critical business activity. A firm’s expertise and knowledge is spread across a wide range of IT systems such as document management, case records, archives and e-mail, but more importantly is ‘hidden’ within the know-how and insights of the firm’s people and teams. In fact, historically, it is this need for information and knowledge sharing that has been one of the key drivers for investment in document management and related technologies.


The Challenge  

However, document management and most related systems are not designed to harness an organisation’s intellectual property. They have tended to lack the ability to reuse information and exploit people’s expertise for productivity and efficiency gains, especially among fee-earning staff. This is where genuine enterprise search technology comes into play.


At its core, what makes harnessing knowledge from a firm’s digital repository particularly challenging is that the majority of the content and intellectual capital resides in documentation and correspondence, which, by its nature, has not been created to be easily found. The content also tends to be incomplete and is not linked to other content as it is in more search-friendly environments such as the Web.


In addition, within billing-driven work environments, users are often looking for an exact and specific answer to their query, such as ‘where is the correct and most recent case note?’, rather than a list of options, as is the case in Web-based search. For instance, a Web search for ‘where can I get a quote for a holiday cottage in Norfolk?’ will yield 365,00 results.


Further, very often employees need to search for their colleagues’ expertise within their organisations with the aim of finding out who may have worked on a certain type of case before, in order to leverage that experience. This makes the task of searching digital information to mine for knowledge even more complex.  


As a result, enterprise search is a complicated discipline and most traditional enterprise search technologies struggle with this complexity. This is perhaps why they have fallen short of their promise.


The good news is that there has been a major change in the mechanics of enterprise search solutions over recent years, making effective mining of digital knowledge a realistic and affordable goal. There is a new breed of technology that is challenging the traditional vendors by massively driving down both the complexity and cost of deploying such solutions.


Organisations looking to implement enterprise search technology should be aware of three key areas that are critical to a successful deployment and drive supplier selection accordingly.


Overcoming the Choke Points of Enterprise Search


There are three deployment and cost choke points for enterprise search to overcome.


Connectivity is key. For an enterprise search solution to access all the information that exists within a firm, make sense of it and present it in the most relevant manner to a user, connectivity of the search engine to information systems, be they applications, legacy or proprietary, is critical.


The search engine should be able to connect in a non-intrusive manner, eliminating the need for any kind of manual coding, reformatting or re-purposing of content. This not only ensures a faster and more secure implementation, but also helps contain the cost of future deployments to new or additional data sources.


The new breed of solutions offers out-of-the-box connectivity to a number of applications and technologies including Hummingbird, SAP, eRoom, Worksite, Documentum, Siebel, SharePoint, Oracle Applications, MySQL, RSS feeds, web crawlers and MS Exchange, to name but a few.
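To make the idea of non-intrusive connectivity concrete, here is a minimal sketch in Python. It assumes each source system is wrapped in a small read-only connector that exposes the same interface, so the indexer never needs source-specific code. The names Connector, InMemoryConnector and build_index are hypothetical and are not taken from any vendor's product.

```python
from typing import Iterator, Protocol


class Connector(Protocol):
    """Hypothetical read-only interface that every source adapter exposes."""

    name: str

    def fetch(self) -> Iterator[dict]:
        ...


class InMemoryConnector:
    """Stand-in for a real adapter (document management, e-mail, database)."""

    def __init__(self, name, items):
        self.name = name
        self._items = list(items)

    def fetch(self):
        # Content is read as-is and tagged with its source; nothing is
        # rewritten or re-purposed in the source system.
        for item in self._items:
            yield {"source": self.name, **item}


def build_index(connectors):
    """Pull records from every connector into one index keyed by (source, id)."""
    index = {}
    for connector in connectors:
        for record in connector.fetch():
            index[(record["source"], record["id"])] = record["text"]
    return index


index = build_index([
    InMemoryConnector("dms", [{"id": "1", "text": "Engagement letter"}]),
    InMemoryConnector("mail", [{"id": "1", "text": "Re: case update"}]),
])
```

Under this assumption, adding a new data source means writing one more connector rather than changing the indexer, which is where the saving on future deployments would come from.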


Furthermore, connectivity brings with it the issues of information security and search relevance. It is vital that the search solution adopted is fully integrated and compliant with the existing security and access controls to a wide range of sensitive information such as confidential client case materials. This means that the security checks need to be carried out in real-time before the search process commences, so that the search engine can produce results that are not only most relevant to users, but also only deliver the content they are authorised to see. Security support for Active Directory and LDAP, which is brokered alongside application level security or user role-based security, is essential.
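The principle of checking access before searching can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular product: the DIRECTORY dictionary is a hypothetical stand-in for an Active Directory or LDAP group lookup, the documents and group names are invented, and the ranking is a deliberately naive count of shared words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


# Hypothetical stand-in for an Active Directory / LDAP group lookup.
DIRECTORY = {
    "alice": {"litigation", "partners"},
    "bob": {"conveyancing"},
}

# Invented sample content for illustration only.
DOCS = [
    Document("d1", "Latest case note for the Smith dispute", frozenset({"litigation"})),
    Document("d2", "Case note on the Norfolk property sale", frozenset({"conveyancing"})),
    Document("d3", "Partner meeting minutes", frozenset({"partners"})),
]


def groups_for(user):
    """Return the user's groups; unknown users belong to no group."""
    return frozenset(DIRECTORY.get(user, set()))


def secure_search(user, query, documents):
    """Search only the documents the user is authorised to see."""
    groups = groups_for(user)
    # Access check first: unauthorised documents never reach the ranking step.
    visible = [d for d in documents if d.allowed_groups & groups]
    terms = set(query.lower().split())
    scored = []
    for doc in visible:
        score = len(terms & set(doc.text.lower().split()))
        if score:
            scored.append((score, doc.doc_id))
    # Highest score first, document id as a stable tie-breaker.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

Because the filter runs before ranking, a document the user cannot open cannot influence the result list or reveal its existence through a title or snippet.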


Search and indexing techniques are not all made equal. There are fundamental differences in approach between the more traditional solution providers and the market challengers. Many traditional enterprise search engines have their roots in mathematics and/or statistics, with most based on inference, a probability approach built on the principle of using past events to predict future events. However, the deployment of solutions based on these approaches requires significant and continuous manual intervention to fine-tune data sets and to maintain relevancy of results. This has an impact on implementation timescales and significantly increases the maintenance cost of the technology.


The newer solutions are based on the combined use of semantics and linguistics, alongside statistical, structured and syntactical analysis. These methodologies help drive far better relevancy on search results out-of-the-box, making enterprise search solutions easier to deploy with significantly shorter implementation timescales and lower maintenance costs.


A true semantic-based engine will process each piece of content, and each subsequent query, and represent it as a vector in an ‘n-dimensional’ space. A search in a semantic index is meaning-based, so the query terms may not feature word-for-word in the retrieved document. This ensures better search relevancy.




Mapping this technique on to a vector diagram, the distance between the query and a related topic indicates the relevancy of a result, i.e. the closer the query is to the topic, the more relevant the result. A semantic-based engine is able to adjust this distance to ensure the content selected is most relevant to the query terms. For instance, these vector diagrams illustrate the ‘semantic space’ for two different interpretations of the term ‘glass’. In the top diagram, ‘glass’ was used in the expression ‘Drink a glass of juice’. In the bottom diagram, the expression used was ‘Resistant glass’.
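The notion of closeness in a semantic space can be illustrated with cosine similarity, one common way of comparing vectors. It is used here as an assumption, since the measure a given engine applies is not stated. The three-dimensional vectors below are hand-picked for illustration and are not the output of any real engine; the topic names are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no meaning to compare
    return dot / (norm_a * norm_b)


# Hand-picked toy vectors; axes are (drinking, building materials, strength).
TOPICS = {
    "beverages": (0.9, 0.1, 0.0),
    "glazing": (0.0, 0.8, 0.6),
}

QUERIES = {
    "Drink a glass of juice": (0.8, 0.2, 0.0),
    "Resistant glass": (0.1, 0.7, 0.7),
}


def closest_topic(query_vector):
    """Pick the topic whose vector lies closest to the query vector."""
    return max(TOPICS, key=lambda name: cosine_similarity(query_vector, TOPICS[name]))


for text, vector in QUERIES.items():
    print(text, "->", closest_topic(vector))
```

With these toy values, the two uses of ‘glass’ land near different topics even though the word itself is identical, which is the behaviour the diagrams describe.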


This is a more intuitive way for users to search and does not rely on heavy duty manual tuning to drive relevancy of results.


A linguistic and semantic approach also allows for extraction of key elements from within content, such as people, concepts and companies. These can later be used to orientate search results allowing users to quickly drill down to the correct answer or to find the right person to call.
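As a rough sketch of how extracted elements can orientate results, the example below assumes the engine has already annotated each result with the people and companies it mentions. The extraction step itself is not shown, and the result records, names and field labels are invented for illustration.

```python
from collections import Counter

# Hypothetical search results; the 'people' and 'companies' fields stand in
# for entities that a linguistic engine would have extracted from the content.
RESULTS = [
    {"id": "d1", "people": ["J. Patel"], "companies": ["Acme Ltd"]},
    {"id": "d2", "people": ["J. Patel", "M. Jones"], "companies": ["Acme Ltd"]},
    {"id": "d3", "people": ["M. Jones"], "companies": ["Borealis plc"]},
]


def facet_counts(results, field):
    """Count how many results mention each extracted value of the given field."""
    counts = Counter()
    for result in results:
        # set() ensures a value is counted once per result.
        counts.update(set(result.get(field, [])))
    return counts


def drill_down(results, field, value):
    """Keep only the results that mention the chosen value."""
    return [r for r in results if value in r.get(field, [])]


people = facet_counts(RESULTS, "people")
acme_only = drill_down(RESULTS, "companies", "Acme Ltd")
```

The counts give the user a short list of people and companies to click on, and each click narrows the result set, which is how a search for content can also surface the right colleague to call.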


And finally, deployment of enterprise search technology should not require investment in non-standard hardware or software. Most of the newer solutions reside on standard hardware and support a wide range of standard platforms. This too significantly lowers the total cost of ownership.



The crux of the matter is that enterprise search is not simply an information retrieval process. Due to technical complexity and cost restrictions, legal organisations have often deployed search in a limited fashion, if at all. Newer search solutions offer much wider scope and allow for a broader, more strategic application of search across the business. Utilised to its full potential, enterprise search can expose hidden expertise, promote collaboration between stakeholders and assist with business development processes, delivering significant competitive advantage.


Colin Hadden is Managing Director of Sinequa UK.