Wednesday November 28 2001

UDDI - The Weather Report

The Outlook is Mixed

Over the next few years many tens, possibly hundreds, of thousands of Web Services will be made available across the world. Many of these will find their way onto the UDDI registry, a registry which, when it delivers on all of its promises, will revolutionize the way we do business with others by allowing us to search in a systematic way for the help, contacts and functionality (Web Services) that other companies supply.

UDDI is a great idea, and the registry is widely recognized in the industry as the direction of eBusiness for the foreseeable future. You may therefore find it surprising that there is a potential and largely unknown flaw which, if left unsolved, will over the next few years make finding information within UDDI difficult, time consuming and in some cases impossible. That flaw is data inconsistency and inaccuracy.

Defining The Problem

UDDI has one main industry-wide registry, jointly run by possibly the largest consortium of companies ever to agree to manage the same project at the same time. However, 48% of the tModels in the production UDDI registry (only tModels were tested) have links that are unusable. These pointers contain missing, broken or inaccurate information.

Why do inaccuracies occur within UDDI?

Think of UDDI as a newsgroup where people add messages about what they sell. We all know that in reality newsgroup moderators have their work cut out for them attempting to regulate misuse and duplication. In the largely unregulated arena of the UDDI registry it is therefore not surprising that duplication and data inconsistency have already found their way into a significant amount of the information it contains. The UDDI design has concentrated on creating a database registry that caters for very large quantities of information, possibly at the expense of data and Web Service validation.
Another consideration is that Web Services and SOAP are still emerging technologies. Not all the questions have been answered, and the term Web Services is certainly still misunderstood as a concept for running functionality over the Internet. It is therefore not surprising that individuals are unclear about what the URL pointers that describe remote Web Service functionality (WSDL files) actually are.

What effect will inaccurate data have on UDDI?

The UDDI project has been designed and developed with advanced features for disaster recovery of enormous databases, and is capable of storing millions of pieces of information about companies and individuals. As the registry grows, it will become progressively more difficult to filter out data inconsistencies. Imagine a production registry with 1,000,000 http links in it, nearly half of which are of no use. Roughly one out of every two searches you perform to find a supplier or business partner would then end with invalid information and with you having to try again. If we had this type of response from an Internet search engine then I suspect we'd all stop using it.

Research to Prove the Extent of the Problem

We will now show that the problem exists by presenting the joint research that SalCentral and WebServicesArchitect performed over the last few weeks. A browseable version of this research is available on-line at http://www.salcentral.com/uddi/default.asp.

At SalCentral we run the largest Web Services directory outside the UDDI project, and during our twelve months of existence we have become aware of how developers perceive Web Services. In many cases some kind of support is needed to make sure, firstly, that Web Service URLs exist and, secondly, that they do what they say they are supposed to do. This is not a new idea, but it is undoubtedly more important within an emerging technology, where individual ideas differ.
This data sifting takes time, but makes sure that our information is displayed with as few errors as possible. It occurred to our team one day: does UDDI sift the information going into the registry? Using the Microsoft UDDI SDK we first of all extracted ALL tModel records from the UDDI production database. tModels are used to store links to Internet resources such as Web Services, Web Sites etc. in the form of http: locations. We then took every http: location and checked some simple information:
Research Notes

The following results are a snapshot of the UDDI registry taken on 16th November 2001. It is important to remember that they represent a single point in time and do not show whether there is an improving or worsening trend. The intention of this ongoing research by SalCentral and WebServicesArchitect is to update http://www.salcentral.com/uddi/default.asp on a weekly basis so that visitors can see any trend materialize.

In addition you should note that our research concentrated on tModel links. These provide links from the Organization information held within the registry to Web sites, pages, Web Services etc. They are, however, not the only useful information contained within UDDI; the organization information itself holds email and contact details for a company and can therefore be used to locate businesses or individuals.

tModels use their URL locations to point to additional information or resources that they describe. It is important to note that these URLs are optional: if a tModel can sufficiently convey its meaning through its name and description, then a URL is not required. However, during our research we checked all 296 tModels with blank URLs and found that not one actually conveys its meaning in this way. They have therefore all been counted as errors, where a tModel should have supplied an Internet URL and did not.

Category Research Summary

The table below shows a breakdown of all 1581 records (tModels) contained within the UDDI Production registry. This categorization is meant to show how the registry is split into tangible areas, such as "Web Services" and "Downloadable Documents". Each tModel is categorized by looking at its URL and classifying it by suffix; for example, .WSDL is in the "Web Service" category and .HTM is in the "Named Web Page" category.
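As a rough illustration of suffix-based categorization, here is a minimal Python sketch. Only the .WSDL and .HTM mappings come from the research described above. The other suffixes, the "Blank URL" and "Other" labels, and the function name `categorize` are assumptions made for the example, not the rules actually used in the research.

```python
import posixpath
from urllib.parse import urlparse

# Suffix-to-category lookup. Only ".wsdl" and ".htm" are taken from the
# article; every other entry is an illustrative assumption.
SUFFIX_CATEGORIES = {
    ".wsdl": "Web Service",
    ".htm": "Named Web Page",
    ".html": "Named Web Page",          # assumption
    ".pdf": "Downloadable Documents",   # assumption
    ".doc": "Downloadable Documents",   # assumption
}


def categorize(url: str) -> str:
    """Return a category for a tModel URL based on the suffix of its path."""
    if not url or not url.strip():
        return "Blank URL"              # hypothetical label for empty URLs
    path = urlparse(url.strip()).path
    suffix = posixpath.splitext(path)[1].lower()
    return SUFFIX_CATEGORIES.get(suffix, "Other")  # "Other" is hypothetical


if __name__ == "__main__":
    for example in ("http://example.com/quotes.WSDL",
                    "http://example.com/about.htm",
                    ""):
        print(repr(example), "->", categorize(example))
```

Comparing the lower-cased suffix means that .WSDL and .wsdl fall into the same category. URLs without a recognized suffix fall through to a catch-all label.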
Comments on Research

One of the more interesting figures above is the "Web Service" category. The UDDI registry is being strongly promoted as the main registry for Web Service locations; it is therefore surprising that the above figure tells us the current uptake has been poor. This, however, simply reflects two significant situations in the industry: first, that at present Web Services are mainly being developed behind firewalls for internal corporate use, and second, that ASP.NET and many SOAP platforms and servers are only at Beta stage, and few are willing to produce commercial systems until the full product is available. In addition, the "Web Service" category looks significantly worse once we check each URL and divide them into those that were available and those that were not, as we have with the following chart.
Note that these "Web Service" category values do not include information concerning whether each Web Service was working, only whether we could access the Web Service URL location in the tModel URL. It is the intention of our on-line research to cover this information soon.

Error Research Summary

We scanned through the entire database and checked every tModel URL to see whether it was formatted correctly and whether it was available across the Internet.
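The two checks described above (is the URL correctly formatted, and can it be reached) can be sketched in Python as follows. This is only an illustrative sketch using the standard library, not the tooling used in the research. The function names, the result labels ("blank", "malformed", "unreachable", "ok") and the 10 second timeout are all assumptions.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_well_formed(url: str) -> bool:
    """True if the URL has an http or https scheme and a host name."""
    try:
        parts = urlparse(url.strip())
    except ValueError:                  # e.g. a broken IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """True if fetching the URL succeeds; urlopen raises on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        return False


def classify(url: str, reachable=is_reachable) -> str:
    """Classify one tModel URL; the labels are hypothetical."""
    if not url or not url.strip():
        return "blank"
    if not is_well_formed(url):
        return "malformed"
    return "ok" if reachable(url.strip()) else "unreachable"
```

Passing the reachability check in as a parameter keeps the formatting test usable without network access. For example, `classify("www.example.com/service.wsdl", reachable=lambda u: True)` is reported as malformed because the scheme is missing. As noted above, a reachable URL says nothing about whether the Web Service behind it actually works.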
Comments on Research

What these figures really confirm is something that has been suspected for a while: the data within the Production UDDI registry currently has significant inconsistencies, and even though a Test UDDI registry exists, nothing enforces which registry an individual or organization chooses to use. The following chart simply summarizes these values for us:
Conclusion

This article has come off the back of the groundbreaking research we have done on the UDDI registry. This information is not commonly known in the industry, and the research at http://www.salcentral.com/uddi/default.asp is certainly the first of its kind to have been made publicly available.

Though data inconsistency is not uncommon, especially in emerging technologies, the concern is the extent of the current problem. With 48% invalid data but only about 1,600 records, it is certainly still manageable. Over the next few years, however, the registry is expected to reach 1,000,000 entries. If these statistics remain unchanged, 480,000 entries would contain inconsistent information, a position that would be difficult or even impossible to rectify.

In reality we don't expect 100% pure data, but we should expect a higher percentage than the current figures show. I believe this shows, however, that Value Added Service Suppliers (VASS) are essential to the general day-to-day working of the registry. Value Added Service Suppliers provide the same function as Intermediaries.

In the short term, something urgently needs to be done about data consistency on the UDDI registry. The release of this article is not meant to disillusion people about UDDI; quite the contrary. As UDDI becomes widely adopted, these problems will be given more attention, now that we are aware of them at an early stage.

Readers of this article may also be interested in reading Using UDDI as a Search Engine.