Professional Alfresco: Practical Solutions for Enterprise Content Management
By David Caruana, John Newton, Mike Farman, Michael Uzquiano, and Kevin Roast
John Wiley & Sons
Copyright © 2010 John Wiley & Sons, Ltd
All rights reserved.
Chapter One: Introducing Alfresco
WHAT'S IN THIS CHAPTER?
* Understanding Alfresco and its uses
* Looking at the origins of Alfresco and its place in the ECM industry
* Using Alfresco in different scenarios
* Considering factors when implementing an Alfresco content application
* Exploring the importance of open source and community for Alfresco
Alfresco is an open source Enterprise Content Management (ECM) system. It was created in 2005 by a team from Documentum, including one of Documentum's co-founders, as an open source alternative to the proprietary vendors in the $4 billion ECM market.
Alfresco manages all the content within your enterprise: documents, images, photos, Web pages, records, XML documents, or any other unstructured or semi-structured file. What makes Alfresco stand out are the services and controls that manage this content, with features such as metadata management, version control, lifecycle management, workflow, search, associations to other content, tagging, commenting, and much more. These let you find the content you are looking for in the mountain of information accumulating in enterprises and ensure that it is accurate. They also enable you to present and publish information through the Web or any other appropriate channel so that users can access it.
For the End User
For end users, Alfresco appears as a suite of applications, or as extensions to their existing tools, that manages their content. Alfresco can present itself as a shared drive, replacing networked shared drives that have no organizational, search, or control mechanisms with a store that organizes and controls information and provides a portal interface for searching and browsing content. By emulating the SharePoint protocol, Alfresco also lets users manage their Office documents from within Microsoft Office, using the Office tools designed for Microsoft SharePoint. More importantly, Alfresco provides an out-of-the-box suite of applications to browse, search, manage, and collaborate on content in the repository. These applications include document management, Web content management, content collaboration, records management, and email integration. They can both supplement and be supplemented by new applications developed on the Alfresco platform.
For the Business
For the business, Alfresco is designed to support the content requirements of a number of business-critical processes and uses. The document management tools, applications, and interfaces support general office work, search, and discovery. The workflow management capabilities support numerous business processes, including case management and review and approval. The collaboration applications and services support the collaborative development of information and knowledge in the creation and refinement of content and documents. The scalable Web content management services support the delivery and deployment of content from the enterprise to its customers. The records management capability provides an affordable means to capture and preserve records based upon government-approved standards. The standards-based platform also provides access to applications that use these standards, such as publishing, image, and email management.
For the Developer
For the developer, Alfresco provides a full-featured, scalable repository and content management platform that simplifies the development of content-centric applications. Based on content management and Internet standards, Alfresco exposes its content management capabilities as services that can be accessed through REST-based or SOAP-based Web services, through Web services based on the new OASIS Content Management Interoperability Services (CMIS) standard, or from the PHP programming language. It can also be incorporated directly into a Java-based application through its core Java services. In addition, Alfresco incorporates scripting languages that can access these services and offer a lightweight programming model when speed of development is important. These services follow patterns similar to those used with databases, repositories, or user interface components, but extend them for the unique challenges of content-centric applications (such as full-text search and hierarchical content structures). Because the platform is open source, it is transparent, and the developer can peer into the internal repository patterns. Alfresco also provides a framework application that delivers much of what end users need and that the developer can extend with unique application logic and a customized user interface through Surf, CMIS, Web scripts, and Core Services.
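As a minimal sketch of calling one of these Web-based interfaces from Java, the following uses only the JDK's `java.net.http` client (Java 11 or later) to build an authenticated GET request for a CMIS AtomPub service document. The server address, the `/alfresco/service/cmis` path, and the `admin`/`admin` credentials are assumptions for a local test installation, not values given in this chapter; check them against your own server before sending anything.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CmisRequestSketch {

    // ASSUMPTION: path of the CMIS AtomPub service document on the server.
    // Verify it for your Alfresco version; other releases may use a different path.
    static final String CMIS_SERVICE_PATH = "/alfresco/service/cmis";

    // Builds a GET request with HTTP Basic authentication. Nothing is sent here.
    static HttpRequest serviceDocumentRequest(String baseUrl, String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + CMIS_SERVICE_PATH))
                .header("Authorization", "Basic " + token)
                // AtomPub service documents use this media type (RFC 5023).
                .header("Accept", "application/atomsvc+xml")
                .GET()
                .build();
    }

    // Sends the request and returns the HTTP status code.
    // Needs a reachable server, so main() does not call it.
    static int fetchStatus(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // HYPOTHETICAL host and credentials for a local test server.
        HttpRequest request = serviceDocumentRequest("http://localhost:8080", "admin", "admin");
        System.out.println(request.method() + " " + request.uri());
        // prints "GET http://localhost:8080/alfresco/service/cmis"
    }
}
```

Building the request separately from sending it keeps the example runnable without a server; `fetchStatus` is where the actual network call would happen.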
For the IT Organization
For the IT organization, Alfresco provides a low-cost alternative to closed-source, proprietary systems from IBM, EMC, Open Text, Oracle, and Microsoft. Alfresco fits within enterprise IT governance standards by working with virtually any database, application server, operating system, and system-monitoring infrastructure. Because it is a 100 percent Java application, the system is portable to virtually any hardware. A multi-tenant capability built into the core applications allows the IT organization to provision virtual instances of Alfresco on demand and make the most of existing hardware. The system's small size also means that it works well within existing virtualization platforms such as VMware and Xen. Alfresco can also be configured into clusters with built-in redundancy to provide high availability and disaster recovery. Most importantly, Alfresco's open source and open standards approach means that users are not locked into a proprietary platform.
WHAT IS ENTERPRISE CONTENT MANAGEMENT?
According to the Association for Information and Image Management (AIIM), the leading professional group devoted to ECM, Enterprise Content Management is the collection of strategies, methods, and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM systems combine a repository, a number of different applications, and application development platforms to control, access, and deliver this content. Content can be any unstructured information, such as documents, Web pages, images, video, records, or simple files. (Unstructured information refers to computerized information that does not have a data model. The term distinguishes such information from data stored in fielded form in databases or semantically tagged in documents.) An ECM system manages content and its lifecycle the way a database management system manages data in a database.
An ECM system manages the binary content itself, the metadata that describes its context, its associations with other content, its place and classification in the repository, and the indexes for finding and accessing it. Just as important, the ECM system manages the processes and lifecycles of the content to ensure that this information is correct: the workflows for capturing, storing, and distributing content, and the lifecycle that determines how long content is retained and what happens after that retention period.
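The retention part of that lifecycle can be pictured as simple date arithmetic. The following sketch is hypothetical: the `RetainedItem` record, the `Disposition` values, and the seven-year period are illustrative assumptions, not Alfresco's records management model. It computes when an item's retention period ends and whether its disposition action is due (it needs Java 16 or later, for records).

```java
import java.time.LocalDate;
import java.time.Period;

public class RetentionSketch {

    // HYPOTHETICAL disposition actions; real retention schedules define their own.
    enum Disposition { DESTROY, TRANSFER_TO_ARCHIVE }

    // A managed item: when it was filed, how long it must be kept, and what happens afterward.
    record RetainedItem(String name, LocalDate filedOn, Period retention, Disposition action) {

        // First date on which the retention period has elapsed.
        LocalDate dispositionDate() {
            return filedOn.plus(retention);
        }

        // True once the retention period is over and the disposition action may run.
        boolean isDueOn(LocalDate today) {
            return !today.isBefore(dispositionDate());
        }
    }

    public static void main(String[] args) {
        RetainedItem invoice = new RetainedItem(
                "invoice-2009-0042.pdf", LocalDate.of(2009, 3, 15),
                Period.ofYears(7), Disposition.DESTROY);
        System.out.println(invoice.dispositionDate());                  // 2016-03-15
        System.out.println(invoice.isDueOn(LocalDate.of(2010, 1, 1)));  // false
        System.out.println(invoice.isDueOn(LocalDate.of(2016, 3, 15))); // true
    }
}
```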
According to the research group Forrester, there are five main application areas of ECM: document management, Web content management, records management, image management, and digital asset management. These areas of content management have similar requirements of storage, organization, access, and processing, but each has a different organizational focus and different end users.
* Document management tends to deal with the capture, editing, and distribution of office documents and files.
* Web content management organizes an enterprise's Web site, Web pages, and Web publishing processes.
* Records management deals with the long-term archival or disposal of important documents and records as well as any compliance or regulatory action.
* Image management may deal with documents or records, but handles content in the form of scanned images. It manages the process of scanning, quality control, metadata capture, and storage.
* Digital asset management is used primarily by creative and marketing professionals to handle the capture, creation, and editing of photos, videos, and illustrations.
By managing content in an ECM system, organizations are generally able to reduce costs of manual processing, increase the accuracy of information, and aid the search and discovery of important documents and information. Some of the benefits of using an ECM system include:
* Reduction of paper handling and error-prone manual processes
* Reduction of paper storage
* Reduction of lost documents
* Faster access to information
* Improved online experience for customers
* Online access to information that was formerly available only on paper, microfilm, or microfiche
* Improved control over documents and document-oriented processes
* Streamlining of time-consuming business processes
* Security over document access and modification
* Improved tracking and monitoring with the ability to identify bottlenecks and modify the system to improve efficiency
The Origins of ECM
The ECM market started in the late 1980s with image management vendors like FileNet (now owned by IBM) and in the early 1990s with electronic document management vendors such as Documentum (now owned by EMC). Document management married the then relatively new relational database systems with electronic file management, integration with scanning equipment, and workflow management tools. These companies built platforms for developing applications that managed electronic documents in the emerging client-server enterprise application space, and their enterprise systems were large in both scale and price. In the mid-1990s, as the Internet became important, Web content management vendors such as Interwoven and Vignette entered the Web space adjacent to document management.
With the collapse of the dot-com bubble in 2001, healthier companies started to acquire smaller players with overlapping functionality. As document management and Web content management vendors started to compete with each other and to acquire new technologies, such as archival and records management, the ECM market was born. It was in this period that Microsoft entered the market with SharePoint. In the intervening years, the growing ECM space has consolidated into fewer, but much larger, companies. The analyst firm Gartner now estimates the ECM market at $4 billion.
In 2005, Alfresco started using the open source development and distribution model to spread ECM globally. This enabled the company to address underserved parts of the market where existing ECM systems were either too expensive or too complex. At the time, existing ECM vendors put more effort into consolidating disparate product sets than into creating new technology, which made their systems harder to develop, deploy, and use and hurt their scalability and performance. Their older designs, built without reusable components, made these systems very large, imposed a high overhead for managing information and storing data, and made them very expensive to develop and maintain, which drove up their cost. Extending these systems became a major integration problem, and applications built upon them could cost as much as ten times the purchase price of the core repository and system.
Alfresco built a different kind of system using open source development and incorporating open source components, such as Spring, Lucene, Hibernate, jBPM, FreeMarker, and POI. The Alfresco system incorporated the major applications of ECM (document, image, Web content, records, and digital asset management) but in a simpler package that was easier to deploy.
By adopting appropriate open standards, Alfresco fits into almost any enterprise environment. Alfresco's Content Application Server provides a platform for developing content applications in a number of different development styles, each suited to the programming task at hand. Because it is scalable, Alfresco can grow from small departmental solutions to large-scale Internet solutions.
The end result is a system that is lightweight, flexible, and easy to deploy, with a powerful set of development interfaces. The ECM system recently passed 2 million downloads and has a large and growing community. Using the professional open source model, Alfresco now has over 1,000 enterprise customers.
The Alfresco system in many ways looks similar to other ECM systems. (See Figure 1-1.) At its core is a repository, supported by a server, that persists content, metadata, associations, and full-text indexes. A set of programming interfaces supporting multiple languages and protocols lets developers create custom applications and solutions. Out-of-the-box applications provide standard solutions such as document management, records management, and Web content management.
However, because Alfresco was created relatively recently compared to other ECM systems, it has been able to take advantage of a more modern architecture. The Alfresco system has grown organically as an entirely Java application, which means that it runs on virtually any system that can run Java Enterprise Edition. At its core is the Spring platform, which gives Alfresco the ability to modularize functionality such as versioning, security, and rules. Alfresco makes liberal use of scripting to simplify adding new functionality and developing new programming interfaces. This portion of the architecture, known as Web scripts, can be used for both data and presentation services. Alfresco has kept the architecture lightweight so that it is easy to download and install and can take advantage of new packaging and deployment options such as the cloud.
The Content Application Server and the Repository
At the heart of the Alfresco system is the Content Application Server, which manages and maintains the Content Repository. The repository is comparable to a database, except that it holds more than data. The repository stores the binary streams of content, and the associated full-text indexes are maintained in Lucene. The binary streams are stored in files managed by the repository; these files are for internal use only and do not reflect what you might see through the shared drive interfaces. The repository also holds the associations among content items, classifications, and the folder/file structure. The folder/file structure is maintained in the database and is not reflected in the internal file storage structure.
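To make the split between these stores concrete, here is a toy, in-memory sketch: one map stands in for the database holding the folder/file structure, another for the internal content files, and a third for the full-text index that Lucene maintains. All class and method names, and the `store://` URL format, are invented for illustration; this is not Alfresco's storage code. What it shows is that the path a user sees is rebuilt from the node table, while the binary's storage location is a generated name unrelated to that path.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class RepositorySketch {

    // HYPOTHETICAL "database" row: folder/file structure plus a pointer to the binary.
    record Node(String id, String parentId, String name, String contentUrl) {}

    final Map<String, Node> nodeTable = new HashMap<>();            // stands in for the database
    final Map<String, byte[]> contentStore = new HashMap<>();       // stands in for internal files
    final Map<String, Set<String>> fullTextIndex = new HashMap<>(); // stands in for Lucene

    RepositorySketch() {
        nodeTable.put("root", new Node("root", null, "", null));
    }

    String createFolder(String parentId, String name) {
        String id = UUID.randomUUID().toString();
        nodeTable.put(id, new Node(id, parentId, name, null));
        return id;
    }

    String createFile(String parentId, String name, String text) {
        String id = UUID.randomUUID().toString();
        // Invented internal location: generated, and unrelated to the folder path.
        String contentUrl = "store://" + UUID.randomUUID() + ".bin";
        contentStore.put(contentUrl, text.getBytes(StandardCharsets.UTF_8));
        nodeTable.put(id, new Node(id, parentId, name, contentUrl));
        // Naive tokenizer feeding an inverted index (word -> node ids).
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                fullTextIndex.computeIfAbsent(word, w -> new HashSet<>()).add(id);
            }
        }
        return id;
    }

    // Rebuilds the user-visible path by walking parent links in the node table.
    String pathOf(String id) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node n = nodeTable.get(id); n != null && n.parentId() != null;
                n = nodeTable.get(n.parentId())) {
            parts.addFirst(n.name());
        }
        return "/" + String.join("/", parts);
    }

    Set<String> search(String word) {
        return fullTextIndex.getOrDefault(word.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        RepositorySketch repo = new RepositorySketch();
        String folder = repo.createFolder("root", "Sales");
        String file = repo.createFile(folder, "proposal.txt", "Quarterly proposal for ACME");
        System.out.println(repo.pathOf(file));                     // /Sales/proposal.txt
        System.out.println(repo.nodeTable.get(file).contentUrl()); // store://<random id>.bin
        System.out.println(repo.search("acme").contains(file));    // true
    }
}
```

In this model, moving or renaming a file touches only its `Node` row; the stored bytes and their internal URL stay where they are, which is one practical reason for keeping the folder structure out of the file store.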
The Content Application Server is responsible for the business logic for the control, access, and update of content in the repository. The Content Application Server allows you to execute applications either as Web scripts or as Java extensions. All the applications of the Alfresco ECM suite are built upon and executed by the Content Application Server. The sample Knowledge Base application used later in this book is built using the Content Application Server and the Alfresco Share application.
Alfresco applications are built upon the Content Application Server and rely on the Content Application Server to persist, access, query, and manage content. The Alfresco applications exist to provide the basic capabilities that most users need to manage content. The two main applications are Alfresco Share and Alfresco Explorer.
Alfresco Explorer is the original application built with the Alfresco system to manage content. Alfresco Explorer allows you to browse the repository, set up rules and actions, and manage content and its metadata, associations, and classifications. Alfresco Explorer was built using JavaServer Faces and is integrated into the Content Application Server. It is currently being phased out in favor of Alfresco Share. However, many extensions and language packs have been built for Alfresco Explorer. It also has extensive capabilities for managing the repository and should be considered a system administrator tool.
Excerpted from Professional Alfresco by David Caruana, John Newton, Mike Farman, Michael Uzquiano, and Kevin Roast. Copyright © 2010 by John Wiley & Sons, Ltd. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.