The Technology of “Articles Online”


The technology underpinning the Greyden Press “Articles Online” product is PageArk, a software package developed by Aptigent Inc. of Beachwood, Ohio. The PageArk software consists of a custom application written on top of a native Microsoft Windows 2000 operating environment that manages a dynamic content repository, provides full text and metadata-based searching of the repository and dynamically delivers repository content to authorized end-users.
Fundamental to the design of PageArk were several apparently contradictory objectives:
  1. First and foremost, PageArk was designed to efficiently and reliably manage collections of content elements (articles) no matter how few or how many articles constitute the collection.

  2. Within each collection, PageArk supports a variety of article organization structures allowing numerous publication types (including but not limited to Journals, Transactions and Proceedings) to simultaneously be represented and managed within a single collection.

  3. PageArk permits metadata (such as author(s), article title, date of publication, abstract, etc.) to be associated with each article in the collection. Commonly used metadata fields are natively supported by PageArk, while unique, client-specific fields can be added as needed. Efforts are continually underway to assess and adopt emerging metadata standards (for example, native Dublin Core support was recently added).

  4. PageArk was required to provide exceptional search performance while supporting searches against metadata fields, against the full text of an article, or unified searches against both the full text and selected metadata fields. PageArk queries may be formed as simple word/phrase queries or as complex Boolean queries. Search results are returned as a list of matching articles ordered by date, title, relevance or another sequence specified by the publisher. PageArk search strategies may be tuned on a collection-by-collection basis to reflect the size, structure and search requirements of individual publishers.

  5. PageArk was built using a standard Windows operating system plus MS Office functions and features. No special third-party software or hardware products or features (including search engines, programming tools or hardware assists) are used in the program. The source code is documented and can be installed and maintained on most Intel Pentium III platforms using current versions of Microsoft Windows 2000 Server and Microsoft Access 2000 or 2002.
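The search behaviour described in item 4 (metadata, full-text or unified searches, Boolean queries, and publisher-selected result ordering) can be illustrated with a small in-memory model. This is a minimal sketch, not PageArk's implementation, which relies on Microsoft Index Server. It assumes a hypothetical `Article` record and hypothetical pseudo-field names `"fulltext"` and `"title"`, represents Boolean queries as nested `("AND" | "OR" | "NOT", ...)` tuples, and approximates relevance as a simple count of matching query terms. Phrase queries are omitted for brevity.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, Union

# A query is a bare term or a nested tuple such as ("AND", "steel", ("NOT", "polymer")).
Query = Union[str, tuple]


@dataclass
class Article:
    """Hypothetical article record: title, publication date, full text and metadata."""
    title: str
    published: date
    text: str
    metadata: dict = field(default_factory=dict)


def tokens(s: str) -> list:
    """Lower-cased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", s.lower())


def term_counts(article: Article, fields: set) -> Counter:
    """Count terms over the selected fields: "fulltext", "title" and/or metadata keys."""
    parts = []
    if "fulltext" in fields:
        parts.append(article.text)
    if "title" in fields:
        parts.append(article.title)
    parts.extend(v for k, v in article.metadata.items() if k in fields)
    return Counter(t for p in parts for t in tokens(p))


def matches(query: Query, counts: Counter) -> bool:
    """Evaluate a Boolean query against an article's term counts."""
    if isinstance(query, str):
        return counts[query.lower()] > 0
    op, *args = query
    if op == "AND":
        return all(matches(a, counts) for a in args)
    if op == "OR":
        return any(matches(a, counts) for a in args)
    if op == "NOT":
        return not matches(args[0], counts)
    raise ValueError(f"unknown operator {op!r}")


def positive_terms(query: Query) -> list:
    """Terms that contribute to relevance (terms under NOT are excluded)."""
    if isinstance(query, str):
        return [query.lower()]
    op, *args = query
    if op == "NOT":
        return []
    return [t for a in args for t in positive_terms(a)]


def search(articles: Iterable[Article], query: Query, fields: set,
           order: str = "relevance") -> list:
    """Return matching articles ordered by "relevance", "date" (newest first) or "title"."""
    hits = []
    for art in articles:
        counts = term_counts(art, fields)
        if matches(query, counts):
            score = sum(counts[t] for t in positive_terms(query))
            hits.append((art, score))
    if order == "date":
        hits.sort(key=lambda h: h[0].published, reverse=True)
    elif order == "title":
        hits.sort(key=lambda h: h[0].title.lower())
    else:
        hits.sort(key=lambda h: h[1], reverse=True)
    return [art for art, _ in hits]
```

For example, `search(articles, "smith", {"author"}, order="date")` is a metadata-only search, while passing `{"fulltext", "title", "author"}` gives a unified search. Because Python's sort is stable, articles with equal scores keep their repository order; a publisher-specific ordering would simply be another `key` function.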

Fundamentally, articles are stored in a repository that is represented by a standard file directory structure. No attempt is made to store article files in third-party data management systems. Although such approaches are frequently used in other implementations, they invariably introduce processing delays and overhead that are both unnecessary and frequently counter-productive. PageArk exploits the inherent stability of the operating system's file management system and is tuned to provide an optimum blend of performance, publication modeling and disaster recovery capabilities.

Microsoft Index Server is the search engine driving PageArk. Index Server is integrated with Microsoft Internet Information Server (IIS) and the Windows 2000 operating system. It automatically builds an index of a Web server's content that can be searched from any Web browser. The index maps words and/or metadata to documents, and to locations within documents.
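An index that "maps words to documents, and to locations within documents" is usually called a positional inverted index. The sketch below is a generic illustration of that idea, not Index Server's internal format. It assumes a hypothetical repository of plain-text files whose relative paths (for example `journal/vol1/art1.txt`) serve as document identifiers, and it shows why storing word positions matters: they allow phrase queries to be answered by checking for consecutive positions.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_positional_index(docs: dict) -> dict:
    """Map each lower-cased word to {document id: [positions of that word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    # Convert to plain dicts so lookups of absent words do not create entries.
    return {word: dict(postings) for word, postings in index.items()}


def load_repository(root: Path, pattern: str = "*.txt") -> dict:
    """Read every matching file under root; its relative path serves as the document id."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def phrase_search(index: dict, phrase: str) -> list:
    """Documents in which the phrase's words occur at consecutive positions."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    hits = []
    for doc in sorted(candidates):
        # Keep start positions where word i appears exactly i places later.
        starts = set(postings[0][doc])
        for offset, p in enumerate(postings[1:], start=1):
            starts &= {pos - offset for pos in p[doc]}
        if starts:
            hits.append(doc)
    return hits
```

Usage would look like `phrase_search(build_positional_index(load_repository(Path("repository"))), "boolean query")`, where `"repository"` is a hypothetical root directory. Keeping the repository as ordinary files means the index can always be rebuilt from the directory tree alone.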

To support searching across a wide range of file types, PageArk exploits Microsoft's IFilter technology. Although custom filters can be developed to support arbitrary file types, PageArk typically relies on vendor-provided IFilters to support full-text searching of individual articles. Where the articles to be delivered are provided in an unsupported file format, PageArk support staff create an alternative article representation in a format for which an IFilter is available; this strategy preserves the delivery integrity of the original article while providing effective search capabilities.
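The fallback strategy can be summarized as a small decision: the original file is always what gets delivered, and only the file handed to the indexer changes. The following is a minimal sketch under stated assumptions: `SUPPORTED_EXTENSIONS`, `ArticleFiles` and `plan_article` are hypothetical names, and the extension set stands in for whatever IFilters happen to be installed on a given server.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical set of extensions for which an IFilter is assumed to be installed.
SUPPORTED_EXTENSIONS = frozenset({".txt", ".htm", ".html", ".xml", ".doc"})


@dataclass(frozen=True)
class ArticleFiles:
    deliver: Path  # publisher-provided original; always what end-users receive
    index: Path    # file handed to the full-text indexer


def plan_article(original: Path, alternative: Optional[Path] = None,
                 supported: frozenset = SUPPORTED_EXTENSIONS) -> ArticleFiles:
    """Index the original when its format is supported, else a searchable alternative."""
    if original.suffix.lower() in supported:
        return ArticleFiles(deliver=original, index=original)
    if alternative is not None and alternative.suffix.lower() in supported:
        return ArticleFiles(deliver=original, index=alternative)
    raise LookupError(f"no IFilter-supported representation for {original.name}")
```

Raising an error when neither file is indexable flags the article for support staff instead of silently leaving it unsearchable.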

For each article within the repository, a limited number of associated data files and structures are created and managed. The publisher-provided article itself is stored. An XML-encoded index file, consisting of article-level metadata and possibly an alternative content representation (if a suitable IFilter is not available), is created and indexed. Special, publication-specific metadata and selected organizational metadata (publication date, publication title, publication description, accession date, etc.) are stored in an Access or other SQL database file. Other index structures may be developed depending upon publisher needs; for example, keyword indices or DMOZ-like article directories and taxonomies can be supported as value-added supplementary index files.
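The per-article XML index file is not specified in detail here, so the following is only a sketch of what such a file might contain, using Python's standard `xml.etree.ElementTree`. The element names `article`, `metadata`, `field` and `content` are hypothetical. Standard fields are written in the Dublin Core 1.1 namespace (in keeping with PageArk's Dublin Core support), client-specific fields become generic `field` elements, and an optional `content` element carries the alternative searchable text. The database side (publication-level metadata) is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def build_index_xml(metadata: dict, alt_text: Optional[str] = None) -> str:
    """Serialize article-level metadata (and optional searchable text) as XML.

    metadata maps a field name to a list of values; Dublin Core names go into
    the dc: namespace, anything else becomes a client-specific <field name=...>.
    """
    root = ET.Element("article")  # hypothetical element names throughout
    meta = ET.SubElement(root, "metadata")
    for name, values in metadata.items():
        for value in values:
            if name in DC_ELEMENTS:
                elem = ET.SubElement(meta, f"{{{DC_NS}}}{name}")
            else:
                elem = ET.SubElement(meta, "field", name=name)
            elem.text = value
    if alt_text is not None:
        # Alternative content for articles whose original format has no IFilter.
        ET.SubElement(root, "content").text = alt_text
    return ET.tostring(root, encoding="unicode")
```

Allowing a list of values per field accommodates repeatable elements such as multiple `creator` entries for a co-authored article.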

The only license fees associated with PageArk are those charged by Aptigent for the development and ongoing maintenance of the software, along with any fees associated with the standard Microsoft products needed to run the PageArk software.