The technology underpinning the Greyden Press “Articles Online”
product is PageArk, a software package developed by Aptigent Inc. of
Beachwood, Ohio. PageArk is a custom application, written on top of
a native Microsoft Windows 2000 operating environment, that manages a
dynamic content repository, provides full-text and metadata-based
searching of the repository, and dynamically delivers repository content
to authorized end-users.
Fundamental to the design of PageArk were several apparently contradictory
objectives:
- First and foremost, PageArk was designed to efficiently and reliably
  manage collections of content elements (articles) no matter how few or
  how many articles constitute the collection.
- Within each collection, PageArk supports a variety of article
  organization structures, allowing numerous publication types (including
  but not limited to Journals, Transactions and Proceedings) to be
  represented and managed simultaneously within a single collection.
- PageArk permits metadata (such as author(s), article title, date of
  publication and abstract) to be associated with each article in the
  collection. Commonly used metadata fields are natively supported by
  PageArk, while unique, client-specific fields can be added as needed.
  Emerging metadata standards are continually assessed and adopted (for
  example, native Dublin Core support was recently added).
- PageArk was required to provide exceptional search performance while
  supporting searches against metadata fields, against the full text of
  an article, or unified searches against both the full text and selected
  metadata fields. PageArk queries may be formed as simple word/phrase
  queries or as complex Boolean queries. Search results are returned as a
  list of matching articles ordered by date, title, relevance or another
  sequence specified by the publisher. PageArk search strategies may be
  tuned on a collection-by-collection basis to reflect the size, structure
  and search requirements of individual publishers.
- PageArk was built using a standard Windows operating system plus MS
  Office functions and features. No special third-party software or
  hardware products or features (including search engines, programming
  tools or hardware assists) are used in the program. The source code is
  documented and can be installed and maintained on most Intel Pentium III
  platforms using current versions of Microsoft’s Windows 2000 Server
  operating system and Microsoft Access 2000 or 2002.
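The search objective above (Boolean queries over full text and metadata, with publisher-selected result ordering) can be illustrated with a small sketch. This is not PageArk's query engine, which relies on Microsoft Index Server; the nested-tuple query form, the `matches` and `search` functions, and the article dictionary keys are all assumptions made for illustration.

```python
def matches(article, query):
    """Evaluate a nested-tuple Boolean query against one article dict.

    The query format is hypothetical: ("TERM", word), ("FIELD", name, value),
    or ("AND" | "OR" | "NOT", subquery, ...).
    """
    op, *args = query
    if op == "TERM":
        # Whole-word match anywhere in the article's full text.
        return args[0].lower() in article["text"].lower().split()
    if op == "FIELD":
        # Case-insensitive substring match in a named metadata field.
        name, value = args
        return value.lower() in str(article.get(name, "")).lower()
    if op == "AND":
        return all(matches(article, q) for q in args)
    if op == "OR":
        return any(matches(article, q) for q in args)
    if op == "NOT":
        return not matches(article, args[0])
    raise ValueError(f"unknown operator: {op}")


def search(articles, query, order_by="date"):
    """Return matching articles sorted by a metadata field such as date or title."""
    hits = [a for a in articles if matches(a, query)]
    return sorted(hits, key=lambda a: a[order_by])
```

A unified search is then a single query that mixes `TERM` and `FIELD` clauses, and changing `order_by` changes only the presentation order, not the set of matches.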
Fundamentally, articles are stored in a repository that is represented by
a standard file directory structure. No attempt is made to store article
files in third-party data management systems. Although such approaches are
frequently used in other implementations, adopting them invariably
introduces processing delays and overhead that are both unnecessary and
frequently counter-productive. PageArk exploits the inherent stability
of the OS file management system and is tuned to provide an optimum
blend of performance, publication modeling and disaster recovery capabilities.
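As a rough illustration of a repository kept as an ordinary directory tree, the sketch below derives an article's folder from its publication, volume and identifier. The layout (publication / volume / article id), the `vol-NN` naming and the function name `article_dir` are hypothetical; the actual PageArk directory scheme is not described here.

```python
from pathlib import PureWindowsPath


def article_dir(root, publication, volume, article_id):
    """Build the folder path for one article under a hypothetical layout:
    <root>\\<publication>\\vol-<NN>\\<article_id>
    """
    return PureWindowsPath(root) / publication / f"vol-{volume:02d}" / article_id


# Example: where article "a17" of volume 3 of a journal would live.
folder = article_dir("D:/repository", "journal-x", 3, "a17")
article_file = folder / "article.pdf"   # the publisher-provided file
index_file = folder / "index.xml"       # the per-article XML index file
```

Because every article maps to a predictable path, backup and recovery can use ordinary file-system tools rather than a database export.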
Microsoft Index Server is the search engine driving the PageArk program.
Index Server is integrated with Microsoft Internet Information Server
and the Windows 2000 operating system. Index Server automatically builds
an index of your Web server that can be easily searched from any Web
browser. The index maps words and/or metadata to documents, and to locations
within documents.
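The idea of an index that maps words to documents, and to locations within documents, can be sketched as a small positional inverted index. This is a generic illustration of the data structure, not Index Server's internal format; the function names `build_index` and `phrase_search` are assumptions.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents that contain the phrase's words consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    hits = []
    # Start from every occurrence of the first word, then check its neighbours.
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)
```

Storing positions, rather than only document ids, is what makes phrase queries possible without rescanning the documents themselves.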
To support searching across a wide range of file types, Microsoft’s
IFilter technology is exploited. Although custom filters can be developed
to support arbitrary file types, PageArk typically relies on
vendor-provided IFilters to support full-text searching of individual
articles. Where the articles to be delivered are provided in an
unsupported file format, PageArk support staff create an alternative
article representation in a format for which an IFilter is available; this
strategy preserves the delivery integrity of the original article while
providing effective search capabilities.
For each article within the repository, a limited number of associated data
files/structures are created and managed. The publisher-provided article
itself is stored. An XML-encoded index file consisting of article-level
metadata (and possibly an alternative content representation, if a
suitable IFilter is not available) is created and indexed.
Special, publication-specific metadata and selected organizational
metadata (publication date, publication title, publication description,
accession date, etc.) are stored in an Access or other SQL database
file. Other index structures may be developed depending upon publisher
needs; for example, keyword indices or DMOZ-like article directories
and taxonomies can be supported as value-added supplementary index files.
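A per-article XML index file of the kind described above might look like the output of the following sketch. The root element `article`, the `alternativeContent` element and the function name `build_index_xml` are hypothetical; only the Dublin Core namespace and its `title`, `creator`, `date` and `description` elements come from the published Dublin Core element set, and the actual PageArk schema is not documented here.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_index_xml(title, creators, date, abstract, alt_text=None):
    """Serialize article-level metadata to an XML string.

    alt_text carries an alternative text representation of the article,
    included only when no suitable IFilter exists for the original file.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("article")  # hypothetical root element name
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    for name in creators:
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = name
    ET.SubElement(root, f"{{{DC_NS}}}date").text = date
    ET.SubElement(root, f"{{{DC_NS}}}description").text = abstract
    if alt_text is not None:
        ET.SubElement(root, "alternativeContent").text = alt_text
    return ET.tostring(root, encoding="unicode")
```

Since the index file is plain XML, the same indexing pass can cover both the metadata and, when present, the alternative content.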
The only license fees associated with PageArk are those charged by Aptigent
for the development and ongoing maintenance of the software, along with
any fees associated with the standard Microsoft products needed to run
the PageArk software.