Isabel F. Cruz
Department of Computer Science
Worcester Polytechnic Institute
ifc@cs.wpi.edu
http://www.cs.wpi.edu/People/faculty/ifc.html

Wendy T. Lucas
Database Visualization Research Group
Tufts University
wlucas@cs.tufts.edu
http://www.cs.tufts.edu/~wlucas
Multimedia data has become readily available from a variety of resources, such as the Web, to users (ranging from naive to sophisticated) who need to select and to present the data in a way that is meaningful to their particular applications. DelaunayMM is our framework for querying and presenting multimedia data stored in distributed data repositories, including the Web. It is unique in combining user-defined layouts with ad hoc querying capabilities, thereby enabling users to tailor, in a simple way, the layout of virtual documents composed of retrieved multimedia objects. In this paper, we focus on the object-oriented data models, on the declarative query languages, and on how the results of the queries to disparate resources are integrated to form coherent user-defined documents.
To address these new requirements, concepts and tools are needed that
enable users ranging from naive to sophisticated to not only select
the information they need but also to present it in a way that is
meaningful to their particular
application [17,21]. Our framework for
querying and presenting multimedia data stored in
distributed data repositories, including the Web, is called
DelaunayMM. Its uniqueness lies in its combination of
user-defined virtual document layouts with the ability to define
document content through ad hoc queries to multiple repositories.
DelaunayMM is a multimedia extension to the Delaunay Database
Visualization System [9], an interactive,
constraint-based system for visualizing object-oriented databases.
Delaunay users pictorially specify, in an intuitive yet formal way,
the visualization of database objects. By arranging graphical
geometric objects and graphical constraints, users form a ``picture"
that specifies how to visualize data objects belonging to a
class.
Following a similar approach, users of
DelaunayMM visually represent the spatial layout of the data
to be retrieved from distributed multimedia repositories.
The DelaunayMM document layout model defines a virtual
document as being a set of user-specified style sheets.
Therefore, the layout of a document is
based on one or more style sheets (e.g., for the layout of the
title page or of the chapter pages).
Within the document, a set of pages is associated with each style
sheet, which serves as a template for the layout of these
pages. The user associates queries with the
style templates, thus combining data selection with presentation.
Graphical icons, including a scrollable box for text, a re-sizable window for images, and a control box for audio, are assigned to each query and given presentation attributes. The icons are then arranged into a style sheet by either snapping to a grid or by explicitly specifying spatial constraints [8].
The DelaunayMM query language interface supports standard
SQL clauses including select, from, and where.
It is flexible enough to address queries to
distributed relational and object-oriented databases as well as to the
Web. In the latter case, an object-oriented model of multimedia documents
and elements provides the attributes on which to query. This model
extends the HTML 3.2 DTD [22] by incorporating additional
metadata attributes, including some from an emerging standard called
the STARTS protocol for Internet retrieval [15].
Queries to
the Web are complicated by its cyclic structure and the fact that the
destination for a query is often not known ahead of time
(for example, in relational queries the
names of the tables storing the sought-after data
are supplied by the person forming the query; on the other hand,
a query to the Web may or
may not include a URL from which to extract the data). Navigational
queries, which enable the browsing of document links during query
processing, and keyword searches by multiple search engines are therefore
supported.
By combining the user-defined style templates with the answers to the queries, a virtual document with pages populated with the retrieved multimedia objects is automatically generated. Each page is associated with one style sheet that determines the layout of the page's elements. Pages are linked together and can be traversed via a previous/next mechanism.
The content of each page is based on the answers of the queries associated with it. In some cases, more than one set of page elements (i.e., multimedia objects) may be retrieved in response to a query. The default display specification is to show sets of objects in order of retrieval, with additional sets connected by links that are traversed in a similar manner to page links.
Thumbnail views of each page provide an overview of the entire
document, as shown in Figure 1. The style sheet for
this document contains a text icon, an image icon, and an audio
icon. Queries to populate the icons on each page are first translated
into the syntax of the repository to be queried. For example, queries sent
to the Web are translated into WebSQL queries [20]. After invoking
the queries, further processing of the retrieved objects is performed in
order to create the user-specified presentation.

Figure 1: Thumbnail overview of a document.
Within a thumbnail view, pages are arranged in accordance with their position within the generated document, and can be reordered via a drag and drop operation. Selecting a particular thumbnail enlarges that page and makes it the active view.
In addition to the Web, DelaunayMM is currently being implemented
for the Perseus Project, a digital library on ancient Greek
culture [4].
Knowledge of the data schema, as captured by the data wrapper,
provides the attributes on which to query.
By providing an integrated query/presentation interface, visitors to
the Perseus site [5] will be able to examine the many
vases, coins, texts, and other works in ways that are currently not
possible. For example, one could display multiple views of one piece
of sculpture, compare the same view of many different vases, or
arrange a virtual document in which each page represents the artwork
of a different artist. In this last case, users could click on one
image of a work by a particular artist to view the next work in the
set, or could view the works of another artist by clicking on the link to
the next page.
Following [9], two types of save operations are required to take full advantage of the capabilities inherent in the framework presented here. One saves the actual virtual document for future viewing. The other saves only the query and layout specifications, so that new virtual documents based on previous specifications can be generated, either by editing the specifications or by using more sophisticated mechanisms, such as inheritance and deductive rules (see [6,7]).
The remainder of this paper is organized as follows.
Section 2 describes the Layout and Query Framework,
including the DelaunayMM layout and data models.
Section 3 contains descriptions of Query Processing
and Virtual Document Generation. Our current implementation is described in
Section 4, while Section 5
contains a comparison with related work. Our paper concludes with a
discussion of future work in Section 6.
The simplest way to specify the layout is by snapping to a grid and adjusting the icons to fill the desired space. Each icon will ultimately be replaced by a set of objects that fit the query criteria associated with it. Rather than snapping to a grid, the user can enter numerical values for the dimensional attributes of an icon, such as length and width for a text box, or can place visually specified constraints on the values of those attributes [8]. These constraints are (1) length constraints or (2) overlap constraints (if an object is to be placed on top of another). Length constraints are linear (unary, binary, or ternary), maximum or minimum constraints.
Since more than one object within a class may satisfy the query, one can specify how many instances of each class to view at a time by selecting a predefined presentation view. Alternatively, links inherent to a chosen presentation (e.g., stack of cards) can provide the navigational path from one element of the set to the next.
Also assigned to each icon are presentation attributes, such as font for text and color composition for images, whose values are specified by the user. All instances of the database class that fit the query criteria for an icon are presented throughout the document in accordance with these attributes.
The layout shown in Figure 2 is an example of a
style sheet containing an image icon, a scrollable text box, and a
non-scrollable text box. Figure 3 shows the query
tree associated with that style sheet, which will be described in
Section 2.2.

Figure 2: Style sheet.
Figure 3: Query tree.
In this example, the length and width of the image icon are proportional to those of the largest image that will be contained in that space. These constraints are specified via dialog box options for the length and width attributes. For the non-scrollable text box, the ``fill area" attribute has been selected, so that the font and letter size of the text to appear there will be automatically chosen to fill the specified area. A maximum constraint on the height of the page is set to be either (1) the sum of the heights of the image object, the non-scrollable text box, and the space between them, or (2) the height of the scrollable text box, whichever is greater.
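The maximum constraint on the page height amounts to a small arithmetic rule. The Python sketch below is only an illustration of that rule; the function name, parameter names, and sample dimensions are assumptions and are not part of DelaunayMM.

```python
def page_height(image_h, fixed_text_h, gap, scroll_text_h):
    """Return the page height as the greater of two candidate heights.

    Candidate 1: image, gap, and non-scrollable text box stacked vertically.
    Candidate 2: the scrollable text box on its own.
    """
    stacked = image_h + gap + fixed_text_h
    return max(stacked, scroll_text_h)


# Hypothetical dimensions, in arbitrary layout units.
print(page_height(image_h=300, fixed_text_h=40, gap=10, scroll_text_h=320))  # 350
print(page_height(image_h=200, fixed_text_h=40, gap=10, scroll_text_h=320))  # 320
```

In the first call the stacked column determines the height; in the second the scrollable text box does.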
In addition to the layout of each page associated with a particular
style sheet, the user can organize the overall layout of a virtual
document by specifying the relationships between the sets of pages
belonging to the different style sheets.
Figure 4 shows a layout for a virtual document, or ``book'' on ``Greek Vases'' composed with objects resulting from queries to the Perseus database. It depicts a hierarchical organization, with the page associated with the ``book cover'' style sheet at the top. The level below contains the cover pages for the ``chapters'' of the book. The pages contained in the chapter on vases found at Harvard University are drawn as children of the Harvard node of the hierarchy. The specification of the layout of virtual documents is achieved using visual rules [11].

Figure 4: Tree layout of a virtual document.
The object-oriented model chosen for representing the document layout and
the retrieved data is based on the O2 [13] and
F-logic [18] data models. Figure 5 shows
the structure of this data model for a virtual document. It has the
two primitive type constructors: tuple and set. Syntactically in our
representation, tuples are included between square brackets and sets
are included within braces.

Figure 5: Document layout model.
The Document class is defined as a tuple containing name and styles attributes. The latter is a set valued attribute, since its value is a set of objects of class Style. The different objects of class Style allow the user to model the different kinds of pages found in virtual documents, as previously described.
As an example, there may be one page with a ``Table of Contents" style, and many pages with a ``Body" style. Attributes of the Style class are (1) description, which contains a name (of class string given by the user) and (2) pages, which contains the set of page objects inheriting the layout defined for a particular style.
The Page class has the attribute elements, which is set-valued. Each element of the set is a tuple with two attributes: icon_id and location. The value of the latter is a set of coordinates that define the position of the icon within a page. Other attributes of the Page class are a reference to the next page, and one to the previous page.
An object of class Icon is a tuple made up of a data attribute and a query attribute. The data attribute is associated with a data set (e.g., the set of all images of Greek vases). Each data element in the set has a physical identifier (pid) to denote the data repository in which it resides and a value to identify it within that repository. For Web-based data, that value is its URL. A set of data points representing a physical location within an icon is also associated with each data element (e.g., the coordinates of the lower left corner and of the upper right corner of a rectangular region). The query attribute stores the query used to populate the icon, as described in the next section.
The multimedia classes of Text, Image, Audio, and Video are all subclasses of the Icon class. Each inherits the data and query attributes, and in addition has its own type-specific ones. For example, attributes of class Text include font and size, while attributes of class Image include color content and resolution.
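To make the class structure concrete, the following Python sketch mirrors the Document, Style, Page, and Icon classes described above. It is a rough approximation under stated assumptions: lists stand in for the set constructor, the field names next_page and previous_page and the helper link_pages are hypothetical, and only two of the four multimedia subclasses are shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class DataElement:
    pid: str             # repository in which the element resides
    value: str           # identifier within that repository (the URL for Web data)
    region: List[Point]  # e.g. lower-left and upper-right corners within the icon


@dataclass
class Icon:
    data: List[DataElement] = field(default_factory=list)
    query: str = ""      # query used to populate the icon


@dataclass
class Text(Icon):
    font: str = ""
    size: int = 0


@dataclass
class Image(Icon):
    color_content: str = ""
    resolution: int = 0


@dataclass
class Element:
    icon_id: str
    location: List[Point]  # coordinates positioning the icon within the page


@dataclass(eq=False)       # identity comparison avoids recursing through page links
class Page:
    elements: List[Element] = field(default_factory=list)
    next_page: Optional["Page"] = None
    previous_page: Optional["Page"] = None


@dataclass
class Style:
    description: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Document:
    name: str
    styles: List[Style] = field(default_factory=list)


def link_pages(pages: List[Page]) -> List[Page]:
    """Connect consecutive pages through their next/previous references."""
    for earlier, later in zip(pages, pages[1:]):
        earlier.next_page = later
        later.previous_page = earlier
    return pages


body = Style("Body", link_pages([Page(), Page()]))
book = Document("Greek Vases", [Style("Table of Contents", [Page()]), body])
print(body.pages[0].next_page is body.pages[1])  # True
```

Audio and Video subclasses would follow the same pattern as Text and Image, each adding its own type-specific attributes.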
Grouping icons together has both a presentation and a query significance. In terms of presentation, elements of sets associated with one icon are matched with elements of sets associated with the other icons in the group. When the user iterates on a group, the next object in all sets within the group is displayed.
Icons within a query group are the values for the select portion of the query. Iterating through an inner query group will change only the presentation associated with that particular grouping. Iterating through the outermost query group will change the presentation for the entire page (this is similar to nested loops in a programming language, where the inner loop changes ``more quickly'' than the outer loop).
An example illustrating the query formation process that uses the Perseus
database is the creation of a book of vases from the Harvard
Art Museums. The user first creates the style sheet of
Figure 2, and places the image and text box (which
contains a label associated with that particular view)
within one query group, so that the two change together
as she iterates through the
many different views of each vase.
The text area icon, however, is in its own grouping box,
because the texts she will be
retrieving relate to the vase as a whole. Iterating through these
texts should therefore be independent from iterating through the different
views of the vase. Finally, all three icons are placed within an outer
query group so that she can link from one page to the next, with each
page containing information on a different vase within the Harvard
collection.
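The nested-loop behavior of query groups in this example can be sketched in Python as follows. This is an illustration only: the QueryGroup class, the wrap-around policy when the end of a set is reached, and all file names and texts are assumptions rather than details of DelaunayMM.

```python
class QueryGroup:
    """A cursor over the ordered results retrieved for one query group."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def current(self):
        return self.items[self.index]

    def forward(self):
        # Wrapping around at the end is an assumption made for this sketch.
        self.index = (self.index + 1) % len(self.items)
        return self.current()


def make_book():
    """Outer group: one page per vase. Inner groups: views and texts."""
    return QueryGroup([
        {"views": QueryGroup([("a1.gif", "Vase A, view 1"),
                              ("a2.gif", "Vase A, view 2")]),
         "texts": QueryGroup(["Vase A: shape", "Vase A: decoration"])},
        {"views": QueryGroup([("b1.gif", "Vase B, view 1")]),
         "texts": QueryGroup(["Vase B: shape"])},
    ])


def shown(page):
    """Return what the page currently displays: image, label, and text."""
    image, label = page["views"].current()
    return image, label, page["texts"].current()


book = make_book()
page = book.current()
print(shown(page))       # first view of vase A with its first text
page["views"].forward()  # inner group: image and label change together
print(shown(page))       # the text is unchanged
page = book.forward()    # outer group: the whole page changes
print(shown(page))
```

Advancing the views group changes the image and its label but leaves the text in place, while advancing the outer group replaces everything on the page.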
Our data model includes some attributes that are not currently part of the DTD. Most of these have been put into a new MDATA class for metadata attributes. Included here are attributes defined by the STARTS protocol for Internet retrieval and search [15]. Namely, the SRange attribute relates to the ScoreRange field, and lists the minimum and maximum query scores a document can get within a search engine, while the AlgID attribute relates to the RankingAlgorithmID field and identifies the ranking algorithm used for computing scores in that search engine. Once available, both of these attributes could be used for more effective merging of files retrieved by multiple search engines. The links attribute, also included in the STARTS protocol, is used for storing all links contained in a file. At the present time, its values are also not provided by search engines.
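As one illustration of how the SRange attribute could support merging, the Python sketch below rescales each engine's raw scores to a common interval before ranking. The min-max rescaling, the rule of keeping the best score for a URL returned by several engines, and all engine names and scores are assumptions made for this sketch; the text above does not prescribe a merging method.

```python
def normalize(score, score_range):
    """Rescale a raw score to [0, 1] using the engine's (min, max) range."""
    low, high = score_range
    if high == low:
        return 0.0
    return (score - low) / (high - low)


def merge_results(results_by_engine, ranges):
    """Merge per-engine (url, raw_score) lists into a single ranking."""
    merged = {}
    for engine, results in results_by_engine.items():
        for url, raw_score in results:
            score = normalize(raw_score, ranges[engine])
            # Keep the best normalized score for URLs found by several engines.
            merged[url] = max(score, merged.get(url, 0.0))
    # Highest score first; ties are broken by URL to keep the order stable.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical engines, score ranges, and results.
ranges = {"engineA": (0.0, 1.0), "engineB": (0, 1000)}
results = {
    "engineA": [("http://example.org/u1", 0.9), ("http://example.org/u2", 0.5)],
    "engineB": [("http://example.org/u2", 800), ("http://example.org/u3", 100)],
}
for url, score in merge_results(results, ranges):
    print(url, round(score, 2))
```

Scores produced by different ranking algorithms are not necessarily comparable even after rescaling, which is where an attribute such as AlgID could help decide whether two engines' scores should be compared at all.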
Other attributes that we added to the MDATA class are currently
provided by search engines. These include length, for
the length of the file, and moddate, for the last date of file
modification. Both of these attributes are also supported by
WebSQL [20], the query language into which our Web-destined queries
are translated, as explained in Section 2.2.2.
The WebSQL classification of links as interior (within the same page), local (within the same site), or global (outside the current site) has also been added to our model under the A class, which is used for describing anchors. The base attribute tells the URL of the document containing the link, and the href attribute tells the URL of the target of the link.
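The three link categories can be derived from the base and href attributes alone. The Python sketch below shows one way to do so; treating ``same site'' as ``same host name'' is an assumption of this sketch, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def classify_link(base, href):
    """Classify a link as 'interior', 'local', or 'global'.

    base is the URL of the document containing the link and href is the
    link target, which may be relative to base.
    """
    target = urljoin(base, href)
    base_doc, _ = urldefrag(base)      # strip any #fragment
    target_doc, _ = urldefrag(target)
    if target_doc == base_doc:
        return "interior"              # points within the same page
    if urlparse(target_doc).netloc.lower() == urlparse(base_doc).netloc.lower():
        return "local"                 # same host, different document
    return "global"                    # a different host


base = "http://www.example.org/docs/index.html"
print(classify_link(base, "#section2"))                  # interior
print(classify_link(base, "people/faculty.html"))        # local
print(classify_link(base, "http://other.example.com/"))  # global
```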
Figure 6 shows a partial data schema representing the
additions we have made to the existing DTD model. Sets of elements,
such as the set of URLs represented as {URL}, indicate that zero or
more such elements may be present. The symbol ``|" is used to
represent an OR condition.

Figure 6: Search engine classes.
The user first specifies the repositories to be queried, so that the
query interface can display the attributes, in scrolling lists, for
that repository. The values of the select clause are
partially specified during the layout specification process: when a
text icon is added to a style sheet, DelaunayMM
automatically assigns an object identifier (oid) to it, such as
``Text1". The user must then select the text attribute to retrieve,
such as ``title". In the case of an image or an audio recording, the
file type to retrieve is specified, such as ``gif" for an
image, or ``wav" for a recording.
To illustrate query formation and the grouping of queries, we will
continue with our Perseus example. The relational tables from
the Perseus database that are relevant to the queries that follow are
shown in Figure 7.
The image of a vase in the query is assigned the oid Image1
by the system, and the text box is assigned the oid Text1. Using
scrolling lists and dialog boxes, the user creates the query in
Figure 8.
If this were the only query defined for the page, then clicking on the query group's forward and backward links would result in the display of each vase in the Harvard collection along with its name.
By adding a separate query group containing a text box, the user is
able to view all the descriptions for each vase. The
query associated with this group is shown in Figure 9.
The last query group contains all of the icons defined for the page,
and encompasses the queries shown in the examples of
Figures 8 and 9. The user would like each page of the
document to contain information on one of the vases in the
Harvard collection. The query for the entire page is shown in
Figure 10. Each page of the document created from this
query refers to a different vase. Within any page, it is possible to
iterate through all of the different images of the vase and read the summary
information describing it.
In the next example, the user would like to view the two sides
(obverse and reverse) of each of the 523 Dewing coin images in the
Perseus database. The Images table has a Sequence attribute,
which is an ordered integer list of the different views stored for each
object. Two image icons are added to the style sheet
and placed within one query group. The query associated with
that group is shown in Figure 11. The result is a document
with one page for every coin that has images in the database,
each page showing the two sides of that coin.
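A query of this kind can be approximated with a self-join on the Images table, as in the Python sketch below, which uses an in-memory SQLite database. Everything about the schema other than the Images table name and its Sequence attribute is hypothetical, including the ObjectID and File columns, the sample rows, and the assumption that sequence 1 is the obverse and sequence 2 the reverse. It is not the query of Figure 11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the table name and Sequence come from the text.
conn.execute("CREATE TABLE Images (ObjectID TEXT, Sequence INTEGER, File TEXT)")
conn.executemany(
    "INSERT INTO Images VALUES (?, ?, ?)",
    [
        ("coin1", 1, "coin1_obverse.gif"),
        ("coin1", 2, "coin1_reverse.gif"),
        ("coin2", 1, "coin2_obverse.gif"),
        ("coin2", 2, "coin2_reverse.gif"),
    ],
)

# Pair the two views of each coin: one row per coin, hence one page per coin.
rows = conn.execute(
    """
    SELECT a.ObjectID, a.File, b.File
    FROM Images AS a
    JOIN Images AS b ON a.ObjectID = b.ObjectID
    WHERE a.Sequence = 1 AND b.Sequence = 2
    ORDER BY a.ObjectID
    """
).fetchall()

for coin, obverse, reverse in rows:
    print(coin, obverse, reverse)
```

Each result row supplies the contents of the two image icons on one page.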
In posing queries to the Web, a particular
URL can be specified from which to start the search. Attributes from the
DelaunayMM Web file schema appear as selections within
scrolling lists. Anchor attributes supporting the interior, local,
and global categorizations found in [20] are also available
for selection so that the types of links on which to
navigate can be specified. Figure 12 shows a query
that finds all images of George Washington connected by two or fewer
local links to a particular URL, while Figure 13
shows that query as entered into the DelaunayMM query
interface to WebSQL.

Figure 13: DelaunayMM query interface to WebSQL.
If the user does not know the starting location for the above query,
then a keyword search is needed. All the images
connected by two or fewer local links to a Web document containing the
keywords ``George Washington" are specified.
This query is shown in Figure 14.
Note that the where clause further specifies that the keywords appear
in the title (as opposed to, say, anywhere in the document).

Figure 15: Architecture of DelaunayMM.
The Query Processing component is responsible for (1) mapping the schemes of the underlying data repositories to an object-oriented representation for use by the Query Formation component, (2) formatting queries from the Query Formation component into the syntax recognized by their destinations and then executing them, (3) sorting and merging the results of queries, and (4) passing those results on to the Virtual Document Generation component. There, the user-specified layouts are combined with the processed data to form the completed document.
After the queries that define a document have been
formed, they are sent to the Query Processing component for
translation into a syntax recognized by the query destination. In the
case of Perseus, that syntax is SQL. In the case of the Web, queries
are translated into WebSQL and then executed by the WebSQL server.
The files returned in this latter case go through an additional
selection process in which attributes not defined for querying within
WebSQL are evaluated. For example, a user might only want a document
if a particular phrase appears in one of its headings (as in the query
of Figure 14, where the document's title must contain
``George Washington''), believing that phrase to be more strongly
associated with that document than with one in which the phrase only
appears in the document's body. This kind of selection is not
performed by the WebSQL server. Therefore, we need to parse the HTML
documents that have been returned by the WebSQL query and select only
those where the phrase appears in the title.
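This post-selection step can be sketched with Python's standard HTML parser. The sketch assumes the documents returned by the WebSQL query are available as (URL, HTML text) pairs; the class and function names and the sample documents are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def title_of(html):
    """Return the document title with whitespace collapsed."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.title_parts).split())


def filter_by_title(documents, phrase):
    """Keep the URLs of documents whose title contains the phrase."""
    wanted = phrase.lower()
    return [url for url, html in documents if wanted in title_of(html).lower()]


documents = [
    ("http://example.org/a.html",
     "<html><head><title>George Washington Papers</title></head>"
     "<body>Letters</body></html>"),
    ("http://example.org/b.html",
     "<html><head><title>Presidents</title></head>"
     "<body>George Washington was the first.</body></html>"),
]
print(filter_by_title(documents, "George Washington"))
```

Only the first document is kept, because the second mentions the phrase in its body but not in its title.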
Next, the retrieved data must be merged on the basis of page content
as defined by the queries associated with each page. For example,
after executing the queries to form a book of vases from the Harvard
collection described in Section 2.2.2, the images are
matched up with the name of the vase and the text describing them.
Figure 16 shows the content, by means of a structured map [12], of the page for the Harvard 1895.247 vase. There are three image-name pairs for this page (the name is required to be shown with each image by the Harvard Museum), and one Decoration_Description.

Figure 16: Structured map instance.
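The merging step for this page can be sketched in Python as a grouping of query results by vase. The row layout, the function name, the image file names, and the placeholder description are assumptions made for illustration; only the vase identifier and the counts (three image-name pairs, one description) come from the example above.

```python
def build_page_content(view_rows, description_rows):
    """Group query results by vase so that each vase yields one page."""
    pages = {}
    for vase, image, name in view_rows:
        page = pages.setdefault(vase, {"views": [], "descriptions": []})
        page["views"].append((image, name))
    for vase, text in description_rows:
        page = pages.setdefault(vase, {"views": [], "descriptions": []})
        page["descriptions"].append(text)
    return pages


# Hypothetical rows returned by the image/name query and the description query.
view_rows = [
    ("Harvard 1895.247", "view1.gif", "Harvard 1895.247"),
    ("Harvard 1895.247", "view2.gif", "Harvard 1895.247"),
    ("Harvard 1895.247", "view3.gif", "Harvard 1895.247"),
]
description_rows = [
    ("Harvard 1895.247", "Decoration description (placeholder text)"),
]

pages = build_page_content(view_rows, description_rows)
page = pages["Harvard 1895.247"]
print(len(page["views"]), len(page["descriptions"]))  # 3 1
```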
The Layout Specification component provides the front-end interface
through which users define how to present the data to be retrieved.
The tool box, as shown in Figure 17, provides
icons for adding multimedia element representations to each style
sheet.

Figure 17: Tool window from DelaunayMM.
The first five buttons in the top row are used for adding text, images, video, audio, and label elements, in that order, to a style sheet. Once an element has been added, double-clicking on its representation brings up its presentation attributes. The sixth button in that row is used for attaching queries to page elements.
In the second row, the first two buttons are for adding length and overlap constraints. Length constraints can be added between system-defined locations, called ``landmarks", on the borders of elements. For example, a distance specification can be set between the center of an image element and the center of a text element by adding a length constraint between those two landmarks. Alternatively, the user can add user-defined location markers called ``anchorpoints" to elements by clicking on the third button in this row. This makes it possible to specify length constraints between any two points on two elements, such as the upper left corner of one image and the lower right corner of another. The fourth button is used for viewing and organizing the overall layout of a virtual document, while the fifth button adds a page border that is used for defining page attributes and in setting constraints between the borders of a page and the elements that fall within it. Finally, the sixth button is for a snap-to-grid option. Standard editing functions (e.g., copy, move, delete, and select all) are available from the pull-down menu labeled ``Edit".
Figure 18 shows the style sheets window, which contains two
templates called ``chapters" and ``body" that have been created for the
virtual document on vases. In the ``body" style sheet, the blue line
running vertically through the center is a length constraint used for
defining the overall size of each page relative to its contents. After
a virtual document has been generated, its pages are displayed in a thumbnail
view similar in layout to the style sheets window. Clicking on a page
makes it the active view.

Figure 18: Style sheet window from DelaunayMM.
Two parallel efforts are being pursued at this time in terms of interfacing to distributed data repositories. One of these corresponds to queries destined for the Web. Our query interface translates queries into WebSQL, checks for correctness, and sends them to the WebSQL server for processing. The other effort is focused on the Perseus Project. A data wrapper for the Perseus database is currently under development.
While in [2,28] documents are generated from a set of
known objects, our approach is designed with external datasets,
including the Web, in mind. The work by
Weitzman and Wittenburg was
an important source of inspiration for the current work. As for the
expressiveness of the spatial layout, the work by Bertino et
al. is quite similar to our former
work [6,7], but differs from it in that it is
based on the relational data model.
However, they also consider temporal constraints, which we have not
yet incorporated into DelaunayMM.
In addition, we offer a visual approach that spans from the laying out of
the content of individual viewable pages to the modification of
features and page orderings found in the completed virtual document.
Other related work includes Garlic [3], DISCO (Distributed
Information Search COmponents) [27], and
InfoHarness [25] for querying heterogeneous distributed
databases. The first of these approaches differs from ours in that
they query one database at a time and do not try to integrate data
obtained from a variety of sources. While the second approach does
incorporate these features, it does not focus on multimedia data and leaves
the presentation of retrieved data up to applications programmers. The
InfoHarness system uses metadata extraction methods to create information
repositories that support run-time access to the original information.
While our system incorporates retrieved multimedia objects into user-defined
presentations, the information retrieved by InfoHarness
is converted to a system-generated combination of HTML forms and hyperlinks,
which are then viewed using the Mosaic browser.
The system developed at Xerox PARC [23] uses a variety of 3D displays and integrates an algorithm for the effective browsing of a large collection of documents. Two important differences are our emphasis on user-defined layouts and the availability of our interface over the Web. We have also elected to use 2D displays for faster prototyping and easier access over the Web.
The work by Hüser et al. [16] is directed to the
generation of documents on the fly. Although this work is intended for the
visualization of a single information repository, its
presentation objectives are remarkably similar to ours. An interesting
difference is that they do not assume pre-defined templates while we
have done so, mainly with the objective of simplifying the user's
interaction. Using DelaunayMM, the more sophisticated
user can, however, achieve similar functionality by using
visual rules to shape the layout of the virtual documents [11].
In the future, we will expand our access to data repositories other than the Web and Perseus. Examples of other data wrappers and repositories include Garlic [3], QBIC [14], and DISCO [27]. QBIC will allow us to test our ideas on querying images using attributes that are not of type string.
While we have an expressive framework for the specification of spatial layout [10], we have not yet addressed the temporal layout of multimedia components within user-specified virtual documents (see, for example, [19,29]).
We also plan on conducting usability studies, which are of particular importance to applications intended for a large variety of users. Although the user interface is based on the one in [9], it must support a host of new features related to multimedia data types and distributed data. Our first users are the members of the Perseus Project with whom we have been cooperating. While they fulfill the role of the users who are digital librarians, we would also like to have an experimental site available to the casual users of the Perseus site [5]. Given the popularity of this site, we believe that it would be an ideal testbed for our ideas.
The translation was initiated by Isabel Cruz on 8/14/1997