Collection: INTERAMER
Number: 71
Year: 2002
Author: Johann Van Reenen, Editor
Title: Digital Libraries and Virtual Workplaces. Important Initiatives for Latin America in the Information Age
2. Students and ETDs
Students are the most important
participants in ETD activities. They are the main target of the education
effort. They are the ones who learn by doing, and so promote access to the
ETDs they prepare to help communicate their research results.
Benefits to students utilizing ETDs
There are many reasons for
ETDs. Indeed, if one asks “What are the reasons to not have ETDs?” it is difficult
to find any convincing, forward-looking answer. Almost all TDs are produced
as electronic documents, and if students know in advance about how to prepare
ETDs, then creating their own ETD usually is a very simple process. In addition,
there are special benefits that result from ETD creation:
- New genre
The first benefit is that
new, better types of TDs may emerge as ETDs develop as a genre. Rather than
be bound by the limits of old-style typewriters, students may be freed to
include colour diagrams and images, dynamic constructs like spreadsheets,
interactive forms such as animations, and multimedia resources including audio
and video. To ensure preservation of the raw data underlying their work, promote
learning from their experience, and facilitate confirmation of their findings,
they may enhance their ETDs by including the key datasets that they have assembled.
As the new genre of ETDs
(Fox, McMillan, & Eaton 1999a) emerges from this growing community of
scholars, it is likely to build upon earlier forms. Simplest are documents
that can be thought of as “electronic paper” where the underlying authoring
goal is to produce a paper form, perhaps with color used in diagrams and images.
Slightly richer are documents that have links, as in hypertext, at least from
tables of contents, tables of figures, tables of tables, and indexes – all
pointing to target locations in the body of the document. To facilitate preservation,
some documents may be organized in onion-fashion, with a core mostly containing
text (that thus may be printable), appendices including multimedia content
following international standards, and supplemental files including data and
interactive or dynamic forms that may be harder to migrate as the years pass
by. Programs, applets, simulations, virtual environments, and other constructs
yet to be discovered may be shared by students who aim to communicate their
findings using the most suitable objects and representations.
- Minimize duplication of effort
A second benefit of ETDs
is a reduction in the needless repetition of investigations that are carried
out because people are unaware of the findings of other students who have
completed a TD. Except in unusual cases, master’s theses are rarely reported
in databases (e.g., very few, except those from Canada, appear in UMI services
like Dissertation Abstracts). Few dissertations prepared outside North
America are reported either. With a globally accessible collection of ETDs,
students can quickly search for works related to their interest from anywhere
in the world, and in most cases examine and learn from those studies without
incurring any cost.
- Improve visibility
Once ETDs are collected
on behalf of educational institutions, digital library technology makes it
easy for works to be found. Through www.theses.org, NDLTD directly makes ETDs
available, and points to other services that facilitate such discovery. As
a result, hundreds or thousands of accesses per year per work are logged,
for example, according to reports from the Virginia Tech library regarding
the ETDs it makes publicly accessible (Fox, McMillan, & Eaton 1999a; Eaton,
Fox, & McMillan 1998). As the collection of ETDs available grows and reaches
critical mass, it is likely that it will be frequently consulted by the millions
of researchers and graduate students interested in such detailed studies,
expositions of new methodologies, reviews of the literature on specialized
topics, extensive bibliographies, illustrative figures and tables, and highly
expressive multimedia supplements. Thus, students and student works will become
more visible, facilitating advances in scholarship and leading to increased
collaboration, each made possible by electronic communication, across space
and time (Fox, Hall, Kipp, Eaton, McMillan, & Mather 1997b).
- Accelerate workflow
ETDs can be managed through
automated procedures honed to take advantage of modern networked information
systems. Since the shift to ETDs requires policy and process discussion among
campus stakeholders, it is possible to streamline workflow and save time and
labor. Checking of submissions and cataloging is sped up, moving and handling
of paper copies is eliminated, and delays for binding are removed. The time
between submission and graduation can be reduced, and ETDs can be made available
for access within days or weeks rather than months.
- Save money
ETD submission over networks
typically costs students nothing, which compares favorably with the charges of hundreds or thousands
of dollars otherwise required to print, copy, or publish TDs using paper or
other media forms. In many institutions, the networking, computing, and software
resources available to students suffice so that students preparing ETDs need
make no additional expenditure. Similarly, on many campuses, assistance is
available to answer questions and train students regarding word processing
and other skills valuable for authors of electronic documents and users of
digital libraries. If students elect to use personal computers and acquire
their own software to use in ETD creation, these will later be useful in other
research and development work, for both professional and personal needs, with
low marginal expense specifically required for ETDs. Thus, it is typical that
the pros far outweigh the cons regarding students preparing ETDs.
Access to ETDs
Since in most cases it is
in the interest of students and universities to maximize the visibility of
their research results, the general approach of NDLTD is to encourage all
parties interested to facilitate access to ETDs. There are a number of well-known
sites and resources for ETDs, and NDLTD runs the Web site http://www.theses.org,
which also has the alias http://www.dissertations.org,
as a central clearinghouse for access to ETDs. This site points to various
others that support portions of the worldwide holdings of ETDs. For example,
the largest corporate archive, with over 1.5 million entries, is managed by
UMI, and has most doctoral dissertations from USA and Canada, as well as most
master’s theses from Canada, in microfilm form, with metadata available as
a searchable collection through Dissertation Abstracts. Since 1997
UMI has scanned new submissions (originally from microfilm, later directly
from paper) and made the page images available through PDF files. With over
100,000 ETDs accessible through subscription or direct payment mechanisms,
UMI hosts the largest single collection of electronic TDs as well as of microfilm
TDs.
Other corporations as well
as local, regional, national, and international groups associated with NDLTD
have Web sites too, such as http://www.cybertheses.org
for the international Francophone project or http://www.dissonline.org.
In addition, a number of WWW search engines have indexed some of the ETD collections
available so this genre is included in general Web searches. Some other schemes
also allow access to ETD collections. Using Z39.50, the “information retrieval
protocol”, for example, the Virginia Tech ETD collection can be accessed through
suitable clients or from some library catalogue systems. OCLC’s WorldCat
service, with over 20 million catalogue records, has an estimated 3.5 million
entries for TDs. Perhaps most promising is that the global as well as regional
and local metadata information about ETDs may become widely accessible through
the Open Archives Initiative (Van de Sompel 2000). This initiative
is discussed in greater detail in van Reenen’s chapter on scholarly communication.
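To illustrate how such metadata harvesting might look in practice, the sketch
below builds a request URL in the style of the OAI Protocol for Metadata
Harvesting, which the Open Archives Initiative published after this text was
written. The base URL is a hypothetical placeholder, not a real archive;
ListRecords and oai_dc are the protocol's request verb and its Dublin Core
metadata prefix. No network request is made.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH style ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict harvesting to one set defined by the archive
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Hypothetical archive address, used only for illustration
print(list_records_url("http://archive.example.edu/oai"))
```

A harvester would fetch such a URL from each participating archive and merge
the returned metadata records into one searchable index.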
Searching ETDs across sites and in local collections
As part of the education
component of NDLTD, it is hoped that graduate students will become facile
with searching through electronic collections, especially those in digital
libraries. If we regard managing information as a basic human need, ensuring
that the next generation of scholars has such skill seems an appropriate minimal
objective. Most specifically, since graduate research often builds upon prior
results from other graduate researchers, it seems sensible for all ETD authors
to be able to search through available ETD holdings. NDLTD encourages the
provision of online resources, self-study materials, individual assistance,
as well as group training activities so that graduate students become knowledgeable
about resource discovery, searching, query construction, query refinement,
citation services, and other processes – both for ETDs and for content in
their discipline.
Classification systems and schemes
Considering further the
educational mission of NDLTD, it is hoped that students will learn other concepts
from the fields of library and information science. As emerging scholars,
they should grasp the entire information life cycle that is now being supported
through digital libraries (Borgman 1996). Some of those aspects are considered
below. Here we note that manual or automatic schemes are often deployed to
categorize or classify documents so they can later be found by referring to
an appropriate category. Indeed, when people browse through a collection,
they often navigate through a suitable classification system or “concept space”
to find likely portions to examine.
There are general classification
schemes, such as the Library of Congress Subject Headings, Dewey Decimal Classification,
and simpler schemes prepared by UMI and UNESCO. The US National Library of
Medicine has MeSH (Medical Subject Headings) as well as the more extensive
UMLS (Unified Medical Language System) scheme, while for computing the Association
for Computing Machinery (ACM) maintains the Computing Classification System.
Many other services are offered for other disciplines.
Learning about creating ETD systems
Since education is the core
of NDLTD efforts, it is important to ensure that a wide variety of mechanisms
are in place, for students, with their varying learning styles, to be aided.
First, learning by example is facilitated because thousands of ETDs are available
that can be consulted, including many in one’s own discipline, as well as exemplary
or notable works such as those highlighted from http://www.theses.org.
Second, participants in NDLTD typically have online training resources available,
such as the Virginia Tech site at http://etd.vt.edu,
where general information as well as specific local requirements are addressed.
Third, most universities in NDLTD periodically offer workshops that explain
ETD preparation, often tailored to both novice and expert groups. Some
of these involve presentations, while others involve hands-on activities.
The latter may occur in special classrooms or laboratories, sometimes with
scanners and other multimedia devices, to serve specialized as well as common
needs. Typically, a campus will have a small cadre of helpers who are knowledgeable
about the ETD process, and can resolve unusual problems or address special
needs. Though such services are seldom needed at sites where comprehensive
computer and information literacy programs are in place, it is appropriate
that when ETD submission becomes a mandatory requirement, those who face difficulties
should be quickly aided.
Guide to preparing an ETD
Since students learn best
by doing, developing their own ETD is the most effective way for the next
generation of scholars to be prepared regarding electronic document production.
Though details will vary over the years, this practice will ensure that students
at any point in time have relevant knowledge and skills appropriate for the
available technology.
Students preparing their
ETDs should learn about the entire information life cycle, and work so their
research results can be accessible to all interested parties, into the foreseeable
future. This objective means that they must consider a variety of concepts
and practices, related to document preparation and representation, as well
as preservation and access, sketched briefly in the following subsections.
Writing in word processing systems
Most authors today use word
processing systems. The most popular is Microsoft Word. Corel WordPerfect,
in earlier years more popular, is also widely used. For those working frequently
with mathematical expressions, the TeX and LaTeX family of tools (including
BibTeX for bibliographies) has replaced the earlier-used UNIX suite of troff,
tbl, eqn, refer, and other routines.
Office systems, developed
for document preparation and high quality typesetting services, also are appropriate
for long and complex works such as ETDs, when authors have requisite knowledge
and skills. FrameMaker, PageMaker, StarOffice, and other packages are among
the popular solutions.
Because ETDs often are complex
documents, that may be developed over the years required to complete a graduate
research program, it is essential that students master more than the superficial
word processing skills required to produce letters and short reports. They
should understand key concepts related to fonts, tables, figures, styles,
and document structuring. They should be able to migrate files between versions
of software, from one machine to another, to differing types of platforms,
and through varying media and networks – while maintaining the message behind
their content.
Since ETDs should be usable
across time and space, it is imperative, however, that access to them be through
suitable interchange formats, rather than transient, unpublished representations
produced by particular versions of word processing systems. Accordingly, ETD
initiatives have recommended widely used interchange formats like PDF, SGML,
XML, and the various schemes preferred for particular types of multimedia
content. As was mentioned in Section 1.3, it is preferred to have both a rendered
form, like PDF, and a descriptive form, like SGML or XML. However, when that
is not feasible, it is better to have one of these forms than to delay implementing
an ETD initiative.
Preparing a PDF document
The most popular page representation
scheme, a published de facto standard developed by Adobe, now being considered
as an international standard, is the Portable Document Format, PDF. Adobe
has promised to provide a Reader free of charge into the foreseeable future,
which will read current as well as previous versions of PDF, so that archives
of documents will remain easily usable. Adobe also provides tools for creating,
annotating, and manipulating PDF documents, through its own word processing
software, printer drivers, and distilling from PostScript. In addition, some
public domain tools work on the published PDF format, such as ghostview™.
Adobe’s Acrobat software,
installed on a Windows, Macintosh, or UNIX platform, allows most suitable
documents to be converted to PDF in moments. From word processors such as
Word, WordPerfect, and FrameMaker™, each document portion can be “printed”
to the Distiller printer driver, yielding a PDF file. The Distiller converts
PostScript files to PDF files. Acrobat software allows multiple PDF files
to be assembled into larger PDF files by inserting documents or deleting pages
in an existing PDF file.
To avoid problems for future
readers, authors should embed all fonts in their documents (when that is allowed).
Otherwise, software displaying or printing PDF content will attempt to find
a similar font and extrapolate from it, which may cause serious problems.
Similarly, authors should use so-called “outline” fonts as opposed to bitmap
fonts, so that display and printing can proceed to scale characters as required.
Thus, when using TeX or LaTeX, the bitmap fonts commonly found in a standard
installation should not be used. Instructions at http://etd.vt.edu,
for example, explain how publicly available outline fonts can be obtained
and substituted. Related problems occur when bitmap images are included in
documents and scaled. Vector graphics, special outline font symbols, or object-based
image tools should be used instead when possible so that rendering in PDF
conveys the correct message. Most problems can be avoided by: planning in
advance, following the advice of knowledgeable authors, and testing samples
of all types of content that will be in the final ETD.
Preparing for conversion to SGML/XML
Converting from word processing
forms to SGML or XML (Standard Generalized Markup Language and Extensible
Markup Language, respectively) requires more planning in advance, different
tools, and broader learning about document processing concepts than does working
with PDF. In return, the end result is a representation that is easier to
preserve, more reusable, and supportive of more powerful and effective schemes
for searching and browsing. All of these advantages, however, must be weighed
against the facts that there are fewer people knowledgeable about these matters,
that often tools to help are more expensive and less mature, and that the
process may be complicated, difficult, and time consuming. As of 2000, there
are tens of thousands of ETDs created by scanning (mostly by UMI, but also
at sites like MIT and the National Document Center in Greece), thousands converted
from word processors into PDF, and hundreds in SGML or XML – illustrating
the relative effort required of students to prepare ETDs in each of these
forms.
SGML and XML are markup
languages. Both use tags, normally shown in between “<” and “>”
symbols, with names or labels inside, around sections of documents that are
thus “marked” or “bracketed”. Technically, structures describable this way
conform to labelled bracketed grammars. This means that parts are nested within
parts, just as subsections are contained within sections. The grammar or structure
scheme for a type or class of documents – e.g., book, article, poem, musical
score, or dictionary – is specified by a Document Type Definition (DTD). SGML
requires a DTD and so is used with well-understood documents. XML, being more
extensible while at the same time having stricter rules about closing tags,
employs DTDs optionally.
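As a minimal sketch of the nesting described above, the Python fragment below
parses a tiny XML document that carries its own DTD in the internal subset and
prints its element hierarchy. The element names (thesis, chapter, heading, para)
are hypothetical and are not drawn from any published ETD DTD. Note that
Python's standard parser checks only that tags are properly nested and closed;
it does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature thesis document; element names are illustrative only.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE thesis [
  <!ELEMENT thesis (title, chapter+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT chapter (heading, para+)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<thesis>
  <title>Sample Thesis</title>
  <chapter>
    <heading>Introduction</heading>
    <para>First paragraph.</para>
  </chapter>
</thesis>
"""

def outline(element, depth=0):
    """Return (depth, tag) pairs showing how parts nest within parts."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
for depth, tag in outline(root):
    print("  " * depth + tag)
```

The indented listing mirrors the bracketed structure: a chapter sits inside the
thesis, and headings and paragraphs sit inside the chapter.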
Word processing emphasizes
layout or what-you-see-is-what-you-get (WYSIWYG) editing. Emphasizing
what documents look like is quite distinct from focusing on the logical structure,
for which markup schemes are best. Shifting from word processing representations
to XML requires a different way of thinking, a different approach. The problem
is harder than producing HTML by exporting from a word processor, since instead
of just having a document that looks like the original it is necessary that
the marked-up version itself is correctly tagged.
Some word processors have
been extended to facilitate such an approach. Microsoft produced SGML Author
for Word™ as an add-on package for Word 95, and new versions of WordPerfect
can export content according to markup schemes. Eventually it is likely that
most popular word processors will export to XML. Clearly, the resulting markup
can surround document sections, headings, paragraphs, lists, figures, tables,
citations, footnotes, hyperlinks, and other obvious constructs. In addition,
regions with the same style can be tagged. Thus, to allow easy conversion
from word processing to markup schemes requires choosing a target DTD and
then consistently using document objects and styles so that there is a clear
mapping from them to tags.
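The idea of a clear mapping from styles to tags can be sketched as follows. This
is an illustration under stated assumptions, not the behaviour of any particular
word processor or converter: the style names, the tag names, and the input
format (a list of style and text pairs) are all hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from word-processor style names to element names
STYLE_TO_TAG = {
    "Heading 1": "heading",
    "Normal": "para",
    "Caption": "caption",
}

def styles_to_markup(paragraphs):
    """Convert (style, text) pairs to tagged lines, failing on unmapped styles."""
    lines = []
    for style, text in paragraphs:
        if style not in STYLE_TO_TAG:
            # An inconsistently styled paragraph has no clear target tag
            raise ValueError(f"no tag mapped for style {style!r}")
        tag = STYLE_TO_TAG[style]
        # escape() protects &, <, and > so the text cannot be read as markup
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)

print(styles_to_markup([("Heading 1", "Methods"), ("Normal", "Samples A & B")]))
```

Text formatted by hand rather than through a named style raises an error here,
which is the practical reason for applying styles consistently while writing.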
Conversion from LaTeX is
slightly simpler since the TeX approach involves using formatting commands
that can be mapped to tags in XML. However, LaTeX does not require strict
nesting of commands, so it may not be clear where to place end-tags. Further,
LaTeX users may not consistently use the same sequences to designate changes
in structure, making translation more complex. Finally, LaTeX coding of mathematical
expressions is very difficult to translate to markup schemes for mathematics,
like MathML.
Because of the inherent
complexity of converting from word processing schemes to markup representations,
it is necessary to include steps for checking and correcting converted forms.
Parsers can ensure syntactic correctness, so detecting problems is often simple.
To ensure semantic correctness, however, manual inspection may be required.
A further test would involve rendering the marked-up document, for example
to a printed or PDF form, and ensuring that the result suitably matches the
output resulting from the original word processing version. In any case, human
labour is likely to be needed to correct conversion errors, which presupposes
that students understand enough about the process and desired output to accomplish
this with facility.
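The syntactic check mentioned above can be sketched with Python's standard XML
parser. The sample fragments and the function name are hypothetical. The check
confirms only that the markup is well formed; whether the right text ended up
inside the right tags still needs a human reader.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

good = "<chapter><heading>Results</heading><para>Text.</para></chapter>"
# In the next fragment the heading element is never closed
bad = "<chapter><heading>Results<para>Text.</para></chapter>"

print(check_well_formed(good))
print(check_well_formed(bad))
```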
Writing directly in SGML/XML
Since having an ETD encoded
using SGML or XML is a desirable result, it also is appropriate to use special
word processors or other tools developed for directly producing marked up
documents. This is somewhat analogous to the process of directly producing
HTML, and no doubt a broad range of tools like those available for HTML will
eventually be suitable for XML authoring.
One approach, suitable for
experts, is to prepare a text document using a text processing tool or editor
like notepad, vi, or emacs. Then all tags must be manually
entered, and document structure specified by hand. Alternatively, structure
editors designed specifically for XML can be employed. Since the demand for
such is smaller than for conventional word processors, currently available
tools either are expensive, limited, or not very mature. Further, it is necessary
that a syntax checker or parser either be built into the editor, or used in
coordination with it, so that errors are quickly corrected.
Integrating multimedia elements
While most training related
to word processors covers conventional text documents, perhaps along with
simple drawings and inserted pictures, handling of multimedia portions of
an ETD is often best managed through separate processes. Tools and special
hardware exist for entering and editing complex graphics, images, sound, music,
animations, video, and interactive multimedia productions. On most campuses,
special laboratories or offices exist that have suitable facilities along
with experts who can train seriously interested authors. However, the learning
curve for such is often steep, and students should not lightly choose to include
multimedia content unless it really helps them express their research results
and/or will lead to skills they desire for the future.
Once produced, multimedia
content should be saved in a suitable standard form. International standards
like JPEG for images or MPEG for audio and video should be employed so that
in future years it will be easy to understand such content. Since such conversions,
however, may lead to some losses due to translation and compression, authors
may wish to include both the original multimedia content as well as the standard
version.
Similarly, as an aid to
those interested in reading an ETD, multimedia content may be included in
a number of forms. Thus, if a reader wants to view a video but only has moderate
bandwidth available to download the ETD, they may be satisfied with a much
smaller low-resolution version of a video. At the same time, another reader
with a faster connection may prefer to view a high-resolution version. Finally,
a reader with a very low bandwidth connection may want to see only a small
set of images that are key frames summarizing the video.
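One way to picture this choice among versions is the sketch below, which picks
the largest version a reader can download within a time budget. The file sizes,
bandwidth figures, and the 60-second budget are hypothetical values chosen only
for illustration.

```python
def pick_variant(variants, bandwidth_kbps, max_wait_seconds):
    """Pick the largest (label, size_kilobytes) variant that downloads in time.

    Falls back to the smallest variant when none fits the time budget.
    """
    def download_seconds(size_kilobytes):
        # kilobytes * 8 = kilobits; kilobits / (kilobits per second) = seconds
        return size_kilobytes * 8 / bandwidth_kbps

    affordable = [v for v in variants if download_seconds(v[1]) <= max_wait_seconds]
    if not affordable:
        return min(variants, key=lambda v: v[1])
    return max(affordable, key=lambda v: v[1])

# Hypothetical versions of one video supplement
VARIANTS = [("key frames", 200), ("low-res video", 5000), ("high-res video", 80000)]

for kbps in (56, 1500, 20000):
    print(kbps, pick_variant(VARIANTS, kbps, max_wait_seconds=60)[0])
```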
Ultimately, multimedia content
must be connected to the rest of an ETD. Usually the multimedia information
is stored in separate files. These may be referred to or even linked (through
hypermedia constructs) to the text or other multimedia constructs. One often
appropriate scheme is to have a thumbnail image in the body of the document,
which, when selected, links to a corresponding much higher resolution image,
and/or video.
Providing metadata – inside and outside documents
In addition to multimedia,
documents are often supplemented with metadata (i.e., data about data), typically
catalogue information. Through a series of meetings that started in January
2001, a metadata specification conforming to the Dublin Core (Dublin Core
Community 1999) standard and tailored to describe ETDs has been under development.
The aim is that eventually every ETD will have an associated metadata description
following that specification.
Such metadata can be included
inside an ETD, making it a self-describing document, especially when XML is
used. It is straightforward to encode Dublin Core based metadata in XML, and
that can be included near the beginning or in a header portion of an XML ETD.
This is similar to the practice with documents encoded in SGML according to
the TEI or TEI-lite standards, developed through the Text Encoding Initiative.
Alternatively, and clearly
required for previously prepared SGML or XML documents, or documents represented
in PDF, metadata can be a separate XML file that is associated or linked with
a particular ETD. Varying approaches to packaging data and metadata together
are possible. Note, however, that when metadata is separate, it is then possible
for it to be replicated, distributed, and harvested so that ETDs can be more
easily discovered without requiring that the actual ETD be examined. Indeed,
to allow such processing, even when metadata is included inside an ETD, it
is recommended that routines be prepared that can extract the metadata portion
to allow separate use.
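A minimal sketch of both ideas, embedding Dublin Core metadata in an XML ETD and
extracting it again for separate use, might look like this in Python. The
namespace is that of the Dublin Core element set; the etd, metadata, and body
element names and the sample values are hypothetical and do not follow the ETD
metadata specification mentioned above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)

def build_etd(title, creator, date, body_text):
    """Build a small ETD element with an embedded Dublin Core header."""
    etd = ET.Element("etd")                # hypothetical root element
    meta = ET.SubElement(etd, "metadata")  # hypothetical header element
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(meta, f"{{{DC}}}{name}").text = value
    ET.SubElement(etd, "body").text = body_text
    return etd

def extract_metadata(etd):
    """Pull the Dublin Core fields out of the header as a plain dictionary."""
    meta = etd.find("metadata")
    # Tags look like "{namespace}title"; keep only the local name
    return {child.tag.split("}", 1)[1]: child.text for child in meta}

etd = build_etd("A Sample Thesis", "A. Student", "2000", "Full text goes here.")
print(ET.tostring(etd, encoding="unicode"))
print(extract_metadata(etd))
```

The extracted dictionary could then be written to a separate file and shared
without distributing the full document.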
Protecting intellectual property and dealing with plagiarism
Although in most cases it
is beneficial to share research results, so that others can learn from student
studies and give credit to them through citations, it is necessary to provide
various types of protection when desired by authors or to deal with potential
abuses. Automated schemes can help, such as watermarks, digital signatures,
and checksums; these are discussed further in the section below on producing
ETDs. Programs to detect plagiarism also can be used to compare a new ETD
with already available ETDs, ensuring that blocks of identical or similar
text are not copied. Further, education, training regarding ethical and professional
behavior, and suitable policies can support the guidance of faculty and university
staff to promote the spirit of scholarly investigation and collaboration.
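Two of these automated aids can be sketched briefly. The first function computes
a checksum, so that any later change to a file can be detected; SHA-256 is an
assumption here, since the text names no particular algorithm. The second
measures how many five-word runs of a new text also appear in an earlier text,
a simple stand-in for plagiarism detection and not the method of any specific
tool.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a SHA-256 digest; any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

def shingles(text, size=5):
    """Return the set of overlapping runs of `size` consecutive words."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(new_text, old_text, size=5):
    """Fraction of the new text's word runs that also occur in the old text."""
    new, old = shingles(new_text, size), shingles(old_text, size)
    if not new:
        return 0.0
    return len(new & old) / len(new)

original = "the results show that the method improves recall on all collections"
copied = "our results show that the method improves recall on all collections tested"
print(checksum(original.encode("utf-8")) == checksum(copied.encode("utf-8")))
print(round(overlap(copied, original), 2))
```

A high overlap score only flags a passage for human review; quotation with
proper citation would score highly too.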
Naming standards
To maximize portability,
students should name the various parts of an ETD using the lowest-common-denominator
standard for file names, typically the “8.3” form used in old systems like
DOS, where a name of no more than 8 alphanumeric characters is followed by a
period and a file type of at most 3 characters (e.g., pdf, jpg, mpg, txt, xml, sgm). If
possible, complex directory structures should be avoided and a simple flat
list used, also to ensure portability. Further, references to those names
should be relative, rather than absolute, e.g., as etd.pdf rather than c:\documents\etd.pdf
or /usr/student/thesis/etd.pdf.
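These naming rules are easy to check automatically. The sketch below is one
possible check, with hypothetical function names; it accepts digits as well as
letters in file names, as DOS itself did, and treats a reference as absolute if
either Windows or UNIX conventions would.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Up to 8 letters or digits, a period, then up to 3 letters or digits
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9]{1,8}\.[A-Za-z0-9]{1,3}")

def is_portable_name(name):
    """True if the file name fits the lowest-common-denominator 8.3 form."""
    return EIGHT_DOT_THREE.fullmatch(name) is not None

def is_relative_reference(ref):
    """True if ref is absolute under neither UNIX nor Windows conventions."""
    return not (PurePosixPath(ref).is_absolute() or PureWindowsPath(ref).is_absolute())

for name in ("etd.pdf", "chapter01a.pdf", "my thesis.docx"):
    print(name, is_portable_name(name))
for ref in ("etd.pdf", r"c:\documents\etd.pdf", "/usr/student/thesis/etd.pdf"):
    print(ref, is_relative_reference(ref))
```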
Clearly, each file should
have a unique name. Similarly, each ETD in a collection should have a unique
and permanent identifier. Since each degree-granting institution can use a
unique identifier for their archive, every ETD in the world can have a unique
overall identifier made by composing the archive and ETD identifiers.
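Composing the two identifiers can be as simple as joining them with a separator
that neither part may contain. The colon separator and the sample identifiers
below are hypothetical; the text does not prescribe a format.

```python
def global_identifier(archive_id, etd_id):
    """Join an archive identifier and a local ETD identifier into one string."""
    for part in (archive_id, etd_id):
        # Forbidding the separator inside either part keeps the result unambiguous
        if not part or ":" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"{archive_id}:{etd_id}"

# The same local number at two institutions still yields distinct identifiers
print(global_identifier("univ-a", "etd-0001"))
print(global_identifier("univ-b", "etd-0001"))
```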
Submission of individual ETDs
Once a student has prepared
an ETD, in most institutions involved in NDLTD, they can submit their work
over the internet to a local or regional site for further processing. Following
local policies, procedures, and instructions, delivered through training sessions
or explained on a Web site, they will typically invoke a Web browser on the
computer where their ETD resides. The workflow usually involves them entering
a password or other authentication of their identity, filling in a form that
provides needed metadata information, and uploading each of the files in the
ETD “package”. Since they will supply their email address during this process,
they can be notified, by those enforcing quality control standards in the
graduate program and library, regarding any corrections or missing data they
must supply, as well as when key stages in the approval process are achieved.
Becoming a researcher in the electronic age
In addition to learning
about word processing, electronic document processing, and key concepts related
to digital libraries, students also must gain other skills in order to be
prepared to be researchers and scholars. They must be ready to meet future
challenges of the electronic age, where technology continues to advance, often
leading to changes in common practices that may save time or improve accuracy.
Caution regarding unproven technology is sensible, but straightforward advances
like increases in computing and networking speeds, or decreases in prices
of experimental equipment, may be unwise to ignore. Further, innovations may
lead to tools dramatically aiding their investigations. Thus, learning to
deal with change is part of the wisdom that scholars must develop to survive
in the complex modern world.
At the same time, scholars
must remain anchored by core values such as honesty, integrity, curiosity,
ingenuity, generosity, friendship, diligence, perseverance, and responsibility.
They must follow the dictates of society and ethics as well as reason and
truth. They should give credit as due to those who have helped them or advanced
knowledge in ways related to their work.
With the aid of faculty
and colleagues, following departmental and other local and discipline-specific
practices, they must choose what type of access is appropriate to the various
parts of their ETD, and when. Generally, the decision will be simple, allowing
universal access to the entire work. If they must limit access, it is recommended
that they do so for as short a time as possible and for as few parts of the
ETD as is necessary, to maximize the amount and duration of access. In general,
scholars are rewarded most by sharing their discoveries as widely as possible,
but in today’s entrepreneurial world they may seek patent protection in order
to have time to commercialize their work, if it involves one of the small
number of inventions that are ready for technology transfer. If publishing
is appropriate, on the other hand, they should seek to ensure that their ETD
is available as well as any related prior or derivative works released in
the form of articles or books. In some cases they may be required to delay
access to (part of) the ETD (for some period of time). What is most important
in all this, however, is that students and faculty honestly confront their
responsibilities as scholars, learn key concepts related to intellectual property
rights, respect laws and policies, follow contracts and agreements with sponsors
and publishers, and strive to achieve balance among the many conflicting opportunities
and demands they face. All in all, preparing an ETD should greatly expand
the learning experience of graduate researchers, thus helping better prepare
the next generation of scholars for the Information Age.