Guide to Risk Assessment and Biosafety in Biotechnology, GRABB

An Initiative of the United Nations Environment Programme (UNEP)

SECTION:		INTRODUCTION
TITLE:		The Ability of a Decentralized Systems Design to Meet the Requirements of IRRO
BY:		David A Portyrata
LABEL:	DEC		UPDATED:	31 Dec 1997

The function of an Information Resource on Release of Organisms (IRRO) would be to provide a system for the distribution and processing of data at an international level. Such a system will have to contend with a wide range of problems, including funding restrictions, availability of support personnel, local levels of available technology, special interests, local and international politics, the amount of data to be processed, and response time. Taking these problems or parameters into consideration, one immediately sees that any international network will have to be highly flexible if it is ever to succeed. These same parameters also point to the selection of a decentralized network with a distributed-partitioned database. (The term 'database', as used in this discussion, refers to all the information in the system available to the user, regardless of location.)

A decentralized network would consist of two or more nodes interconnected by one or more lines of communication. A node would be any processing center or collection recognized as a member of the network. Nodes may consist of computer centers, local area networks, long-distance networks, personal computers or some type of manual filing system. There are many advantages to allowing a diverse node composition. (1) The network can take advantage of existing technology, thereby cutting costs. (2) Individuals with desirable collections who do not have access to high technology can be incorporated into the network. (3) A node can be upgraded without a major impact on the rest of the network; for example, a node may automate storage of its collection data. (4) The elimination of a node for any reason will not dramatically affect the rest of the network. (5) The network will be able to incorporate additional nodes as it expands.
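To make the node-and-line model concrete, the following is a minimal Python sketch, not part of any IRRO specification. It assumes hypothetical `Node` and `Network` classes, treats every communication line as two-way, and records a node's technology only as a label, so a manual filing system joins the network in exactly the same way as a computer center. Removing a node drops its lines; in this sketch the remaining nodes stay reachable only while some other path still connects them, which is why point (4) relies on the network having more than one line into each part of it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A member of the network (hypothetical model)."""
    name: str
    kind: str  # technology label only, e.g. "computer_centre", "pc", "manual_file"


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # node name -> Node
    links: dict = field(default_factory=dict)  # node name -> set of neighbour names

    def add_node(self, node):
        # Joining changes no existing node, whatever the newcomer's technology.
        self.nodes[node.name] = node
        self.links.setdefault(node.name, set())

    def link(self, a, b):
        # A communication line is treated as two-way.
        self.links[a].add(b)
        self.links[b].add(a)

    def remove_node(self, name):
        # Drop the node and every line that touched it.
        self.nodes.pop(name, None)
        for neighbour in self.links.pop(name, set()):
            self.links[neighbour].discard(name)

    def reachable_from(self, start):
        # Breadth-first search over the lines that remain.
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in self.links[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


# Four nodes of different technology joined in a ring.
net = Network()
for name, kind in [("A", "computer_centre"), ("B", "lan"), ("C", "pc"), ("D", "manual_file")]:
    net.add_node(Node(name, kind))
for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    net.link(a, b)

net.remove_node("B")
print(sorted(net.reachable_from("A")))  # ['A', 'C', 'D']
```

The ring was chosen so that every node has two lines; in a simple chain, losing a middle node would cut the network in two.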

The communication links between nodes can take the form of satellite and land-line telecommunications, correspondence or personal contact. Telecommunications can consist of voice, data and teletype modes of communication. Depending upon the resources of the nodes, this may or may not involve the actual transmission of data. If data cannot be transported via a telecommunications link, then it will have to be physically transported via forms, magnetic tape, floppy disk or some other recording medium. The form of data transmission will depend upon the resources of the sending and receiving nodes. The network database will most likely be distributed throughout the various nodes in the network. Each node would hold one or more subsets of the database, with some overlap among nodes. This overlap will most likely occur among nodes which support users with similar interests. Distributing parts of a database with little or no replication is often referred to by computer specialists as a distributed-partitioned database.
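The rule that the form of transmission depends on the resources of both the sending and the receiving node can be sketched as choosing the best medium the two nodes have in common. This is a minimal illustration under stated assumptions: the mode names and their order in `PREFERENCE` are hypothetical, since the text lists the media but does not rank them.

```python
# Hypothetical transport modes in an assumed order of preference (fastest first).
# The text lists these media but does not rank them; this ranking is illustrative.
PREFERENCE = ["data_link", "magnetic_tape", "floppy_disk", "forms"]


def choose_transport(sender_modes, receiver_modes):
    """Pick the most preferred mode supported by both the sending and receiving node.

    Returns None when the two nodes share no medium at all.
    """
    common = set(sender_modes) & set(receiver_modes)
    for mode in PREFERENCE:
        if mode in common:
            return mode
    return None


# A node with a data link sending to a node that only handles floppies and forms.
print(choose_transport({"data_link", "floppy_disk", "forms"}, {"floppy_disk", "forms"}))
# floppy_disk
```

Nothing about either node has to change for the exchange to work; the choice is made per pair of nodes from whatever each already has.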

Any design of a database for the network must contend with the same problems which confront the design of the proposed network. The advantages of a distributed database make it a strong candidate for the proposed network's database. However, there are problems associated with a distributed database; these are discussed later in this paper.

There are several immediate advantages to a distributed-partitioned database. (1) The database and processing facility can be tailored to the needs of the local users. (2) Close proximity of the data to those most likely to use it will result in better response time and reduced processing costs. (3) Since the data is of local interest, funding support will be easier to obtain. (4) Users will have an incentive to maintain the database if it is in their interests. (5) Since local users will be familiar with the database, they will be better able to respond to queries on its contents. (6) Changes to the format of the data will be localized and will not have a major impact on other users of the overall database. (7) Users will feel that they have control over their data, which may allay any fears they have about security. (8) A distributed database cannot be completely controlled by any one group in a way that would limit access by other users. (9) Since the data would be distributed throughout various nodes, the loss of a node will not dramatically affect the rest of the database. Consider what would happen if the database were centrally located and that node were lost to the rest of the network: the network would either become useless or lose a major part of the database. (10) Additional data can be added to the database by incorporating new nodes into the network without requiring those nodes to reformat their databases. This gives the network a cheap and quick method of expanding the database. It also gives the database a degree of flexibility which is crucial to its ability to accommodate new users. (11) The small subsets of the database will be easier to maintain and back up, thereby reducing the strain on existing support personnel. This will also reduce operating costs for the overall network and database.
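Advantage (9) can be illustrated with a hypothetical query routine that asks each reachable node to search its own partition and reports which nodes could not be reached. The function name `federated_query`, the record fields and the sample data are invented for illustration; the point is only that losing one node removes that node's records from the answer, not the whole database.

```python
def federated_query(partitions, available, predicate):
    """Search every reachable node's partition (hypothetical sketch).

    partitions: dict mapping node name -> list of records (dicts) held at that node
    available:  set of node names currently reachable
    predicate:  function taking a record and returning True if it matches
    Returns (matching records, names of nodes that could not be searched).
    """
    results, missing = [], []
    for node, records in partitions.items():
        if node not in available:
            # The node is lost to the network; note it and carry on.
            missing.append(node)
            continue
        results.extend(r for r in records if predicate(r))
    return results, missing


# Invented sample partitions held at three nodes.
partitions = {
    "N1": [{"id": 1, "crop": "maize"}, {"id": 2, "crop": "rice"}],
    "N2": [{"id": 3, "crop": "maize"}],
    "N3": [{"id": 4, "crop": "maize"}],
}
results, missing = federated_query(partitions, {"N1", "N3"}, lambda r: r["crop"] == "maize")
print([r["id"] for r in results], missing)  # [1, 4] ['N2']
```

Reporting the unreachable nodes alongside the results lets the user know the answer is partial rather than silently incomplete.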

Distributed-partitioned databases do have disadvantages which are magnified by the scope of the project. These disadvantages raise problems which must be addressed by the international community, preferably before implementation of the network and network database.

Standardization of the data format will be almost impossible due to the diversity of existing databases, conversion costs, limited resources and socio-political factors. However, in order to facilitate the distribution (transportation) of data, it will be necessary to develop standard formats for the transportation of data. The format definition will have to be both physical and logical. Physical refers to how the data is physically transported, e.g. forms, magnetic tape, records, etc. Logical refers to the content and sequence of the data, e.g. the type of data and what is stored in each field. These standards would allow the various nodes to develop standard procedures for incorporating incoming data and sending data out to other nodes or users. They would also give new nodes a source of existing standards, which will reduce costs. It is important that these standards be flexible enough to accommodate both existing nodes and any new nodes joining the network.
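One way to picture the split between a logical and a physical format is a fixed-width record layout: the logical part is the ordered list of fields and their widths, and the physical part is the padded line of text written to tape, disk or a printed form. The sketch below assumes a hypothetical `TRANSPORT_FORMAT` with invented field names and widths; a real standard would be agreed among the nodes. Each node would keep its own internal format and convert only at the boundary, using functions like the hypothetical `encode` and `decode`.

```python
# Hypothetical logical format: (field name, fixed width) in transmission order.
TRANSPORT_FORMAT = [("node_id", 6), ("record_id", 8), ("organism", 30), ("country", 3)]


def encode(record, fmt=TRANSPORT_FORMAT):
    """Serialize a record (dict) into one fixed-width line of text."""
    parts = []
    for name, width in fmt:
        value = str(record.get(name, ""))
        if len(value) > width:
            # Refuse rather than silently truncate data.
            raise ValueError(f"{name!r} exceeds {width} characters")
        parts.append(value.ljust(width))  # pad with spaces to the agreed width
    return "".join(parts)


def decode(line, fmt=TRANSPORT_FORMAT):
    """Parse a fixed-width line back into a dict, stripping the padding."""
    record, pos = {}, 0
    for name, width in fmt:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record


rec = {"node_id": "N001", "record_id": "42", "organism": "example organism", "country": "KEN"}
line = encode(rec)
print(len(line), decode(line) == rec)  # 47 True
```

Because `decode` strips trailing padding, a value with meaningful trailing spaces would not survive the round trip; an agreed standard would have to say how such values are handled.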

Quality control of the database will be hard to enforce due to the geographical range of the network's nodes. Most likely, quality control will have to be enforced at the node level. Data quality will have to be addressed, as many users will not want to risk using data of questionable quality, knowingly or otherwise. The international community will have to issue quality control guidelines which can be given to members of the network.
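Since quality control would most likely be enforced at the node level, each node could apply the international guidelines as a set of checks run on incoming or outgoing records. The sketch below assumes the guidelines can be expressed as simple per-field rules; the rules in `GUIDELINES` and the function `check_quality` are hypothetical examples, not actual IRRO guidelines.

```python
# Hypothetical guideline rules: (field name, check on the field's value, message on failure).
GUIDELINES = [
    ("record_id", lambda v: bool(v), "record_id is required"),
    ("organism", lambda v: bool(v), "organism is required"),
    ("country",
     lambda v: isinstance(v, str) and len(v) == 3 and v.isalpha() and v.isupper(),
     "country must be a 3-letter upper-case code"),
    ("year", lambda v: isinstance(v, int) and 1900 <= v <= 2100, "year out of range"),
]


def check_quality(record, guidelines=GUIDELINES):
    """Return the message of every rule the record fails; an empty list means it passes."""
    # Missing fields are passed to the check as None, so they fail the rules above.
    return [msg for field_name, check, msg in guidelines if not check(record.get(field_name))]


print(check_quality({"record_id": "42", "organism": "x", "country": "ken", "year": 1997}))
# ['country must be a 3-letter upper-case code']
```

Returning the list of failed rules, rather than rejecting the record outright, leaves each node free to decide locally whether to correct, flag or withhold the record.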

Since the data would be distributed among the nodes in the network, a mechanism will have to exist which allows users to locate sources of data throughout the network. Although in theory each node would be responsible for maintaining a directory of its own data, a master directory will have to be developed for international use. This directory may also contain a list of the resources and capabilities of each node to assist users in their requests for information. Responsibility for overseeing the maintenance of this master directory would have to be delegated to one of the nodes in the network, which would function as the headquarters node. The location and final responsibilities of the headquarters node will have to be decided by the international community. The master directory could be updated periodically with updates submitted by the various nodes in the system. In the case of major updates, special bulletins could be issued to other nodes in the network.
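The master directory described above can be sketched as a mapping from subject to the nodes holding data on it, together with a note of each node's resources, kept current from the nodes' periodic submissions. The class `MasterDirectory`, its methods and the subject labels are hypothetical; the text does not specify how the directory would be structured.

```python
from collections import defaultdict


class MasterDirectory:
    """Hypothetical master directory maintained by the headquarters node."""

    def __init__(self):
        self.holdings = defaultdict(set)  # subject -> names of nodes holding data on it
        self.capabilities = {}            # node name -> description of its resources

    def apply_update(self, node, added=(), removed=(), capabilities=None):
        """Apply one periodic update submitted by a node."""
        for subject in added:
            self.holdings[subject].add(node)
        for subject in removed:
            self.holdings[subject].discard(node)
        if capabilities is not None:
            self.capabilities[node] = capabilities

    def locate(self, subject):
        """Return (node, capabilities) pairs, sorted by node name, for a subject."""
        # .get avoids creating empty entries for subjects nobody holds.
        nodes = sorted(self.holdings.get(subject, ()))
        return [(n, self.capabilities.get(n, "unknown")) for n in nodes]


d = MasterDirectory()
d.apply_update("N1", added=["maize trials"], capabilities="data link, floppy disk")
d.apply_update("N2", added=["maize trials", "rice trials"], capabilities="forms only")
d.apply_update("N1", removed=["maize trials"])
print(d.locate("maize trials"))  # [('N2', 'forms only')]
```

Returning each node's capabilities with the lookup lets a user see, before making a request, whether the data can be sent electronically or only on paper or physical media.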

One other problem which must be addressed is that of data security. This problem is rather complex, in that determining who has access to the database, or to specific parts of it, can raise additional questions related to security and access. Many users may have some data which, for one reason or another, they do not want to make public. There is also the problem of determining who is an authorized user of the network with access to the database. This security problem is usually amplified by a distributed-partitioned database, because the data is not stored centrally at one location where security can be rigidly enforced. However, data security is a problem which plagues all networks and databases in existence today. In any case, the international community will have to address this problem before any serious attempt to implement a network is undertaken.
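Because no single location can enforce security for a distributed-partitioned database, one option is for each node to apply a common access rule to its own records before answering a query. The sketch below assumes three hypothetical access levels (`public`, `members`, `private`) and a user described by an `authorized` flag and a home `node`; the actual categories of access would be for the international community to decide.

```python
def can_read(record, user):
    """Hypothetical access rule applied by the node that holds the record."""
    access = record["access"]
    if access == "public":
        return True                      # anyone may read
    if access == "members":
        return user["authorized"]        # any authorized user of the network
    if access == "private":
        # Only authorized users belonging to the node that owns the record.
        return user["authorized"] and user["node"] == record["node"]
    raise ValueError(f"unknown access level: {access!r}")


def visible_records(records, user):
    """Filter a node's records down to those this user may see."""
    return [r for r in records if can_read(r, user)]


records = [
    {"id": 1, "node": "N1", "access": "public"},
    {"id": 2, "node": "N1", "access": "members"},
    {"id": 3, "node": "N1", "access": "private"},
]
guest = {"authorized": False, "node": None}
member_elsewhere = {"authorized": True, "node": "N2"}
print([r["id"] for r in visible_records(records, guest)])             # [1]
print([r["id"] for r in visible_records(records, member_elsewhere)])  # [1, 2]
```

An unknown access level raises an error instead of defaulting to visible, so a mistyped level stops the query rather than silently exposing the record.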

Many arguments can be made for a centralized network and database. Centralized systems usually rely on high technology, such as large computer centers, with some type of telecommunication links to the various users. All this technology is expensive and not available worldwide. Systems analysts have found that most centralized systems which have had to support a large database with a diverse group of users have failed to meet expectations, mainly because of the time and expense required to meet the needs of a constantly changing group of users. Those centralized systems which have succeeded are usually limited to a relatively small database with a group of users with a narrow range of interests; a good example would be the systems used by the banking industry to keep track of transactions. In general, it has been found that once a large database must be modified to allow for user expansion, a centralized system starts to break down. Where a centralized system starts to break down, a well-designed distributed system usually starts to thrive.

A decentralized system will be able to encourage new users because of its ability to absorb them and their data into the system. This will inevitably lead to greater user participation, in part because users will not be alienated from the network and their data by technological barriers. As users come into contact with each other through the system, information and assistance on how to improve their resources will most likely be exchanged. This information will be quite useful to nodes which are considering automating storage of their data. Users with similar interests will be able to exchange data, thereby consolidating and expanding their own collections.

The possible uses of such a network are endless, limited only by time, money, and user imagination and initiative. The network will be in a state of constant evolution as a result of this dynamic interaction. Technological changes and increases in database and user diversity will also contribute to the evolutionary process. For these reasons, a distributed network with a distributed-partitioned database seems best qualified to meet the challenge of an Information Resource on Release of Organisms.
