
[...] parallel neural computer,” Connectionism in a Broad Perspective: Selected Papers from the Swedish Conference on Connectionism - 1992, Niklasson and Bodén Eds. Ellis Horwood.

TOWARDS MODULAR, MASSIVELY PARALLEL NEURAL COMPUTERS
Bertil Svensson
Department of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden and Centre for Computer Science, Halmstad University, Halmstad, Sweden
Tomas Nordström
Division of Computer Science and Engineering, Luleå University of Technology, Luleå, Sweden
Kenneth Nilsson and Per-Arne Wiberg
Centre for Computer Science, Halmstad University, Halmstad, Sweden

A new system-architecture, incorporating highly parallel, communicating processing modules, is presented as a candidate platform for future high-performance, real-time control systems. These are needed in the realization of action-oriented systems which interact with their environments by means of sophisticated sensors and actuators, often with a high degree of parallelism, and are able to learn and adapt to different circumstances and environments. The use of artificial neural network algorithms and trainability require new system development strategies and tools. A Continuous Development paradigm is introduced, and an implementation of this, in the form of an interactive graphical tool, is outlined. The architectural concept is based on resource adequacy, both in processing and communication. Learning algorithms are cyclically executed in distributed nodes, which communicate via a shared high-speed medium. The suitability of SIMD (Single Instruction stream, Multiple Data streams) processing nodes for ANN computations is demonstrated. An implementation of the system architecture is presented, in which distributed SIMD-nodes access their data from local real-time databases, updated with data from the other nodes via a shared optical link.
Keywords: Parallel processing; learning systems; neural networks; action-oriented systems; control system design; real-time computer systems.
1 INTRODUCTION
“Action-oriented systems”, as described by Arbib [Arbib, 1989], interact with their environments by means of sophisticated sensors and actuators, often with a high degree of parallelism. The ability to learn and adapt to different circumstances and environments is among the key characteristics of such systems. Development of applications based on action-oriented systems relies heavily on training, rather than programming of the detailed behaviour.
Response time requirements and the demand to accomplish the training task point to massively parallel computer architectures. A network of homogeneous, highly parallel modules is foreseen. The modules perform perceptual tasks close to the sensors, advanced motoric control tasks close to the actuators, or complex calculations at “higher cognitive levels”. The new system-architectural concept that we introduce for the implementation of this kind of highly parallel real-time system is based on the principle of resource adequacy [Lawson, 1992b] in order to achieve predictability. This means that enough processing and communication resources are designed into the system and statically allocated to guarantee that the maximum possible work-load can always be handled.
Not only do these trainable control systems require new architectural paradigms, they also require the acceptance of new system development philosophies. The traditional application-development model, characterized by a sequence of development phases, must be replaced by an interactive model based on training.
Both the system development model and the architectural paradigm are first presented on the conceptual level and then exemplified by describing implementations meeting the demands of typical advanced real-time control tasks. Specifically, this paper points to the possibilities based on multiple SIMD (Single Instruction stream, Multiple Data streams) arrays on which static allocation of processing tasks is made and on the power and appeal of graphical application-development tools.
We have shown, by our own implementations and detailed studies, as well as by reviewing the implementations of others, that typical neural network algorithms used today map efficiently onto SIMD architectures [Nordström and Svensson, 1992]. Based on this, and the discussion above, a hypothetical architecture for Artificial Neural Systems (ANSs) would look like the one in Figure 1.
Figure 1. A multi-module architecture for an action-oriented system
Different modules (SIMD arrays) typically execute different Artificial Neural Network (ANN) models, or different instances of the same model. Full connectivity may be used within the modules, while the communication between modules is expected to be less intensive (although we will also devise solutions that satisfy the potential demand for tighter connections between pairs of modules).
The work is part of REMAP3, the Real-Time, Embedded, Modular, Action-oriented, Parallel Processor Project, partly funded by STU/NUTEK, the Swedish National Board for Technical and Industrial Development, under contracts No. 9001583 and 9001585.
2 LEARNING ALGORITHMS AND MODULE ARCHITECTURE
Studies of the brain indicate that adaptation takes place in basically two ways: by changing the structure and by changing the synapses (connection strengths in the structure). The first one has the nature of long-term adaptation and often takes place in the first part of an animal's life. The second one, the changes of connection weights (the synapses), is a more continuous process and happens throughout the animal's entire lifetime.
Modeled after this, the design of an action-oriented system should first be concerned with the process of selecting and connecting (possibly adapting) ANN structures and other signal processing structures. Later, the system moves into a tuning phase and a state of continuous learning. The two stages described may also be interleaved in an iterative fashion, which calls for some kind of incremental or circular development model as will be described later.
Only very few of the most used ANN models are found in the context of continuous learning, but with minor modifications most of them can be turned into a continuous learning model.
The mapping of ANN algorithms onto highly parallel computational structures has been widely investigated. A summary is provided in [Nordström and Svensson, 1992], where processor arrays of SIMD type are pointed out as the major candidate architecture for fast general purpose neural computation. We have performed detailed studies of the execution of the predominant ANN models on this kind of computing structure [Gustafsson 1989, Svensson 1989, Svensson and Nordström 1990, Nordström 1991a, Nordström 1991b]. The mappings of the models and the results obtained are summarized in the subsequent subsections. A major conclusion is that broadcast or ring communication among the Processing Elements (PEs) of the array can be very efficiently utilized and actually provides the necessary means for communication within the array. Multiplication is the single most important operation in ANN computations. In bit-serial architectures, which have been our primary target, there is therefore much to gain if support for fast multiplication is added. In some of the ANN models, for example Sparse Distributed Memory (SDM), tailored hardware to support specific PE operations pays off very well.
The system architecture, described later, permits two or more modules to be linked together to form a larger module, if necessary. This linking may be done either over the communication medium, in which case the intermodule communication shares time with all modules of the system, or over a separate medium. In the latter case the cooperating modules form a cluster with more available bandwidth for internal communication. Special “dual-port” nodes form the interface between the cluster and the main medium.
2.1 Parallelism in ANN Computations
As described more thoroughly in [Nordström and Svensson, 1992], six different dimensions of parallelism can be identified in neural network computations. Node parallelism and weight parallelism are the two most important for consideration in a parallel implementation for use in real time. Node parallelism means treating all, or several, nodes in a layer simultaneously by several PEs. Weight parallelism means treating all, or several, inputs to a node simultaneously. The two forms of parallelism may be combined. In typical ANN applications the degrees of these two forms of parallelism are usually very high (hundreds, thousands, ...). The same, or even higher, degrees are available by training-session and training-example parallelism, but these forms are not available for use in real-time training situations, and thus are of minor importance in action-oriented systems. Layer parallelism (treating all layers in parallel and/or going forward and backward simultaneously) and bit-parallelism (treating all bits in a data item in parallel) complete the picture, but the degrees of these are seldom greater than the order of ten.
In the architectures and mappings described in subsequent sections we find it practical to refer to the different dimensions of parallelism as defined above.
2.2 Feedforward Networks with Error Backpropagation
The mapping of feedforward networks with error backpropagation on highly parallel arrays of bit-serial PEs is described in [Svensson, 1989] and [Svensson and Nordström, 1990]. Node parallelism is used. A quite simple bit-serial multiplier structure using carry-save technique [Fernström et al. 1986] is added to the basic PE design. By this, multiplication time is equalized to addition time. When performing multiply-and-add operations, which are the dominating operations in this algorithm, both units work in parallel. Connection weights are stored in matrices, one row of the matrix per PE module.
An interesting result is that the computations do not require the PE array to have a very rich communication structure. The facilities needed are the ability to broadcast a single bit from any processor to all others, a means for selecting processors in order, one by one, and a bit-serial adder tree to add the values of a field. As an alternative to broadcast, ring communication may be provided; in that case the adder tree is not needed.
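The role of these facilities can be illustrated with a minimal Python sketch, assuming one PE per node that holds its own row of the weight matrix. The function names and the use of the adder tree for the backward pass are illustrative assumptions, not details taken from the cited reports; the inner loops stand in for operations that all PEs would perform simultaneously.

```python
def forward_by_broadcast(weights, inputs):
    """Node-parallel forward pass: PE i holds row i of the weight matrix."""
    acc = [0.0] * len(weights)             # one accumulator per PE
    for j, x_j in enumerate(inputs):       # one broadcast step per input value
        for i, row in enumerate(weights):  # conceptually simultaneous in all PEs
            acc[i] += row[j] * x_j         # local multiply-and-add
    return acc


def backward_by_adder_tree(weights, deltas):
    """Propagate errors back: for each input, sum weight * error over all PEs."""
    n_inputs = len(weights[0])
    errors = []
    for j in range(n_inputs):
        # Each PE forms its product locally; the adder tree sums the field.
        products = [row[j] * d for row, d in zip(weights, deltas)]
        errors.append(sum(products))
    return errors
```

In this sketch the forward pass needs only broadcast, while the backward pass needs a sum across PEs, which is where an adder tree (or a ring) would come in.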
A typical module (about the size of one small printed-circuit board using common state-of-the-art technology) would be a 1024 PE array of bit-serial processors incorporating a bit-serial multiplier. Such an array is capable of training at 265 MCUPS (Million Connection Updates Per Second) or recall at 625 MCPS (Million Connections Per Second) using 8-bit data at 25 MHz. A four-layered feedforward network with 1024 neurons per layer would run at the speed of 85 training examples or 200 recall examples per second.
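The examples-per-second figures can be checked with a few lines of arithmetic. The sketch below assumes that a four-layered network has three fully connected weight layers of 1024 x 1024 connections each; that reading is an assumption, but it reproduces the quoted rates to within rounding.

```python
NEURONS_PER_LAYER = 1024
WEIGHT_LAYERS = 3            # assumption: four layers of neurons, three weight matrices
TRAINING_RATE_CUPS = 265e6   # 265 MCUPS, from the text
RECALL_RATE_CPS = 625e6      # 625 MCPS, from the text

connections = WEIGHT_LAYERS * NEURONS_PER_LAYER ** 2
training_examples_per_s = TRAINING_RATE_CUPS / connections
recall_examples_per_s = RECALL_RATE_CPS / connections

print(connections)                        # 3145728
print(round(training_examples_per_s, 1))  # about 84, quoted as 85
print(round(recall_examples_per_s, 1))    # about 199, quoted as 200
```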
2.3 Feedback Networks
As reported in [Gustafsson, 1989] and [Svensson and Nordström, 1990], a simple PE array with broadcast or ring communication may be used efficiently also for feedback networks (Hopfield nets, Boltzmann machines, recurrent backpropagation nets, etc.). The MCPS measures are, of course, the same as above. On a 1024 PE array running at 25 MHz, 100 iterations of a 1024-byte input pattern take 106 ms.
2.4 Self-Organizing Maps
[Nordström, 1991b] describes different ways to implement Kohonen’s Self-Organizing Maps (SOMs) [Kohonen, 1990] on parallel computers. The SOM algorithm requires an input vector to be distributed to all nodes and compared to their weight vectors. This is efficiently implemented by broadcast and simple PE designs. The subsequent search for minimum is extremely efficient on bit-serial processor arrays. Determining the neighbourhood for the final update part can again be done by broadcast and distance calculations. Thus, also in this case, broadcast is sufficient as the means of communication. Node parallelism is, again, simple to utilize.
Efficiency measures of more than 80% are obtained (defined as the number of operations per second divided by the maximum number of operations per second available on the computer).
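The three phases of a SOM step (broadcast and compare, minimum search, neighbourhood update) can be sketched in Python as follows. This is a sequential illustration under stated assumptions: the function name, the squared Euclidean distance, the Manhattan grid neighbourhood and the simple update rule are choices made for the example, not the mapping of [Nordström, 1991b].

```python
def som_step(weights, positions, x, learning_rate, radius):
    """One SOM training step; returns the index of the winning node."""
    # Broadcast x to all nodes; each node computes its own squared distance.
    distances = [sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x))
                 for w in weights]
    # A global search for the minimum selects the winner.
    winner = min(range(len(weights)), key=distances.__getitem__)
    # Broadcast the winner's grid position; each node decides locally
    # whether it lies inside the neighbourhood and updates itself if so.
    win_row, win_col = positions[winner]
    for i, (row, col) in enumerate(positions):
        if abs(row - win_row) + abs(col - win_col) <= radius:
            weights[i] = [w_k + learning_rate * (x_k - w_k)
                          for w_k, x_k in zip(weights[i], x)]
    return winner
```

Every per-node computation here depends only on broadcast values and local state, which is why node parallelism maps so directly onto a PE array.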
2.5 Sparse Distributed Memory
Sparse Distributed Memory (SDM), developed by Kanerva [Kanerva, 1988], is a two-layer feedforward network, but is more often – and more conveniently – described as a computer memory. It has a vast address space (typically 10^300 possible locations) which is only very sparsely (of course) populated by actual memory locations. Writing to one location influences locations in the neighbourhood (e.g. in the Hamming-distance respect) and, when reading from memory, several neighbouring locations contribute to the result.
The SDM algorithm requires distribution of the reference address, comparison and distance calculation, update, or readout and summation, of counters at the selected locations. Nordström [Nordström, 1991a] identifies the requirements for these tasks and finds a “mixed” mapping (switching between node and weight parallelism in different parts of the calculation) that is especially efficient.
A counter in the place of the multiplier in the bit-serial-PE based architecture described above makes the array especially efficient for SDM. A 256 PE REMAP3 realization with counters is found to run SDM at a speed 10 - 30 times faster than that of an 8K PE Connection Machine CM-2 (clock frequencies equalized). Already without counters (then the PEs become extremely simple) a 256 PE REMAP3 outperforms a 32 times larger CM-2 by a factor of 4 - 10. One explanation of this is the more developed control unit of REMAP3 which makes the mixed mapping possible to use.
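The SDM operations listed above can be sketched in a small Python model. This is an illustrative, sequential sketch; the class and parameter names are hypothetical, the hard locations are drawn at random, and a fixed Hamming radius selects the locations. It does not reproduce the mixed mapping of [Nordström, 1991a].

```python
import random


class SparseDistributedMemory:
    """Toy SDM: random hard locations, one up/down counter per word bit."""

    def __init__(self, n_locations, address_bits, word_bits, radius, seed=0):
        rng = random.Random(seed)
        self.addresses = [[rng.randint(0, 1) for _ in range(address_bits)]
                          for _ in range(n_locations)]
        self.counters = [[0] * word_bits for _ in range(n_locations)]
        self.radius = radius

    def _selected(self, address):
        # Distribute the reference address and compare by Hamming distance.
        return [i for i, a in enumerate(self.addresses)
                if sum(p != q for p, q in zip(a, address)) <= self.radius]

    def write(self, address, word):
        # Update the counters at every selected location.
        for i in self._selected(address):
            for k, bit in enumerate(word):
                self.counters[i][k] += 1 if bit else -1

    def read(self, address):
        # Sum the counters over the selected locations, then threshold.
        selected = self._selected(address)
        sums = [sum(self.counters[i][k] for i in selected)
                for k in range(len(self.counters[0]))]
        return [1 if s > 0 else 0 for s in sums]
```

The counter increments and decrements in `write` are the operation that a dedicated counter in each PE would accelerate.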
3 APPLICATION SYSTEM DEVELOPMENT
Increased flexibility, adaptability, and the potential to solve some hard problems are the main reasons for introducing ANN in real-time control systems. A new development philosophy, that allows conventional control engineering and ANN principles to be mixed, is required.
3.1 Trainability in Real-Time Control Systems
The most common development philosophy today in the domain of computer-based systems is the “sequence of phases” strategy, often referred to as the waterfall model [see, e.g., Sommerville, 1989].
[Figure: the waterfall model; recoverable phase labels: Analysis, Implementation.]
The sequence of phases is no longer relevant when trainable systems are to be developed. A trainable ANN system may be considered as having two parts: structure and data. The structure is the ANN algorithms and the hardware architecture. The data is the information that the system gets from the environment and the stored information that yields the behaviour of the system (e.g., the connection weights). In most of the models that have been suggested so far the structure is static in the sense that it is not changed by the system itself, but there is an interesting development going on towards dynamic structures. The stored information can be static after a training session or dynamic, meaning that the environment constantly influences the system’s behaviour.
In a development model feasible for trainable systems, the analysis activity has similarities to the waterfall model in sorting out the demands on the system, but turning these demands into, e.g., functions or objects is not relevant here. In contrast to programmed systems the main design task is to determine an adequate set of ANN-algorithms and a system architecture. This does not give the system its function, which is an important difference from conventional systems. The function of the system is given by training, either in a special training session or by running the system in its proper environment.
To describe development of trainable systems we need a circular development model as illustrated in Figure 5.
Figure 5. The circular development model.
In contrast to the waterfall model, where system development is considered as a project and maintenance as a process, the circular development model incorporates development and maintenance as two activities in the same process. The parts of this process are:
Analysis. Each instance of this activity handles a portion of the demands that the system is to fulfil. The treated demands may have impact on the system as a whole or only a small part of it.
Design. To meet the demands, existing algorithms are tested/modified or new ones are developed. This design style can be compared to rapid prototyping to encourage the creativity of the developer. The activity leads to a structure which includes ANN-algorithms and conventional control algorithms.
Training. When the structure of the system is updated, the system is given its new properties by exposing it to environment data or a set of training data. Training may be a part of the operation activity but can also be a separate activity succeeded by verification.
Verification. In most cases the updated trained structure of the system has to be verified before letting it influence the environment. In this activity the developer can use their own data or data from the environment and structures dedicated to verification.
Operation. There is no sharp distinction between operation and other activities. The behaviour of the system might change constantly during operation due to adaptation. The system might have only a fraction of its functionality implemented but still be a good test-bench for analysis, design and training.
In control applications the security aspect is often emphasized. Letting ANN-based systems act on the environment without special precautions could lead to severe problems. It is a major research challenge of neural control engineering to devise solutions for handling these matters. One possible approach is to have a "security shell" which gives limits for the outputs from the ANN algorithms.
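One simple reading of such a security shell is a per-output clamp between fixed limits. The Python sketch below shows only that reading; the function name and the representation of limits as (low, high) pairs are assumptions made for illustration, and a real shell could apply far richer checks.

```python
def security_shell(outputs, limits):
    """Clamp each ANN output into its allowed (low, high) range."""
    return [min(max(value, low), high)
            for value, (low, high) in zip(outputs, limits)]


# Example: the first two outputs are out of range and get limited.
safe = security_shell([1.5, -3.0, 0.2], [(-1, 1), (-1, 1), (0, 1)])
```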
3.2 The Continuous Development Paradigm.
To support the development model we introduce the Continuous Development Paradigm (CD-paradigm). This paradigm can be expressed as “Development by changing and adding”. This is a well-known approach in modern Software Engineering but in this context the aims are extended to include both hardware and software. A development environment to support the use of the CD-paradigm should share the following characteristics:
Easy to change the system structure (hardware and software) and data “on the fly”.
Incremental Development using the running system as development platform.
No undesired side-effects on the already tested parts of the system.
System data and structures can be viewed with emphasis on understandability.
Developer gets immediate response to a change of the system.
Developer can use concepts and symbols of the application domain.
3.3 An Implementation
We describe an implementation of a system development tool based on the CD-paradigm described above. The most important features of the tool are:
Graphical developer’s interface.
Cyclic execution with temporal deterministic behaviour.
Dynamic change of the running software.
Dynamic inspection/change of data “on the fly”.
Change of the distributed hardware “on the fly”.
Use of symbols and concepts from the domain of control engineering and ANN.
The tool is used to develop applications running on a set of distributed, communicating nodes. Each node is to have a cyclically executing program. The cyclical execution scheme is chosen in order to achieve a time-deterministic behaviour. The cycles have two parts: the Monitor and the Work Process. The Monitor (i) starts at a given time (a new dt has passed), (ii) takes care of input data that has arrived during the previous cycle and prepares output data that is to be distributed during the present one, (iii) handles program changes, and (iv) starts the Work Process.
In Figure 6, different paths of the Work Process are indicated. Continuous lines indicate processing that consumes time, dotted lines show idle processing, and a line splitting up means a selection in the control flow.
The development tool guarantees that the worst case branch is within the cycle time, dt.
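The worst-case guarantee can be illustrated with a small Python sketch. It assumes, purely for illustration, that a Work Process is described as a tree of timed steps, sequences and selections; this representation and the function names are hypothetical and are not taken from the tool itself.

```python
def worst_case_time(step):
    """Longest execution time over all control-flow paths of a step."""
    kind, body = step
    if kind == "work":      # body is a fixed processing time
        return body
    if kind == "seq":       # body is a list of steps executed in order
        return sum(worst_case_time(s) for s in body)
    if kind == "select":    # body is a list of alternative branches
        return max(worst_case_time(s) for s in body)
    raise ValueError(f"unknown step kind: {kind}")


def fits_in_cycle(work_process, monitor_time, dt):
    """True if Monitor plus the worst-case branch finish within dt."""
    return monitor_time + worst_case_time(work_process) <= dt


process = ("seq", [
    ("work", 2),
    ("select", [("work", 3), ("seq", [("work", 1), ("work", 4)])]),
    ("work", 1),
])
```

A shorter branch simply leaves the node idle for the rest of the cycle, which corresponds to the dotted lines in the temporal view.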
Figure 6. Temporal view of Monitor and Work Process
3.3.1 Graphical Developer’s Interface
To support the CD-paradigm and demands of understandability the developer's interface to the system is an interactive graphical tool. The most basic properties of the tool are outlined below.
All development is done on a system in operation. That is, a system operating in real time but not necessarily affecting the system environment.
Hierarchical way of describing the application. The levels of abstraction span from the instructions of the node control unit to the abstract concepts of the application.
Support for reuse of system components. Part of the tool is a browser where system components (processes, data, and connections) are stored.
Tools for viewing data. Data can be viewed in various ways, e.g. using bargraphs, dia[...]
On the highest level (system level) the user works with a display showing an overview of a[...] This is actually a map of the system configuration.
Figure 7. System level display (left) and basic symbols.
The user may open up a process symbol to work with a graphical specification on the node level. Figure 8 shows an example of such a display. In the Work Area (WA), surrounded on both sides by the Input and Output areas, respectively, the designer can place symbols that specify the operation of the node. The placement of symbols in WA has temporal meaning relative to a time scale T that indicates the total time of the process. Every symbol in WA can be opened to move the designer one level of abstraction lower in the system hierarchy. When the designer places a symbol in WA, using the browser, the corresponding process will be added to the execution thread. The designer can then immediately use the inspection tools to verify the function of the added process.
Figure 8. Node level (or lower levels) display
4 SYSTEM ARCHITECTURE AND INTERMODULE COMMUNICATION
4.1 Concept
The system-architectural concept is based on the notions of nodes, channels, and local real-time databases:
Nodes, which differ in functionality, are communicating via a shared medium. Input nodes deliver sensor data to the rest of the system and may perform perceptual tasks. Output nodes control actuators and may perform motoric control tasks. Processing nodes perform various kinds of calculations. I/O nodes and processing nodes may have great similarities but, because of their closeness to the environment, I/O nodes have additional circuits for interfacing to sensor and actuator signals.
Communication between nodes takes place via channels. A communication channel is a logical connection on the shared medium between a sending node and one or more listening nodes. The channels are statically scheduled so that the communication pattern required for the application is achieved. This is done by the designer. Two types of data are transported over the medium: Code changes are distributed to the nodes to allow modifications "on the fly" of the cyclically executed programs in the nodes. Process data informs the nodes about the status of the environment (including the states of other nodes). If the application requires intensive communication within a set of related nodes a hierarchical communication can be set up. The related nodes form a cluster with more available bandwidth on the internal channels.
Rather than being individual signals, the process data exchanged between the nodes is more like patterns, often multi-dimensional. Therefore, the shared medium must be able to carry large amounts of information (Gigabits per second in a typical system).
Every node in the system executes its program in a cyclic manner. The cyclically executed program accesses its data from a local real-time database (LRTDB). This LRTDB is updated, likewise cyclically, via channels from the other nodes of the system.
The principle of resource adequacy, the cyclic paradigm and the statically scheduled communication via the LRTDBs imply the time-deterministic behaviour of the system which is so important in real-time applications (cf [Lawson, 1992a]).
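The interplay of cyclic execution, channels and LRTDBs can be sketched as a small Python simulation. This is a sketch under simplifying assumptions: all names are hypothetical, each node's Work Process is a plain function of its LRTDB, and all channel transfers happen at the end of the cycle, so a node always computes on data from the previous cycle.

```python
class Node:
    """A node with a cyclically executed program and a local database."""

    def __init__(self, work):
        self.work = work      # function from LRTDB contents to an output
        self.lrtdb = {}       # local real-time database
        self.output = None

    def run_cycle(self):
        self.output = self.work(self.lrtdb)


def run_system_cycle(nodes, channels):
    """Run one cycle: every node computes, then the channels deliver."""
    for node in nodes.values():
        node.run_cycle()
    # Statically defined channels: (sender name, list of listener names).
    for sender, listeners in channels:
        for listener in listeners:
            nodes[listener].lrtdb[sender] = nodes[sender].output


nodes = {
    "sensor": Node(lambda db: 3),
    "controller": Node(lambda db: 2 * db.get("sensor", 0)),
}
channels = [("sensor", ["controller"])]
```

With this setup the controller sees the sensor value one cycle after it was produced, a fixed and predictable latency.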
[...] establishes a channel to an executing node when it needs to send program changes. Instructions along with address information are sent to the executing node where the monitor makes the change between two executions of the Work Process.
Figure 9. Multi-node target system and multiple-workstation development system
The Development Node is connected to a Local Area Network (LAN) of workstations (WS) running the development system. The LAN connection can be removed without affecting the running system.
For inspection of the LRTDB and other local data the Development Node opens channels in the same way as when other process data is moved between nodes.
4.2 Implementation
Implementations of the processing modules have been briefly described in earlier sections (see [...]) [...] architecture. A more detailed description is given in [Nilsson et al., 1992].
An all-optical network (the entire path between end-nodes is passive and optical) is used as the shared medium. Communication channels between SIMD Nodes are established by time-multiplexing (TDMA) in a static manner. In every scheduled time slot there is one sender and one or more listeners (broadcast).
If higher capacity is needed, WDMA (wavelength division multiple access) may be used instead. Then, scheduling of communication is not required. The nodes scan the wavelength spectrum to fill their LRTDBs. The scanning can be statically determined or a function of the internal state of the node. As an interesting future possibility, it may also be trained.
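For the TDMA case, a static schedule can be pictured as a fixed table from time slots to channels. The Python sketch below is illustrative only: the function names, the node names in the example and the rule that channel k gets slot k are assumptions, not the scheduling method of the implementation.

```python
def build_tdma_schedule(channels, slots_per_cycle):
    """Assign each channel (sender, listeners) a fixed slot in the cycle."""
    if len(channels) > slots_per_cycle:
        raise ValueError("not enough time slots for the required channels")
    # Slot k always belongs to channel k; unused slots stay idle.
    return {slot: (sender, tuple(listeners))
            for slot, (sender, listeners) in enumerate(channels)}


def slots_heard_by(schedule, node):
    """Slots in which the given node must listen to fill its LRTDB."""
    return [slot for slot, (_, listeners) in sorted(schedule.items())
            if node in listeners]


channels = [
    ("camera", ["vision"]),
    ("vision", ["planner", "arm"]),   # one sender, two listeners (broadcast)
    ("planner", ["arm"]),
]
schedule = build_tdma_schedule(channels, slots_per_cycle=4)
```

Because the table is fixed before run time, each slot has exactly one sender and every node knows in advance when to listen.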
Broadcast implies that it is important to synchronize the communication. The synchronization is done via a global, distributed optical clock. Alternatively, a communication slot can be several time slots, which gives a slower communication speed. [...] is done by a factor k by means of shift registers (k is the size of the PE array, e.g. k=256). It is important to synchronize the dataflow with the shift clock. This is done by sending the clock and the data in the same medium. Clock and data use different wavelengths (f1 and f2), implying that the communication interface must include two laser diodes and two optical filters (F) for the flow of process data.
In addition to the exchange of process data there is also a distribution of code caused by program changes made "on the fly".
Figure 10. Communication interface. T is transmit, R is receive. Grey boxes indicate the optical/[...]
Due to the high speed the communication interface must be integrated into one IC to work properly. Today there are shift registers available implemented in GaAs-technology for very high speed (some Gbit/s). The GaAs-technology also gives the possibility to integrate optical devices with logic. The topology of the all-optical network is a star, which has a decibel loss proportional to log N, while a bus topology has one proportional to N (N is the number of nodes in the system) [Green, 1991].
The SIMD Module accesses data from its own local real-time database (LRTDB) reflecting the status of the environment. The LRTDB is implemented as a dual-port memory. At one side the SIMD Module accesses data; at the other side the control unit in the communication module is updating the LRTDB via the communication interface. The control unit cyclically executes the statically scheduled send and receive commands necessary for carrying out the [...]
4.3 REMAP Prototype Development
REMAP3 is an experimental project. A sequence of gradually evolved prototypes is being built, starting with a small, software configurable PE array module, implemented as a Master’s thesis project [Linde and Taveniku, 1991]. With only slight modifications in PE array architecture, but with a new high-performance control unit, the second prototype is now being built [Bengtsson et al., 1991], almost full-scale in PE number, but far from miniaturized enough for embedded systems.
The early prototypes rely on dynamically programmable logic cell arrays (FPGAs) [Linde et al. 1992]. Therefore, different variations of the prototypes can be realized by reprogramming. The FPGAs are designed for high speed. Thus, the speed and the logical size of the prototype systems suffice for new, demanding applications, but the physical size does not allow embedded multi-module systems to be built from the prototypes.
Based on the experiences from the FPGA-based prototype modules, a design for a VLSI implemented module that can be used in multi-node systems as described above will be made.
5 CONCLUSION
This paper points to the strength of combining massively parallel architectures, trainability, and incremental development environments. The SIMD paradigm combines single-threaded programming with multiprocessing power and easy miniaturizing for embedded systems. We have presented a massively parallel system architecture based on multiple SIMD processor arrays to allow the implementation of real-time, ANN-based training using interaction-based system development tools.
The presented system architecture and development model are intended to be used in biologically inspired design of control systems [Kuperstein, 1991; Singer, 1990], where sensory, motoric, and higher cognitive functions are mapped onto nodes or clusters of nodes.
6 REFERENCES
Arbib, M.A. (1989). Schemas and neural networks for sixth generation computing. Journal of Parallel and Distributed Computing, Vol. 6, No. 2, pp. 185-216.
Bengtsson, L., A. Linde, T. Nordström, B. Svensson, M. Taveniku, and A. Åhlander (1991). Design and implementation of the REMAP3 software reconfigurable SIMD parallel computer, Fourth Swedish Workshop on Computer Systems Architecture, Linköping, Sweden, January, 1992. Available as Research Report CDv-9105 from Centre for Computer Science, Halmstad University, Halmstad, Sweden.
Fernström, C., I. Kruzela, and B. Svensson (1986). LUCAS Associative Array Processor – Design, Programming and Application Studies. Vol. 216 of Lecture Notes in Computer Science, Springer Verlag, Berlin.
Green, P.E. (1991). The future of fiber-optic computer networks. Computer, Vol. 24, No. 9.
Gustafsson, E. (1989). A mapping of a feedback neural network onto a SIMD architecture, Research Report CDv-8901, Centre for Computer Science, Halmstad University, May 1989.
Kanerva, P. (1988). Sparse Distributed Memory. MIT Press. Cambridge, MA, USA.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE. Vol. 78, No. 9. pp. 1464-1480.
Kuperstein, M. (1991). INFANT neural controller for adaptive sensory-motor coordination. Neural Networks, [...]
Lawson, H.W. (1992a). Cy-Clone: an approach to the engineering of resource adequate cyclic real-time systems. The Journal of Real-Time Systems. Vol. 4, No. 1, pp. 55-83.
Lawson, H.W. (1992b), with contributions by B. Svensson and L. Wanhammar. Parallel Processing in Industrial Real-Time Applications. Prentice-Hall, Englewood Cliffs, NJ, USA.
Linde, A. and M. Taveniku (1991). LUPUS – a reconfigurable prototype for a modular massively parallel SIMD computing system. Masters Thesis Report No. 1991:028 E, Division of Computer Engineering, Luleå University of Technology, Luleå, Sweden (in Swedish).
Linde, A., T. Nordström, and M. Taveniku (1992). Using FPGA to implement a reconfigurable highly parallel computer. Second International Workshop on Field-Programmable Logic and Applications, Vienna, Austria, Aug. 31 – Sept. 2.
Nilsson, K., B. Svensson, and P.A. Wiberg (1992). A modular, massively parallel computer architecture for trainable real-time control systems. AARTC ‘92: 2nd IFAC Workshop on Algorithms and Architectures for Real-Time Control, Seoul, Korea, Aug. 31 – Sept. 2.
Nordström, T. (1991a). Sparse distributed memory simulation on REMAP3. Research Report No. TULEA 1991:16, Luleå University of Technology, Luleå, Sweden.
Nordström, T. (1991b). Designing parallel computers for self organizing maps. Research Report No. TULEA 1991:17, Luleå University of Technology, Luleå, Sweden.
Nordström, T. and B. Svensson (1992). Using and designing massively parallel computers for artificial neural networks. Journal of Parallel and Distributed Computing, Vol. 14, No. 3, pp. 260-285.
Singer, W. (1990). Search for coherence: a basic principle of cortical self-organization. Concepts in Neuroscience, Vol. 1, No. 1, pp. 1-26.
Sommerville, I. (1989). Software Engineering. 3rd ed. Addison-Wesley, Reading, MA, USA.
Svensson, B. (1989). Parallel implementation of multilayer feedforward networks with supervised learning by back-propagation, Research Report CDv-8902, Centre for Computer Science, Halmstad University, Halmstad, June 1989.
Svensson, B. and T. Nordström (1990). Execution of neural network algorithms on an array of bit-serial processors. Proceedings of 10th International Conference on Pattern Recognition – Computer Architectures for Vision and Pattern Recognition, Atlantic City, NJ, USA, June 1990, Vol. II, pp. 501-505.

Source: http://www.nordstrom.nu/tomas/publications/SveEtAl_1992_SCC.pdf
