# brain-readme.txt
#
# Project: SndLib
#
# Demand matrices from measurements in the BRAIN research network in Berlin

*************
Content:
*************

1. Origin
2. Topology
3. Creation
4. Format
5. Remarks

***********************
1. Origin
***********************

The dynamic demand matrices contained in the archives

  - directed-brain-1min-over-7days.tgz        (xml files)
  - directed-brain-1min-over-7days-native.tgz (native format files)
  - directed-brain-1h-over-375day.tgz         (xml files)
  - directed-brain-1h-over-375day-native.tgz  (native format files)

are calculated from real-life traffic data from the BRAIN research
network in Berlin, see https://www.brain.de/. The traffic was measured
on each link of the network in bits per second.

For the first set of demands, the original data was taken in 1-minute
steps, starting on 08.03.2013 at 14:52 and ending on 15.03.2013 at
08:56. There are 9723 traffic files. For at least one timestamp there
is no demand file, and there may be further gaps; see the "Remarks"
section.

The second set of demands spans 05.03.2012 15:00 until 15.03.2013
08:00. There are 8993 demand files, possibly containing gaps.

The network topology was available, but no capacities, costs, or
coordinates were given.

***********************
2. Topology
***********************

The network contains 9 backbone nodes, which are connected such that
there are multiple independent paths between any two backbone nodes.
Each backbone node is connected to multiple regional ("regio") nodes,
which are the sources and targets of all demands. There are 152 regio
nodes in the network. Each regio node is connected to exactly one
backbone node, and there are no connections between regio nodes.

***********************
3.
Creation
***********************

Demands:

To create point-to-point demands for each time stamp that are valid in
the sense that routing the created demands in the network may lead to
the given traffic values on the links, the following steps were taken:

1. For each node, coordinates spread across Germany were chosen so
   that the produced pictures look nice.

2. Since each backbone node measures all traffic on each of its
   connected links, there are two traffic values for each backbone
   link, one for each endpoint of the connection. Since these values
   need not be equal, their mean is used.

3. For each regio node, the ingoing and outgoing traffic is known,
   since it equals the traffic on the corresponding links to the
   backbone node. Based on this, an ideal demand goal value is
   calculated for each pair of distinct regio nodes. The goal for the
   demand from regio1 to regio2 is

      outgoingTrafficOfRegio1 * ingoingTrafficOfRegio2 * alpha

   The factor alpha is chosen such that the following holds:

      sum of traffic on all ingoing links of the regio nodes
      = sum of all demand goals

4. To calculate the demands, the following LP model was solved to
   optimality: for each regio-to-regio demand, a set of path-based
   flows is calculated. Then the following values were calculated,
   forming the objective of the optimization:

      a: average relative difference between the demand goal and the
         total flow on all paths, over all pairs of regio nodes
      b: average relative difference between the traffic on a
         backbone-to-backbone link and the sum of flows on that link
      c: average relative difference between the traffic on a
         regio-to-backbone link and the sum of flows on that link

   The objective was a + 5*b + 10*c. For 24 out of the 9723 time
   stamps the optimal value lies in the range of 1 to 7; the rest are
   smaller than 1.

5.
Since the resulting regio-to-regio demands are fractional, the
   demand values are rounded to integer values; solving the
   corresponding model with an integrality constraint would have
   taken much more computation time. All resulting demands with value
   zero have been removed. This results in the given directed demands
   (without loops).

Cost:

The costs given in the native format files are based on the model of
Huelsermann et al. from the paper "Cost modeling and evaluation of
capital expenditures in optical multilayer networks" (2008). Using
the values from Table 2, the cost of a 1G or 10G Ethernet link is
composed of the costs of two corresponding ports with 40 km reach
plus the corresponding partial costs of the needed port card and a
basic node. These values are multiplied by 1000 to avoid highly
fractional numbers.

***********************
4. Format
***********************

For the new multiple demand matrix archives we decided NOT to
introduce a new XML schema or data format but to use the existing
SNDlib formats. This means that all the available code
(parsing/writing) can also be used for the multiple matrices.

A single demand matrix in a multiple matrices archive is just a
Network object without a link section, that is, it consists of nodes
and demands between the nodes. It follows that the Network
parser/writer available in the SNDlib API can be used to parse/write
a single demand matrix. The node sections of all single matrices in
the brain archives are of course identical and correspond to the
brain SNDlib network.

In addition to a node and a demand section, a single demand matrix
also has a Meta-Section giving additional information about the
matrix, such as the time stamp, the time horizon, the origin, and the
data unit. The new SNDlib API 1.3 is able to handle this (optional)
Meta-Section.

***********************
5. Remarks
***********************

We do not give any warranty for the correctness of the data.
There might be mistakes already in the original accounting data. We
might also have made mistakes in the creation of the data.

NOTICE: There is one demandMatrix*.xml for every traffic file in the
        original data.
NOTICE: There is no demand file for time 14:56 on 08.03.2013 in the
        1min demand set.
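Appendix (illustration only): the demand-goal computation of step 3 in
the "Creation" section can be sketched as follows. The function and
variable names are hypothetical and not part of the SNDlib API; this
is a minimal sketch of the gravity-style scaling described above.

```python
def demand_goals(out_traffic, in_traffic):
    """Gravity-style demand goals between regio nodes.

    out_traffic / in_traffic: dicts mapping regio node -> measured
    outgoing / ingoing traffic (bits per second), taken from the
    regio-to-backbone links.
    """
    # Raw (unscaled) goal for each ordered pair of distinct regio nodes:
    #   outgoingTrafficOfRegio1 * ingoingTrafficOfRegio2
    raw = {
        (r1, r2): out_traffic[r1] * in_traffic[r2]
        for r1 in out_traffic
        for r2 in in_traffic
        if r1 != r2
    }
    # alpha is chosen so that the sum of all demand goals equals the
    # sum of traffic on all ingoing links of the regio nodes.
    alpha = sum(in_traffic.values()) / sum(raw.values())
    return {pair: value * alpha for pair, value in raw.items()}
```

Note that the resulting goals are fractional; as described in step 5,
the final demand values are obtained by rounding after the LP step.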
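Appendix (illustration only): since the matrices use the existing
SNDlib formats, the native-format files can also be read without the
SNDlib API. The sketch below assumes a DEMANDS section whose lines
follow the layout `<id> ( <source> <target> ) <routing_unit> <value>
<max_path_length>`; verify this against the SNDlib native format
specification and the actual archive files before relying on it.

```python
import re

# Assumed demand line layout (check against the SNDlib native format):
#   <id> ( <source> <target> ) <routing_unit> <value> [<max_path_length>]
DEMAND_RE = re.compile(
    r"^\s*(\S+)\s+\(\s*(\S+)\s+(\S+)\s*\)\s+(\S+)\s+([\d.eE+-]+)"
)

def parse_demands(text):
    """Return {demand_id: (source, target, value)} from a native-format
    demand matrix given as a string."""
    demands = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("DEMANDS ("):
            in_section = True
            continue
        if in_section:
            if stripped == ")":
                break  # end of the DEMANDS section
            m = DEMAND_RE.match(line)
            if m:
                demand_id, src, tgt, _unit, value = m.groups()
                demands[demand_id] = (src, tgt, float(value))
    return demands
```

The Meta-Section mentioned in the "Format" section is simply ignored
by this sketch; the SNDlib API 1.3 parser handles it properly.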