.. _kddcup99_dataset:

Kddcup 99 dataset
-----------------

The KDD Cup '99 dataset was created by processing the tcpdump portions
of the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset,
created by MIT Lincoln Lab [1]_. The artificial data (described on the `dataset's
homepage <https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html>`_) was
generated using a closed network and hand-injected attacks to produce a
large number of different types of attack with normal activity in the
background. As the initial goal was to produce a large training set for
supervised learning algorithms, there is a large proportion (80.1%) of
abnormal data. This is unrealistic in the real world and inappropriate for
unsupervised anomaly detection, which aims at detecting 'abnormal' data, i.e.
data that is:

1) qualitatively different from normal data, and
2) in a small minority among the observations.
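
Criterion 2) can be made concrete by measuring the fraction of labels that
are not ``normal.``. The sketch below is illustrative only: the helper names
and the 5% cut-off for "small minority" are hypothetical choices, and the
label lists are synthetic toy data sized to mirror the 80.1% and 1%
proportions quoted in this description.

```python
import numpy as np


def anomaly_fraction(labels):
    """Fraction of labels that are not the 'normal.' class."""
    return float(np.mean(np.asarray(labels) != "normal."))


def is_small_minority(labels, threshold=0.05):
    """Criterion 2): anomalies are a small minority of the observations.

    ``threshold`` is a hypothetical cut-off chosen only for illustration.
    """
    return anomaly_fraction(labels) < threshold


# Synthetic toy labels (not KDD Cup '99 records).
raw_like = ["smurf."] * 801 + ["normal."] * 199  # 80.1% abnormal
sa_like = ["normal."] * 99 + ["neptune."]         # 1% abnormal

print(anomaly_fraction(raw_like), is_small_minority(raw_like))  # 0.801 False
print(anomaly_fraction(sa_like), is_small_minority(sa_like))    # 0.01 True
```

With most records abnormal, "abnormal" stops being the rare class, which is
why the raw data fails criterion 2).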

We thus transform the KDD dataset into two different datasets: SA and SF.

- SA is obtained by selecting all the normal data and a small proportion
  of abnormal data, which gives an anomaly proportion of 1%.

- SF is obtained as in [2]_ by selecting the data whose attribute
  ``logged_in`` is positive, thus focusing on the intrusion attack, which
  gives a proportion of 0.3% of attacks.

- http and smtp are two subsets of SF corresponding to the third feature
  (``service``) being equal to 'http' (resp. 'smtp').
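
The subset constructions described in the list can be sketched with NumPy
index selection. This is a minimal illustration on synthetic data, not
scikit-learn's implementation: the flat ``labels``/``logged_in``/``service``
arrays, sampling abnormal rows without replacement, the fixed seeds, and all
helper names are assumptions made for the example.

```python
import numpy as np


def sa_indices(labels, anomaly_rate=0.01, seed=0):
    """SA-like subset: all normal rows plus a random sample of abnormal rows.

    The sample size is chosen so that abnormal rows make up roughly
    ``anomaly_rate`` of the result (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    normal = np.flatnonzero(labels == "normal.")
    abnormal = np.flatnonzero(labels != "normal.")
    # Solve n_abn / (n_normal + n_abn) = anomaly_rate for n_abn.
    n_abn = round(anomaly_rate * len(normal) / (1 - anomaly_rate))
    n_abn = min(n_abn, len(abnormal))
    picked = rng.choice(abnormal, size=n_abn, replace=False)
    return np.sort(np.concatenate([normal, picked]))


def sf_indices(logged_in):
    """SF-like subset: rows whose ``logged_in`` attribute is positive."""
    return np.flatnonzero(logged_in > 0)


def service_indices(service, sf_idx, name):
    """http/smtp-like subset: SF rows whose ``service`` equals ``name``."""
    return sf_idx[service[sf_idx] == name]


# Synthetic toy columns (not real KDD Cup '99 records).
rng = np.random.default_rng(42)
n = 10_000
labels = np.where(rng.random(n) < 0.8, "smurf.", "normal.")
logged_in = rng.integers(0, 2, size=n)
service = rng.choice(["http", "smtp", "ftp"], size=n)

sa = sa_indices(labels)
sf = sf_indices(logged_in)
http = service_indices(service, sf, "http")
smtp = service_indices(service, sf, "smtp")

print(f"SA anomaly rate: {np.mean(labels[sa] != 'normal.'):.4f}")
print(f"SF rows: {sf.size}, http rows: {http.size}, smtp rows: {smtp.size}")
```

Drawing abnormal rows without replacement keeps each record at most once;
this is a choice of the sketch, not a claim about how scikit-learn samples SA.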

General KDD structure:

================  ==========================================
Samples total     4898431
Dimensionality    41
Features          discrete (int) or continuous (float)
Targets           str, 'normal.' or name of the anomaly type
================  ==========================================

SA structure:

================  ==========================================
Samples total     976158
Dimensionality    41
Features          discrete (int) or continuous (float)
Targets           str, 'normal.' or name of the anomaly type
================  ==========================================

SF structure:

================  ==========================================
Samples total     699691
Dimensionality    4
Features          discrete (int) or continuous (float)
Targets           str, 'normal.' or name of the anomaly type
================  ==========================================

http structure:

================  ==========================================
Samples total     619052
Dimensionality    3
Features          discrete (int) or continuous (float)
Targets           str, 'normal.' or name of the anomaly type
================  ==========================================

smtp structure:

================  ==========================================
Samples total     95373
Dimensionality    3
Features          discrete (int) or continuous (float)
Targets           str, 'normal.' or name of the anomaly type
================  ==========================================

:func:`sklearn.datasets.fetch_kddcup99` will load the kddcup99 dataset; it
returns a dictionary-like object with the feature matrix in the ``data`` member
and the target values in ``target``. The dataset will be downloaded from the
web if necessary.
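
As a sketch of typical use for anomaly detection, the returned ``target`` can
be turned into binary outlier labels. The helper names below are hypothetical.
``subset`` and ``percent10`` are parameters of
:func:`sklearn.datasets.fetch_kddcup99`; ``percent10`` defaults to ``True``,
which loads only a 10% sample, so shapes will be smaller than the sample
counts in the tables above. Labels are decoded from ``bytes`` when necessary
because the exact label type is treated as an assumption here.

```python
import numpy as np


def to_outlier_labels(target):
    """Map 'normal.' to 0 and every attack name to 1 (hypothetical helper).

    Accepts str or bytes labels, decoding bytes before comparison.
    """
    decoded = [t.decode() if isinstance(t, bytes) else t for t in target]
    return (np.asarray(decoded) != "normal.").astype(np.int64)


def load_binary_subset(subset="SA", percent10=True):
    """Fetch one KDD Cup '99 subset and return (X, y) with 0/1 labels.

    Downloads the data on first use, so network access may be required.
    """
    # Imported lazily so to_outlier_labels can be used without scikit-learn.
    from sklearn.datasets import fetch_kddcup99

    bunch = fetch_kddcup99(subset=subset, percent10=percent10)
    return bunch.data, to_outlier_labels(bunch.target)
```

For example, ``X, y = load_binary_subset("http")`` should give rows with the
three features listed in the http table, and ``y.mean()`` the observed
anomaly proportion for the loaded sample.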

.. topic:: References

    .. [1] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das.
           Analysis and Results of the 1999 DARPA Off-Line Intrusion
           Detection Evaluation.

    .. [2] K. Yamanishi, J.-I. Takeuchi, G. Williams, and P. Milne. Online
           unsupervised outlier detection using finite mixtures with
           discounting learning algorithms. In Proceedings of the Sixth
           ACM SIGKDD International Conference on Knowledge Discovery
           and Data Mining, pages 320-324. ACM Press, 2000.