Fixed database typo and removed unnecessary class identifier.

Batuhan Berk Başoğlu 2020-10-14 10:10:37 -04:00
parent 00ad49a143
commit 45fb349a7d
5098 changed files with 952558 additions and 85 deletions

@@ -0,0 +1,225 @@
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
%
% 3. Past Usage:
% - Publications: too many to mention!!! Here are a few.
% 1. Fisher,R.A. "The use of multiple measurements in taxonomic problems"
% Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions
% to Mathematical Statistics" (John Wiley, NY, 1950).
% 2. Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
% (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
% 3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
% Structure and Classification Rule for Recognition in Partially Exposed
% Environments". IEEE Transactions on Pattern Analysis and Machine
% Intelligence, Vol. PAMI-2, No. 1, 67-71.
% -- Results:
% -- very low misclassification rates (0% for the setosa class)
% 4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE
% Transactions on Information Theory, May 1972, 431-433.
% -- Results:
% -- very low misclassification rates again
% 5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
% conceptual clustering system finds 3 classes in the data.
%
% 4. Relevant Information:
% --- This is perhaps the best known database to be found in the pattern
% recognition literature. Fisher's paper is a classic in the field
% and is referenced frequently to this day. (See Duda & Hart, for
% example.) The data set contains 3 classes of 50 instances each,
% where each class refers to a type of iris plant. One class is
% linearly separable from the other 2; the latter are NOT linearly
% separable from each other.
% --- Predicted attribute: class of iris plant.
% --- This is an exceedingly simple domain.
%
% 5. Number of Instances: 150 (50 in each of three classes)
%
% 6. Number of Attributes: 4 numeric, predictive attributes and the class
%
% 7. Attribute Information:
% 1. sepal length in cm
% 2. sepal width in cm
% 3. petal length in cm
% 4. petal width in cm
% 5. class:
% -- Iris Setosa
% -- Iris Versicolour
% -- Iris Virginica
%
% 8. Missing Attribute Values: None
%
% Summary Statistics:
%                  Min  Max   Mean    SD   Class Correlation
%    sepal length: 4.3  7.9   5.84  0.83    0.7826
%     sepal width: 2.0  4.4   3.05  0.43   -0.4194
%    petal length: 1.0  6.9   3.76  1.76    0.9490  (high!)
%     petal width: 0.1  2.5   1.20  0.76    0.9565  (high!)
%
% 9. Class Distribution: 33.3% for each of 3 classes.
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%

@@ -0,0 +1,8 @@
% This arff file contains some missing data
@relation missing
@attribute yop real
@attribute yap real
@data
1,5
2,4
?,?
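
The `?` entries in the last data row mark missing values, and the accompanying test expects both columns to come back as NaN for that row. A minimal standard-library sketch of that mapping follows; `parse_numeric_row` is a hypothetical helper written for illustration and is not scipy's parser.

```python
import math


def parse_numeric_row(line):
    """Split one comma-separated ARFF data row into floats.

    Hypothetical helper for illustration only: a '?' field becomes NaN.
    """
    values = []
    for field in line.split(","):
        field = field.strip()
        values.append(math.nan if field == "?" else float(field))
    return values


# The three data rows of the file above.
rows = [parse_numeric_row(line) for line in ("1,5", "2,4", "?,?")]
```

NaN never compares equal to itself, so a check on the last row needs `math.isnan` rather than `==`.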

@@ -0,0 +1,11 @@
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
% This file has no data

@@ -0,0 +1,13 @@
% Regression test for issue #10232 : Exception in loadarff with quoted nominal attributes
% Spaces between elements are stripped by the parser
@relation SOME_DATA
@attribute age numeric
@attribute smoker {'yes', 'no'}
@data
18, 'no'
24, 'yes'
44, 'no'
56, 'no'
89,'yes'
11, 'no'

@@ -0,0 +1,13 @@
% Regression test for issue #10232 : Exception in loadarff with quoted nominal attributes
% Spaces inside quotes are NOT stripped by the parser
@relation SOME_DATA
@attribute age numeric
@attribute smoker {' yes', 'no '}
@data
18,'no '
24,' yes'
44,'no '
56,'no '
89,' yes'
11,'no '
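
The two quoted-nominal fixtures pin down one rule: whitespace between elements is dropped, whitespace inside quotes is kept. One way to reproduce that rule with the standard library is `csv.reader` with `quotechar="'"` and `skipinitialspace=True`. This is a sketch under that assumption, with a hypothetical `split_row` helper; it does not claim to be how `loadarff` is implemented.

```python
import csv


def split_row(line):
    """Split one data row, honouring single-quoted fields (hypothetical helper)."""
    # quotechar="'" keeps quoted content verbatim; skipinitialspace=True
    # drops blanks that directly follow a delimiter, outside the quotes.
    return next(csv.reader([line], quotechar="'", skipinitialspace=True))


spacing_between_elements = split_row("18, 'no'")
spacing_inside_quotes = split_row("24,' yes'")
```

Here `spacing_between_elements` holds `'no'` without the leading blank, while `spacing_inside_quotes` keeps `' yes'` intact.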

@@ -0,0 +1,10 @@
@RELATION test1
@ATTRIBUTE attr0 REAL
@ATTRIBUTE attr1 REAL
@ATTRIBUTE attr2 REAL
@ATTRIBUTE attr3 REAL
@ATTRIBUTE class {class0, class1, class2, class3}
@DATA
0.1, 0.2, 0.3, 0.4,class1

File diff suppressed because one or more lines are too long

@@ -0,0 +1,15 @@
@RELATION test2
@ATTRIBUTE attr0 REAL
@ATTRIBUTE attr1 real
@ATTRIBUTE attr2 integer
@ATTRIBUTE attr3 Integer
@ATTRIBUTE attr4 Numeric
@ATTRIBUTE attr5 numeric
@ATTRIBUTE attr6 string
@ATTRIBUTE attr7 STRING
@ATTRIBUTE attr8 {bla}
@ATTRIBUTE attr9 {bla, bla}
@DATA
0.1, 0.2, 0.3, 0.4,class1
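
This header exercises case-insensitive type keywords: the accompanying test expects `REAL`, `real`, `integer`, `Integer`, `Numeric` and `numeric` to report as `numeric`, both spellings of `string` as `string`, and the brace lists as `nominal`. A minimal sketch of such a classifier is shown below. `classify_type` is a hypothetical name, the function ignores `DATE` and `RELATIONAL` declarations, and it is not scipy's `read_header`.

```python
def classify_type(declaration):
    """Map an @ATTRIBUTE type declaration to a coarse type name.

    Hypothetical helper for illustration; date and relational
    declarations are deliberately not handled.
    """
    decl = declaration.strip()
    if decl.startswith("{") and decl.endswith("}"):
        return "nominal"
    lowered = decl.lower()
    if lowered in ("real", "integer", "numeric"):
        return "numeric"
    if lowered == "string":
        return "string"
    raise ValueError("unknown attribute type: %r" % declaration)


declarations = ["REAL", "real", "integer", "Integer", "Numeric",
                "numeric", "string", "STRING", "{bla}", "{bla, bla}"]
kinds = [classify_type(d) for d in declarations]
```

An unrecognised keyword raises `ValueError` in this sketch; the real test suite expects `ParseArffError` from `read_header` for a bad type.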

@@ -0,0 +1,6 @@
@RELATION test3
@ATTRIBUTE attr0 crap
@DATA
0.1, 0.2, 0.3, 0.4,class1

@@ -0,0 +1,11 @@
@RELATION test5
@ATTRIBUTE attr0 REAL
@ATTRIBUTE attr1 REAL
@ATTRIBUTE attr2 REAL
@ATTRIBUTE attr3 REAL
@ATTRIBUTE class {class0, class1, class2, class3}
@DATA
0.1, 0.2, 0.3, 0.4,class1
-0.1, -0.2, -0.3, -0.4,class2
1, 2, 3, 4,class3

@@ -0,0 +1,26 @@
@RELATION test4
@ATTRIBUTE attr0 REAL
@ATTRIBUTE attr1 REAL
@ATTRIBUTE attr2 REAL
@ATTRIBUTE attr3 REAL
@ATTRIBUTE class {class0, class1, class2, class3}
@DATA
% lsdflkjhaksjdhf
% lsdflkjhaksjdhf
0.1, 0.2, 0.3, 0.4,class1
% laksjdhf
% lsdflkjhaksjdhf
-0.1, -0.2, -0.3, -0.4,class2
% lsdflkjhaksjdhf
% lsdflkjhaksjdhf
% lsdflkjhaksjdhf
1, 2, 3, 4,class3

@@ -0,0 +1,12 @@
@RELATION test6
@ATTRIBUTE attr0 REAL
@ATTRIBUTE attr1 REAL
@ATTRIBUTE attr2 REAL
@ATTRIBUTE attr3 REAL
@ATTRIBUTE class {C}
@DATA
0.1, 0.2, 0.3, 0.4,C
-0.1, -0.2, -0.3, -0.4,C
1, 2, 3, 4,C

@@ -0,0 +1,15 @@
@RELATION test7
@ATTRIBUTE attr_year DATE yyyy
@ATTRIBUTE attr_month DATE yyyy-MM
@ATTRIBUTE attr_date DATE yyyy-MM-dd
@ATTRIBUTE attr_datetime_local DATE "yyyy-MM-dd HH:mm"
@ATTRIBUTE attr_datetime_missing DATE "yyyy-MM-dd HH:mm"
@DATA
1999,1999-01,1999-01-31,"1999-01-31 00:01",?
2004,2004-12,2004-12-01,"2004-12-01 23:59","2004-12-01 23:59"
1817,1817-04,1817-04-28,"1817-04-28 13:00",?
2100,2100-09,2100-09-10,"2100-09-10 12:00",?
2013,2013-11,2013-11-30,"2013-11-30 04:55","2013-11-30 04:55"
1631,1631-10,1631-10-15,"1631-10-15 20:04","1631-10-15 20:04"
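
The `DATE` attributes above use Java-style patterns, and the accompanying header test expects them to surface as `strptime`-style formats (for example `yyyy-MM-dd HH:mm` as `%Y-%m-%d %H:%M`). A minimal sketch of that translation is given below, restricted to the five tokens that appear in this file. The token table and the `to_strptime_format` helper are assumptions made for illustration, not scipy's code.

```python
import re
from datetime import datetime

# Assumed mapping, limited to the tokens used in this fixture.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}


def to_strptime_format(pattern):
    """Translate a pattern such as 'yyyy-MM-dd HH:mm' (hypothetical helper)."""
    # The match is case-sensitive, so 'MM' (month) and 'mm' (minute) stay distinct.
    return re.sub("yyyy|MM|dd|HH|mm", lambda m: _TOKENS[m.group(0)], pattern)


fmt = to_strptime_format("yyyy-MM-dd HH:mm")
parsed = datetime.strptime("1999-01-31 00:01", fmt)
```

Patterns with time-zone tokens such as `Z` or `z` are outside this sketch; the test suite treats those as unsupported.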

@@ -0,0 +1,12 @@
@RELATION test8
@ATTRIBUTE attr_datetime_utc DATE "yyyy-MM-dd HH:mm Z"
@ATTRIBUTE attr_datetime_full DATE "yy-MM-dd HH:mm:ss z"
@DATA
"1999-01-31 00:01 UTC","99-01-31 00:01:08 +0430"
"2004-12-01 23:59 UTC","04-12-01 23:59:59 -0800"
"1817-04-28 13:00 UTC","17-04-28 13:00:33 +1000"
"2100-09-10 12:00 UTC","21-09-10 12:00:21 -0300"
"2013-11-30 04:55 UTC","13-11-30 04:55:48 -1100"
"1631-10-15 20:04 UTC","31-10-15 20:04:10 +0000"

@@ -0,0 +1,14 @@
@RELATION test9
@ATTRIBUTE attr_date_number RELATIONAL
@ATTRIBUTE attr_date DATE "yyyy-MM-dd"
@ATTRIBUTE attr_number INTEGER
@END attr_date_number
@DATA
"1999-01-31 1\n1935-11-27 10"
"2004-12-01 2\n1942-08-13 20"
"1817-04-28 3"
"2100-09-10 4\n1957-04-17 40\n1721-01-14 400"
"2013-11-30 5"
"1631-10-15 6"

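In the relational fixture above, each quoted cell holds a small nested table: rows are separated by the two-character sequence backslash + `n`, and the two sub-attributes (a date and a number) by a space. The sketch below splits one such cell into tuples. It assumes the separator is stored literally, as it appears in the file, and `parse_relational_cell` is a hypothetical helper, not part of scipy.

```python
from datetime import date


def parse_relational_cell(cell):
    """Split one relational cell into (date, number) rows (hypothetical helper)."""
    rows = []
    # The cell stores row breaks as a literal backslash followed by 'n'.
    for row in cell.split("\\n"):
        date_text, number_text = row.split(" ")
        rows.append((date.fromisoformat(date_text), float(number_text)))
    return rows


# Raw string: the backslash-n below is two characters, not a newline.
nested = parse_relational_cell(r"1999-01-31 1\n1935-11-27 10")
```

The first cell yields two rows, matching the two-element array the test suite expects for the first instance.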
@@ -0,0 +1,412 @@
import datetime
import os
import sys
from os.path import join as pjoin

from io import StringIO

import numpy as np

from numpy.testing import (assert_array_almost_equal,
                           assert_array_equal, assert_equal, assert_)
import pytest
from pytest import raises as assert_raises

from scipy.io.arff.arffread import loadarff
from scipy.io.arff.arffread import read_header, ParseArffError


data_path = pjoin(os.path.dirname(__file__), 'data')

test1 = pjoin(data_path, 'test1.arff')
test2 = pjoin(data_path, 'test2.arff')
test3 = pjoin(data_path, 'test3.arff')

test4 = pjoin(data_path, 'test4.arff')
test5 = pjoin(data_path, 'test5.arff')
test6 = pjoin(data_path, 'test6.arff')
test7 = pjoin(data_path, 'test7.arff')
test8 = pjoin(data_path, 'test8.arff')
test9 = pjoin(data_path, 'test9.arff')
test10 = pjoin(data_path, 'test10.arff')
test11 = pjoin(data_path, 'test11.arff')
test_quoted_nominal = pjoin(data_path, 'quoted_nominal.arff')
test_quoted_nominal_spaces = pjoin(data_path, 'quoted_nominal_spaces.arff')

expect4_data = [(0.1, 0.2, 0.3, 0.4, 'class1'),
                (-0.1, -0.2, -0.3, -0.4, 'class2'),
                (1, 2, 3, 4, 'class3')]
expected_types = ['numeric', 'numeric', 'numeric', 'numeric', 'nominal']

missing = pjoin(data_path, 'missing.arff')
expect_missing_raw = np.array([[1, 5], [2, 4], [np.nan, np.nan]])
expect_missing = np.empty(3, [('yop', float), ('yap', float)])
expect_missing['yop'] = expect_missing_raw[:, 0]
expect_missing['yap'] = expect_missing_raw[:, 1]


class TestData(object):
    def test1(self):
        # Parsing trivial file with nothing.
        self._test(test4)

    def test2(self):
        # Parsing trivial file with some comments in the data section.
        self._test(test5)

    def test3(self):
        # Parsing trivial file with nominal attribute of 1 character.
        self._test(test6)

    def _test(self, test_file):
        data, meta = loadarff(test_file)
        for i in range(len(data)):
            for j in range(4):
                assert_array_almost_equal(expect4_data[i][j], data[i][j])
        assert_equal(meta.types(), expected_types)

    def test_filelike(self):
        # Test reading from file-like object (StringIO)
        with open(test1) as f1:
            data1, meta1 = loadarff(f1)
        with open(test1) as f2:
            data2, meta2 = loadarff(StringIO(f2.read()))
        assert_(data1 == data2)
        assert_(repr(meta1) == repr(meta2))

    @pytest.mark.skipif(sys.version_info < (3, 6),
                        reason='Passing path-like objects to IO functions requires Python >= 3.6')
    def test_path(self):
        # Test reading from `pathlib.Path` object
        from pathlib import Path

        with open(test1) as f1:
            data1, meta1 = loadarff(f1)

        data2, meta2 = loadarff(Path(test1))

        assert_(data1 == data2)
        assert_(repr(meta1) == repr(meta2))


class TestMissingData(object):
    def test_missing(self):
        data, meta = loadarff(missing)
        for i in ['yop', 'yap']:
            assert_array_almost_equal(data[i], expect_missing[i])


class TestNoData(object):
    def test_nodata(self):
        # The file nodata.arff has no data in the @DATA section.
        # Reading it should result in an array with length 0.
        nodata_filename = os.path.join(data_path, 'nodata.arff')
        data, meta = loadarff(nodata_filename)
        expected_dtype = np.dtype([('sepallength', '<f8'),
                                   ('sepalwidth', '<f8'),
                                   ('petallength', '<f8'),
                                   ('petalwidth', '<f8'),
                                   ('class', 'S15')])
        assert_equal(data.dtype, expected_dtype)
        assert_equal(data.size, 0)


class TestHeader(object):
    def test_type_parsing(self):
        # Test parsing type of attribute from their value.
        with open(test2) as ofile:
            rel, attrs = read_header(ofile)

        expected = ['numeric', 'numeric', 'numeric', 'numeric', 'numeric',
                    'numeric', 'string', 'string', 'nominal', 'nominal']

        for i in range(len(attrs)):
            assert_(attrs[i].type_name == expected[i])

    def test_badtype_parsing(self):
        # Test parsing wrong type of attribute from their value.
        def badtype_read():
            with open(test3) as ofile:
                _, _ = read_header(ofile)

        assert_raises(ParseArffError, badtype_read)

    def test_fullheader1(self):
        # Parsing trivial header with nothing.
        with open(test1) as ofile:
            rel, attrs = read_header(ofile)

        # Test relation
        assert_(rel == 'test1')

        # Test numerical attributes
        assert_(len(attrs) == 5)
        for i in range(4):
            assert_(attrs[i].name == 'attr%d' % i)
            assert_(attrs[i].type_name == 'numeric')

        # Test nominal attribute
        assert_(attrs[4].name == 'class')
        assert_(attrs[4].values == ('class0', 'class1', 'class2', 'class3'))

    def test_dateheader(self):
        with open(test7) as ofile:
            rel, attrs = read_header(ofile)

        assert_(rel == 'test7')

        assert_(len(attrs) == 5)

        assert_(attrs[0].name == 'attr_year')
        assert_(attrs[0].date_format == '%Y')

        assert_(attrs[1].name == 'attr_month')
        assert_(attrs[1].date_format == '%Y-%m')

        assert_(attrs[2].name == 'attr_date')
        assert_(attrs[2].date_format == '%Y-%m-%d')

        assert_(attrs[3].name == 'attr_datetime_local')
        assert_(attrs[3].date_format == '%Y-%m-%d %H:%M')

        assert_(attrs[4].name == 'attr_datetime_missing')
        assert_(attrs[4].date_format == '%Y-%m-%d %H:%M')

    def test_dateheader_unsupported(self):
        def read_dateheader_unsupported():
            with open(test8) as ofile:
                _, _ = read_header(ofile)

        assert_raises(ValueError, read_dateheader_unsupported)


class TestDateAttribute(object):
    def setup_method(self):
        self.data, self.meta = loadarff(test7)

    def test_year_attribute(self):
        expected = np.array([
            '1999',
            '2004',
            '1817',
            '2100',
            '2013',
            '1631'
        ], dtype='datetime64[Y]')

        assert_array_equal(self.data["attr_year"], expected)

    def test_month_attribute(self):
        expected = np.array([
            '1999-01',
            '2004-12',
            '1817-04',
            '2100-09',
            '2013-11',
            '1631-10'
        ], dtype='datetime64[M]')

        assert_array_equal(self.data["attr_month"], expected)

    def test_date_attribute(self):
        expected = np.array([
            '1999-01-31',
            '2004-12-01',
            '1817-04-28',
            '2100-09-10',
            '2013-11-30',
            '1631-10-15'
        ], dtype='datetime64[D]')

        assert_array_equal(self.data["attr_date"], expected)

    def test_datetime_local_attribute(self):
        expected = np.array([
            datetime.datetime(year=1999, month=1, day=31, hour=0, minute=1),
            datetime.datetime(year=2004, month=12, day=1, hour=23, minute=59),
            datetime.datetime(year=1817, month=4, day=28, hour=13, minute=0),
            datetime.datetime(year=2100, month=9, day=10, hour=12, minute=0),
            datetime.datetime(year=2013, month=11, day=30, hour=4, minute=55),
            datetime.datetime(year=1631, month=10, day=15, hour=20, minute=4)
        ], dtype='datetime64[m]')

        assert_array_equal(self.data["attr_datetime_local"], expected)

    def test_datetime_missing(self):
        expected = np.array([
            'nat',
            '2004-12-01T23:59',
            'nat',
            'nat',
            '2013-11-30T04:55',
            '1631-10-15T20:04'
        ], dtype='datetime64[m]')

        assert_array_equal(self.data["attr_datetime_missing"], expected)

    def test_datetime_timezone(self):
        assert_raises(ParseArffError, loadarff, test8)


class TestRelationalAttribute(object):
    def setup_method(self):
        self.data, self.meta = loadarff(test9)

    def test_attributes(self):
        assert_equal(len(self.meta._attributes), 1)

        relational = list(self.meta._attributes.values())[0]

        assert_equal(relational.name, 'attr_date_number')
        assert_equal(relational.type_name, 'relational')
        assert_equal(len(relational.attributes), 2)
        assert_equal(relational.attributes[0].name,
                     'attr_date')
        assert_equal(relational.attributes[0].type_name,
                     'date')
        assert_equal(relational.attributes[1].name,
                     'attr_number')
        assert_equal(relational.attributes[1].type_name,
                     'numeric')

    def test_data(self):
        dtype_instance = [('attr_date', 'datetime64[D]'),
                          ('attr_number', np.float_)]

        expected = [
            np.array([('1999-01-31', 1), ('1935-11-27', 10)],
                     dtype=dtype_instance),
            np.array([('2004-12-01', 2), ('1942-08-13', 20)],
                     dtype=dtype_instance),
            np.array([('1817-04-28', 3)],
                     dtype=dtype_instance),
            np.array([('2100-09-10', 4), ('1957-04-17', 40),
                      ('1721-01-14', 400)],
                     dtype=dtype_instance),
            np.array([('2013-11-30', 5)],
                     dtype=dtype_instance),
            np.array([('1631-10-15', 6)],
                     dtype=dtype_instance)
        ]

        for i in range(len(self.data["attr_date_number"])):
            assert_array_equal(self.data["attr_date_number"][i],
                               expected[i])


class TestRelationalAttributeLong(object):
    def setup_method(self):
        self.data, self.meta = loadarff(test10)

    def test_attributes(self):
        assert_equal(len(self.meta._attributes), 1)

        relational = list(self.meta._attributes.values())[0]

        assert_equal(relational.name, 'attr_relational')
        assert_equal(relational.type_name, 'relational')
        assert_equal(len(relational.attributes), 1)
        assert_equal(relational.attributes[0].name,
                     'attr_number')
        assert_equal(relational.attributes[0].type_name, 'numeric')

    def test_data(self):
        dtype_instance = [('attr_number', np.float_)]

        expected = np.array([(n,) for n in range(30000)],
                            dtype=dtype_instance)

        assert_array_equal(self.data["attr_relational"][0],
                           expected)


class TestQuotedNominal(object):
    """
    Regression test for issue #10232 : Exception in loadarff with quoted nominal attributes.
    """

    def setup_method(self):
        self.data, self.meta = loadarff(test_quoted_nominal)

    def test_attributes(self):
        assert_equal(len(self.meta._attributes), 2)

        age, smoker = self.meta._attributes.values()

        assert_equal(age.name, 'age')
        assert_equal(age.type_name, 'numeric')
        assert_equal(smoker.name, 'smoker')
        assert_equal(smoker.type_name, 'nominal')
        assert_equal(smoker.values, ['yes', 'no'])

    def test_data(self):
        age_dtype_instance = np.float_
        smoker_dtype_instance = '<S3'

        age_expected = np.array([
            18,
            24,
            44,
            56,
            89,
            11,
        ], dtype=age_dtype_instance)

        smoker_expected = np.array([
            'no',
            'yes',
            'no',
            'no',
            'yes',
            'no',
        ], dtype=smoker_dtype_instance)

        assert_array_equal(self.data["age"], age_expected)
        assert_array_equal(self.data["smoker"], smoker_expected)


class TestQuotedNominalSpaces(object):
    """
    Regression test for issue #10232 : Exception in loadarff with quoted nominal attributes.
    """

    def setup_method(self):
        self.data, self.meta = loadarff(test_quoted_nominal_spaces)

    def test_attributes(self):
        assert_equal(len(self.meta._attributes), 2)

        age, smoker = self.meta._attributes.values()

        assert_equal(age.name, 'age')
        assert_equal(age.type_name, 'numeric')
        assert_equal(smoker.name, 'smoker')
        assert_equal(smoker.type_name, 'nominal')
        assert_equal(smoker.values, [' yes', 'no '])

    def test_data(self):
        age_dtype_instance = np.float_
        smoker_dtype_instance = '<S5'

        age_expected = np.array([
            18,
            24,
            44,
            56,
            89,
            11,
        ], dtype=age_dtype_instance)

        smoker_expected = np.array([
            'no ',
            ' yes',
            'no ',
            'no ',
            ' yes',
            'no ',
        ], dtype=smoker_dtype_instance)

        assert_array_equal(self.data["age"], age_expected)
        assert_array_equal(self.data["smoker"], smoker_expected)