Binning

from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important;}</style>"))  # widen the notebook cells

import numpy as np
import pandas as pd

emp = pd.read_csv("c:/data/emp3.csv")
emp

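# bins=3: split the salary range into three equal-width intervals
count, bin_dividers = np.histogram(emp.sal, bins=3)
print(count)         # number of employees in each bin
print(bin_dividers)  # list of bin edges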

[8 5 1]
[ 800. 2200. 3600. 5000.]
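When `bins` is an integer, `np.histogram` splits the range from the minimum to the maximum into intervals of equal width. Here the width is (5000 - 800) / 3 = 1400. The sketch below reproduces the edges with `np.linspace`, using the `sal` values copied from the `emp` output table. The array name `sal` is only for illustration. `np.histogram` treats every bin except the last as half-open `[a, b)`, and the last one as closed `[a, b]`.

```python
import numpy as np

# Salaries copied from the sal column of the emp output table
sal = np.array([5000, 2850, 2450, 2975, 1250, 1600, 1500,
                950, 1250, 3000, 800, 3000, 1100, 1300])

bins = 3
# Equal-width edges: split [min, max] into `bins` intervals of the same length
edges = np.linspace(sal.min(), sal.max(), bins + 1)
width = (sal.max() - sal.min()) / bins   # (5000 - 800) / 3 = 1400.0

count, bin_dividers = np.histogram(sal, bins=bins)
print(edges)         # [ 800. 2200. 3600. 5000.]
print(bin_dividers)  # the same edges as np.linspace above
print(count)         # [8 5 1]
```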

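bin_names = ['저소득','중간소득','고소득']  # low / middle / high income
# pd.cut intervals are right-closed by default, so a salary equal to the lowest edge (800) becomes NaN
emp['sal_divide'] = pd.cut(x=emp.sal, bins=bin_dividers, labels=bin_names)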
emp
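`pd.cut` builds right-closed intervals `(a, b]` by default. A value equal to the lowest edge therefore matches no bin and becomes NaN. The sketch below shows this on a few made-up salaries; the English `labels` are hypothetical stand-ins for `bin_names`. Passing `include_lowest=True` closes the first interval on the left as well.

The two functions also disagree on boundaries. `np.histogram` counts 2200 in the second bin `[2200, 3600)`, while `pd.cut` places it in the first `(800, 2200]`.

```python
import pandas as pd

# Edges from the np.histogram step; the English labels are hypothetical
edges = [800, 2200, 3600, 5000]
labels = ['low', 'middle', 'high']
sal = pd.Series([800, 2200, 2201, 5000])

# Default right-closed bins (800, 2200], (2200, 3600], (3600, 5000]:
# 800 lies outside every interval, so it becomes NaN
default_cut = pd.cut(sal, bins=edges, labels=labels)

# include_lowest=True makes the first interval include its left edge, so 800 is kept
fixed_cut = pd.cut(sal, bins=edges, labels=labels, include_lowest=True)

print(default_cut.isna().tolist())  # [True, False, False, False]
print(fixed_cut.tolist())           # ['low', 'low', 'middle', 'high']
```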

Dummy variables

pd.get_dummies(emp.deptno)
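`pd.get_dummies` creates one 0/1 column per distinct value of `deptno`. The sketch below shows two commonly used parameters on a tiny hypothetical stand-in for `emp` (`emp_small`, three rows taken from the `emp` table):

- `prefix=` names the new columns.
- `columns=` encodes a column in place inside a DataFrame.

Since pandas 2.0 the default dummy dtype is `bool`, so `dtype=int` is passed to get 0/1 values like the output shown here.

```python
import pandas as pd

# Hypothetical three-row stand-in for emp (empno/deptno pairs from the emp table)
emp_small = pd.DataFrame({'empno': [7839, 7698, 7566],
                          'deptno': [10, 30, 20]})

# prefix names the dummy columns; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(emp_small['deptno'], prefix='dept', dtype=int)
print(dummies.columns.tolist())  # ['dept_10', 'dept_20', 'dept_30']

# columns= replaces deptno with its dummy columns inside the DataFrame
encoded = pd.get_dummies(emp_small, columns=['deptno'], dtype=int)
print(encoded.columns.tolist())  # ['empno', 'deptno_10', 'deptno_20', 'deptno_30']
```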

Converting strings to datetime

df = pd.read_csv("c:/data/studyfile/stock-data.csv")
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Date    20 non-null     object
 1   Close   20 non-null     int64 
 2   Start   20 non-null     int64 
 3   High    20 non-null     int64 
 4   Low     20 non-null     int64 
 5   Volume  20 non-null     int64 
dtypes: int64(5), object(1)
memory usage: 1.1+ KB

df['Date'] = pd.to_datetime(df['Date'])  # object (string) column -> datetime64
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    20 non-null     datetime64[ns]
 1   Close   20 non-null     int64         
 2   Start   20 non-null     int64         
 3   High    20 non-null     int64         
 4   Low     20 non-null     int64         
 5   Volume  20 non-null     int64         
dtypes: datetime64[ns](1), int64(5)
memory usage: 1.1 KB
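The same conversion can also be done while reading, with `parse_dates` in `pd.read_csv`. When converting afterwards, `pd.to_datetime` accepts two useful parameters:

- `format=`, to state the date layout explicitly.
- `errors='coerce'`, which turns unparseable strings into `NaT` instead of raising.

The sketch below shows both approaches. The inline `csv_text` is a hypothetical stand-in for `stock-data.csv` (its first three rows). The `'not a date'` value is made up to show the coercion.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the first rows of stock-data.csv
csv_text = """Date,Close
2018-07-02,10100
2018-06-29,10700
2018-06-28,10400
"""

# Option 1: parse the Date column while reading the file
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Option 2: convert after reading; format= fixes the layout,
# errors='coerce' maps strings that do not match to NaT
raw = pd.Series(['2018-07-02', '2018-06-29', 'not a date'])
converted = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(df.dtypes)
print(converted.isna().tolist())  # [False, False, True]
```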

df.Date.dt.year.tail()

15    2018
16    2018
17    2018
18    2018
19    2018
Name: Date, dtype: int64

df.Date.dt.month.head()

0    7
1    6
2    6
3    6
4    6
Name: Date, dtype: int64

df.Date.dt.day.head()

0     2
1    29
2    28
3    27
4    26
Name: Date, dtype: int64
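The `.dt` accessor works only on datetime-typed Series; on the original `object` strings it raises an `AttributeError`. Once converted, it exposes each date component element-wise. The sketch below collects year, month, day and the weekday name (`dt.day_name()`) for three `Date` values taken from the stock data.

```python
import pandas as pd

# Three Date values from stock-data.csv, converted with pd.to_datetime
dates = pd.to_datetime(pd.Series(['2018-07-02', '2018-06-29', '2018-06-28']))

# Each .dt attribute returns a Series aligned with `dates`
parts = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
    'weekday': dates.dt.day_name(),  # English weekday name
})
print(parts)
```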

Making the index a datetime index

df.set_index('Date',inplace=True)
df

df.index

DatetimeIndex(['2018-07-02', '2018-06-29', '2018-06-28', '2018-06-27',
               '2018-06-26', '2018-06-25', '2018-06-22', '2018-06-21',
               '2018-06-20', '2018-06-19', '2018-06-18', '2018-06-15',
               '2018-06-14', '2018-06-12', '2018-06-11', '2018-06-08',
               '2018-06-07', '2018-06-05', '2018-06-04', '2018-06-01'],
              dtype='datetime64[ns]', name='Date', freq=None)

df.index.year

Int64Index([2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
            2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
           dtype='int64', name='Date')

df.index.month

Int64Index([7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], dtype='int64', name='Date')

df.index.day

Int64Index([2, 29, 28, 27, 26, 25, 22, 21, 20, 19, 18, 15, 14, 12, 11, 8, 7, 5,
            4, 1],
           dtype='int64', name='Date')
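A `DatetimeIndex` also allows selecting rows by date strings with `.loc`:

- A partial string such as `'2018-06'` selects a whole month.
- A label slice includes both ends.

The sketch below uses five `Close` values from the `df` output table. The index is sorted first because the file lists the newest date first, and range slicing is most predictable on a sorted index. Newer pandas versions print `df.index.year` as a plain `Index` rather than `Int64Index`, which was removed in pandas 2.0.

```python
import pandas as pd

# Five rows from the df output table (newest first, as in the file)
df = pd.DataFrame(
    {'Close': [10100, 10700, 10400, 10900, 10800]},
    index=pd.to_datetime(['2018-07-02', '2018-06-29', '2018-06-28',
                          '2018-06-27', '2018-06-26']),
)
df.index.name = 'Date'

# Sort oldest-first so date-range slicing behaves predictably
df = df.sort_index()

june = df.loc['2018-06']                    # partial string: all rows in June 2018
window = df.loc['2018-06-27':'2018-06-29']  # label slice, both ends inclusive
print(len(june), len(window))               # 4 3
```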

Output of `emp`:
	index	empno	ename	job	mgr	hiredate	sal	comm	deptno
0	1	7839	KING	PRESIDENT	NaN	1981-11-17 0:00	5000	NaN	10
1	2	7698	BLAKE	MANAGER	7839.0	1981-05-01 0:00	2850	NaN	30
2	3	7782	CLARK	MANAGER	7839.0	1981-05-09 0:00	2450	NaN	10
3	4	7566	JONES	MANAGER	7839.0	1981-04-01 0:00	2975	NaN	20
4	5	7654	MARTIN	SALESMAN	7698.0	1981-09-10 0:00	1250	1400.0	30
5	6	7499	ALLEN	SALESMAN	7698.0	1981-02-11 0:00	1600	300.0	30
6	7	7844	TURNER	SALESMAN	7698.0	1981-08-21 0:00	1500	0.0	30
7	8	7900	JAMES	CLERK	7698.0	1981-12-11 0:00	950	NaN	30
8	9	7521	WARD	SALESMAN	7698.0	1981-02-23 0:00	1250	500.0	30
9	10	7902	FORD	ANALYST	7566.0	1981-12-11 0:00	3000	NaN	20
10	11	7369	SMITH	CLERK	7902.0	1980-12-09 0:00	800	NaN	20
11	12	7788	SCOTT	ANALYST	7566.0	1982-12-22 0:00	3000	NaN	20
12	13	7876	ADAMS	CLERK	7788.0	1983-01-15 0:00	1100	NaN	20
13	14	7934	MILLER	CLERK	7782.0	1982-01-11 0:00	1300	NaN	10

Output of `emp` after adding the `sal_divide` column:
	index	empno	ename	job	mgr	hiredate	sal	comm	deptno	sal_divide
0	1	7839	KING	PRESIDENT	NaN	1981-11-17 0:00	5000	NaN	10	고소득
1	2	7698	BLAKE	MANAGER	7839.0	1981-05-01 0:00	2850	NaN	30	중간소득
2	3	7782	CLARK	MANAGER	7839.0	1981-05-09 0:00	2450	NaN	10	중간소득
3	4	7566	JONES	MANAGER	7839.0	1981-04-01 0:00	2975	NaN	20	중간소득
4	5	7654	MARTIN	SALESMAN	7698.0	1981-09-10 0:00	1250	1400.0	30	저소득
5	6	7499	ALLEN	SALESMAN	7698.0	1981-02-11 0:00	1600	300.0	30	저소득
6	7	7844	TURNER	SALESMAN	7698.0	1981-08-21 0:00	1500	0.0	30	저소득
7	8	7900	JAMES	CLERK	7698.0	1981-12-11 0:00	950	NaN	30	저소득
8	9	7521	WARD	SALESMAN	7698.0	1981-02-23 0:00	1250	500.0	30	저소득
9	10	7902	FORD	ANALYST	7566.0	1981-12-11 0:00	3000	NaN	20	중간소득
10	11	7369	SMITH	CLERK	7902.0	1980-12-09 0:00	800	NaN	20	NaN
11	12	7788	SCOTT	ANALYST	7566.0	1982-12-22 0:00	3000	NaN	20	중간소득
12	13	7876	ADAMS	CLERK	7788.0	1983-01-15 0:00	1100	NaN	20	저소득
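13	14	7934	MILLER	CLERK	7782.0	1982-01-11 0:00	1300	NaN	10	저소득

SMITH's `sal_divide` is NaN because `pd.cut` intervals are right-closed by default. The first bin is (800, 2200], so the minimum salary 800 falls outside every bin. Passing `include_lowest=True` to `pd.cut` keeps it in the first bin.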

Output of `pd.get_dummies(emp.deptno)`:
	10	20	30
0	1	0	0
1	0	0	1
2	1	0	0
3	0	1	0
4	0	0	1
5	0	0	1
6	0	0	1
7	0	0	1
8	0	0	1
9	0	1	0
10	0	1	0
11	0	1	0
12	0	1	0
13	1	0	0

Output of `df.head()`:
	Date	Close	Start	High	Low	Volume
0	2018-07-02	10100	10850	10900	10000	137977
1	2018-06-29	10700	10550	10900	9990	170253
2	2018-06-28	10400	10900	10950	10150	155769
3	2018-06-27	10900	10800	11050	10500	133548
4	2018-06-26	10800	10900	11000	10700	63039

Output of `df` after `set_index('Date')`:
	Close	Start	High	Low	Volume
Date
2018-07-02	10100	10850	10900	10000	137977
2018-06-29	10700	10550	10900	9990	170253
2018-06-28	10400	10900	10950	10150	155769
2018-06-27	10900	10800	11050	10500	133548
2018-06-26	10800	10900	11000	10700	63039
2018-06-25	11150	11400	11450	11000	55519
2018-06-22	11300	11250	11450	10750	134805
2018-06-21	11200	11350	11750	11200	133002
2018-06-20	11550	11200	11600	10900	308596
2018-06-19	11300	11850	11950	11300	180656
2018-06-18	12000	13400	13400	12000	309787
2018-06-15	13400	13600	13600	12900	201376
2018-06-14	13450	13200	13700	13150	347451
2018-06-12	13200	12200	13300	12050	558148
2018-06-11	11950	12000	12250	11950	62293
2018-06-08	11950	11950	12200	11800	59258
2018-06-07	11950	12200	12300	11900	49088
2018-06-05	12150	11800	12250	11800	42485
2018-06-04	11900	11900	12200	11700	25171
2018-06-01	11900	11800	12100	11750	32062

Understanding pandas - binning, dummy variables, string-to-date conversion
