AICE Associate Exam: Complete Summary

개발새발 쓰여진 글

ITst 2025. 6. 30. 18:29

Exam Scope

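Area	Libraries	Skills (commonly used functions are listed below)	Number of questions
Exploratory data analysis	Pandas, Matplotlib, Seaborn	Data loading, basic statistics, quartiles, distribution checks, data visualization	3~4
Data preprocessing	Pandas, Numpy, Scikit-Learn	Missing-value/outlier handling, encoding, train/validation split, scaling	5~7
Machine learning	Scikit-Learn, XGBoost, LightGBM	Model definition (regression, classification), parameter settings, model training and prediction	1
Machine learning model evaluation	Scikit-Learn, TensorFlow	[Classification] accuracy, precision, recall, f1-score, AUC [Regression] MAE, MSE, RMSE, R²	1
Deep learning	Scikit-Learn, TensorFlow	Model definition, compile, callback definition, model training	1
Deep learning model evaluation	Matplotlib.pyplot, Scikit-Learn	- Plot graphs to evaluate model performance - Load the saved model with load_model and predict	0~1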

Frequently Used Functions by Library

Pandas

(pd is the pandas module; a leading dot marks a DataFrame/Series method or attribute)
pd.read_csv('{filename}')
.info()
.shape
.describe()
.drop([ {columns} ], axis=1)
pd.merge(df_a, df_b, on='{column}', how='{left/right/inner/outer}')
.copy()
.index.tolist()
.columns
.reset_index(drop={True/False}, inplace={True/False})
.value_counts()
.dtypes
.select_dtypes(include=[ {dtypes} ], exclude=[ {dtypes} ])
.astype( {dtype} )
pd.to_datetime(df['{column}'])
.loc[{row label}, '{column}'] or .loc[{condition}, '{column}']
.iloc[{row position}, {column position}]
.sort_values('{column}', ascending={True/False})
.groupby('{grouping column}', as_index={True/False}).agg('{mean/var/std/sum/count/min/max}').unstack()
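As a quick check of how the calls above fit together, here is a minimal sketch on a made-up DataFrame. The column names and values are assumptions for illustration only, not exam data.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Seoul", "Busan", "Seoul", "Busan", "Seoul"],
    "year": [2023, 2023, 2024, 2024, 2024],
    "sales": [100, 80, 120, 90, 110],
    "joined": ["2023-01-05", "2023-03-10", "2024-02-01", "2024-05-20", "2024-07-15"],
})

n_rows, n_cols = df.shape                      # attribute, so no parentheses
df["joined"] = pd.to_datetime(df["joined"])    # module-level function, not a method
numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()

# Label/condition-based selection vs. position-based selection
seoul_sales = df.loc[df["city"] == "Seoul", "sales"]
first_cell = df.iloc[0, 2]

# Sort by sales, then rebuild a clean 0..n-1 index
top = df.sort_values("sales", ascending=False).reset_index(drop=True)

# Mean sales per city and year, with the years pivoted into columns
pivot = df.groupby(["city", "year"])["sales"].agg("mean").unstack()
print(pivot)
```

The things most likely to trip you up are that shape and dtypes are attributes, while read_csv, merge, and to_datetime are called on pd rather than on the DataFrame.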

.str

.str.split('{separator}').str[{index}]
.str.contains('{pattern}')
.str.startswith('{prefix}')
.str.len()
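A small sketch of the .str accessor on a made-up Series (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical values for illustration.
s = pd.Series(["A-101", "B-202", "A-303"])

prefix = s.str.split("-").str[0]     # first piece of each split result
has_2 = s.str.contains("2")          # boolean mask, usable inside .loc[...]
starts_a = s.str.startswith("A")     # note the "s" in startswith
lengths = s.str.len()                # takes no argument

print(prefix.tolist(), lengths.tolist())
```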

.isin([ {list} ])
.replace({ {old value}: {new value} })
.duplicated(subset='{column}', keep={'first'/'last'/False})
.drop_duplicates(subset='{column}', keep={'first'/'last'/False})
.isnull()
.dropna(subset=[ {columns} ], how='{any/all}')
.fillna(df['{column}'].median()) (or .mean(), or .mode()[0])
.corr(numeric_only={True/False})
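The duplicate and missing-value calls above usually appear together in preprocessing questions. Here is a minimal sketch on made-up data (the column names and values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one duplicated id and missing scores.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "grade": ["a", "b", "b", "c", "a"],
    "score": [10.0, np.nan, np.nan, 30.0, 50.0],
})

dup_mask = df.duplicated(subset="id", keep="first")    # True only for the second id == 2
deduped = df.drop_duplicates(subset="id", keep="first")

n_missing = deduped["score"].isnull().sum()

# Fill missing scores with the median of the observed scores
filled = deduped.copy()
filled["score"] = filled["score"].fillna(filled["score"].median())

# mode() returns a Series, so take its first element before passing it to fillna
most_common_grade = deduped["grade"].mode()[0]

kept = filled[filled["grade"].isin(["a", "b"])]        # isin takes a list
recoded = filled["grade"].replace({"a": "A", "b": "B"})  # replace takes a dict
```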

Seaborn

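.boxplot(data=df, x='{column}', y='{column}')
.countplot(data=df, x='{column}')
.jointplot(data=df, x='{column}', y='{column}')
.lmplot(data=df, x='{column}', y='{column}', hue='{column}')
.heatmap(df.corr(numeric_only=True))
※ In the AIDU tool, correlation analysis worked even without selecting the numeric columns via select_dtypes. In pandas 2.0 and later, however, df.corr() raises an error on non-numeric columns unless numeric_only=True is passed.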
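Outside the AIDU tool it is safer to restrict the correlation to numeric columns explicitly. This sketch shows two equivalent ways on a made-up DataFrame; either result can then be passed to the heatmap call.

```python
import pandas as pd

# Hypothetical data with one non-numeric column.
df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 66, 74],
    "group": ["a", "b", "a", "b"],
})

# Option 1: let corr() skip the non-numeric columns
corr_a = df.corr(numeric_only=True)

# Option 2: select the numeric columns first, then correlate
corr_b = df.select_dtypes(include=["number"]).corr()

print(corr_a)
```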

Scikit-Learn

model_selection

train_test_split(X, y, random_state, test_size, stratify)

X = df.drop('{target column}', axis=1)
y = df['{target column}']

from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42, test_size=0.3, stratify=y)
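To see what stratify=y does, here is a sketch on a made-up imbalanced target (the arrays are assumptions for illustration). The class ratio is preserved in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 samples of class 0 and 20 of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, random_state=42, test_size=0.3, stratify=y
)

print(X_train.shape, X_valid.shape)      # (70, 2) (30, 2)
print(y_train.mean(), y_valid.mean())    # share of class 1 in each split
```

Without stratify, a small validation set can end up with too few samples of the minority class, which makes classification metrics unstable.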

preprocessing

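StandardScaler(): standardizes each feature to mean 0 and variance 1 (uses the mean and standard deviation); it rescales the data but does not turn the distribution into a normal one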

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_valid = ss.transform(X_valid)

Normalizer(): rescales each sample (row), not each feature, to unit norm, so every row vector has length 1

from sklearn.preprocessing import Normalizer

nor = Normalizer()
X_train = nor.fit_transform(X_train)
X_valid = nor.transform(X_valid)

MinMaxScaler(): rescales each feature to the range 0 to 1, based on the minimum and maximum of the training data

from sklearn.preprocessing import MinMaxScaler

ms = MinMaxScaler()
X_train = ms.fit_transform(X_train)
X_valid = ms.transform(X_valid)

RobustScaler(): centers each feature on its median and scales by the interquartile range, so outliers have less influence

from sklearn.preprocessing import RobustScaler

rob = RobustScaler()
X_train = rob.fit_transform(X_train)
X_valid = rob.transform(X_valid)
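The difference between the four scalers is easiest to see side by side. This is a minimal sketch on a made-up array whose last row is an outlier in the first feature (the numbers are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, RobustScaler, StandardScaler

# Hypothetical training data; the last row is an outlier in the first feature
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]])

standard = StandardScaler().fit_transform(X_train)
minmax = MinMaxScaler().fit_transform(X_train)
robust = RobustScaler().fit_transform(X_train)
unit_rows = Normalizer().fit_transform(X_train)

print(standard.mean(axis=0), standard.std(axis=0))   # per-feature mean ~0, std ~1
print(minmax.min(axis=0), minmax.max(axis=0))         # per-feature min 0, max 1
print(np.median(robust, axis=0))                       # per-feature median 0
print(np.linalg.norm(unit_rows, axis=1))               # every row has length 1
```

Note that Normalizer works row by row, while the other three work column by column. In every case, fit on the training data only and reuse the fitted scaler with transform on the validation data.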

linear_model

LinearRegression(): linear regression model

from sklearn.linear_model import LinearRegression

ML_model = LinearRegression()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

tree: tree-based machine learning models

DecisionTreeClassifier(): tree-based classification model

from sklearn.tree import DecisionTreeClassifier

ML_model = DecisionTreeClassifier()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

DecisionTreeRegressor(): tree-based regression model

from sklearn.tree import DecisionTreeRegressor

ML_model = DecisionTreeRegressor()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

ensemble: tree-based ensemble models (bagging)

RandomForestClassifier(): random forest classification model

from sklearn.ensemble import RandomForestClassifier

ML_model = RandomForestClassifier()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

RandomForestRegressor(): random forest regression model

from sklearn.ensemble import RandomForestRegressor 

ML_model = RandomForestRegressor()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)
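All of the scikit-learn models above share the same define, fit, predict pattern, so they can be swapped freely. This sketch runs a decision tree and a random forest on synthetic data. The data set and the max_depth, n_estimators, and random_state values are assumptions for illustration, not settings taken from the exam.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the exam data set
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, random_state=42, test_size=0.3, stratify=y
)

# Parameter values here are illustrative, not prescribed by the exam
models = {
    "tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "forest": RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42),
}

scores = {}
for name, ML_model in models.items():
    ML_model.fit(X_train, y_train)
    scores[name] = ML_model.score(X_valid, y_valid)   # accuracy for classifiers

print(scores)
```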

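metrics: model evaluation

Evaluation metrics
- Classification metrics
  - accuracy, precision, recall, f1-score, AUC: higher is better
- Regression metrics
  - MAE, MSE, RMSE: lower is better
  - R² score: higher is better

accuracy_score({actual values}, {predicted values})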

from sklearn.metrics import accuracy_score

accuracy_score(y_valid, y_pred)

r2_score({actual values}, {predicted values})

from sklearn.metrics import r2_score

r2_score(y_valid, y_pred)

mean_squared_error({actual values}, {predicted values})

from sklearn.metrics import mean_squared_error

mean_squared_error(y_valid, y_pred)

confusion_matrix({actual values}, {predicted values})

import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_valid, y_pred)
sns.heatmap(cm, annot=True)

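classification_report({actual values}, {predicted values})

from sklearn.metrics import classification_report

# classification_report returns a string, so print it to get the formatted table
print(classification_report(y_valid, y_pred))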
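The exam scope also lists precision, recall, f1-score, AUC, MAE, and RMSE, which have their own functions in sklearn.metrics. Here is a minimal sketch on tiny made-up arrays (the labels and probabilities are assumptions), small enough to verify by hand.

```python
import numpy as np
from sklearn.metrics import (
    f1_score, mean_absolute_error, mean_squared_error,
    precision_score, recall_score, roc_auc_score,
)

# Classification: hypothetical labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.6, 0.8, 0.4, 0.9, 0.2])
y_pred = (y_prob >= 0.5).astype(int)           # hard labels: [0, 1, 1, 0, 1, 0]

precision = precision_score(y_true, y_pred)    # TP / (TP + FP)
recall = recall_score(y_true, y_pred)          # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                  # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)            # needs scores, not hard labels

# Regression: hypothetical targets and predictions
r_true = np.array([3.0, 5.0, 7.0])
r_pred = np.array([2.0, 5.0, 9.0])
mae = mean_absolute_error(r_true, r_pred)
mse = mean_squared_error(r_true, r_pred)
rmse = np.sqrt(mse)                            # RMSE is the square root of MSE

print(precision, recall, f1, auc, mae, mse, rmse)
```

Taking the square root of MSE by hand works regardless of the scikit-learn version in use.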

xgboost

XGBRegressor(): tree-based boosting model

from xgboost import XGBRegressor

ML_model = XGBRegressor()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

lightgbm

LGBMRegressor(): tree-based boosting model

from lightgbm import LGBMRegressor

ML_model = LGBMRegressor()

ML_model.fit(X_train, y_train)
y_pred = ML_model.predict(X_valid)

TensorFlow (Keras)

Model definition

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization

model = Sequential()
model.add(Dense(units=64, activation='relu', input_shape=(X_train.shape[1],)))
# In the AIDU tool this ran fine even without specifying units, activation, or input_shape
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(units=1, activation='sigmoid'))
# The exam may include a variant where the labels are one-hot encoded with to_categorical and the output layer changes accordingly
# model.add(Dense(units=2, activation='softmax'))
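For the softmax variant, the labels need one column per class. The sketch below shows that one-hot layout using plain NumPy as a stand-in for tensorflow.keras.utils.to_categorical. This swap is my choice to keep the example light, and the labels are made up.

```python
import numpy as np

y = np.array([0, 1, 1, 0, 1])           # hypothetical binary labels
num_classes = 2

# Row i of the identity matrix is the one-hot vector for class i
y_onehot = np.eye(num_classes, dtype="float32")[y]

print(y_onehot.shape)                    # (5, 2): one column per output unit
recovered = y_onehot.argmax(axis=1)      # back to class indices, e.g. for accuracy_score
```

With one-hot labels and a softmax output layer, the matching loss is 'categorical_crossentropy', and predictions are turned back into class labels with argmax.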

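Compile

# mse/mae suit regression; for binary classification with a sigmoid output layer, loss='binary_crossentropy' and metrics=['accuracy'] are the usual choice
model.compile(optimizer='adam', loss='mse', metrics=['mae','mse'])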

Callback definition

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

es = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
cp = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only = True)

Training

hist = model.fit(X_train, y_train, epochs=35, batch_size=25, validation_data=(X_valid, y_valid), callbacks=[es, cp])

Loading the saved best model

from tensorflow.keras.models import load_model

best_model = load_model('best_model.h5')

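Prediction or evaluation

test_pred = best_model.predict(X_valid)
# evaluate returns the loss followed by each compiled metric, here mae and mse
test_loss, test_mae, test_mse = best_model.evaluate(X_valid, y_valid)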

Matplotlib

pyplot

Evaluating deep learning model performance

import matplotlib.pyplot as plt

plt.figure()
plt.plot(hist.history['mse'], label='mse')
plt.plot(hist.history['val_mse'], label='val_mse')
plt.title('Model mse')
plt.xlabel('Epochs')
plt.ylabel('mse')
plt.legend()  # needed for the label= names to show up
plt.show()
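The same plot can be tried without training a model. In this sketch a hand-written dictionary stands in for hist.history (the numbers are made up), and the last line picks the epoch with the lowest validation error, which is where the curves typically start to diverge.

```python
import matplotlib
matplotlib.use("Agg")                   # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in for hist.history returned by model.fit
history = {
    "mse": [0.90, 0.55, 0.40, 0.33, 0.30],
    "val_mse": [0.95, 0.60, 0.48, 0.47, 0.50],
}

fig, ax = plt.subplots()
epochs = range(1, len(history["mse"]) + 1)
ax.plot(epochs, history["mse"], label="mse")
ax.plot(epochs, history["val_mse"], label="val_mse")
ax.set_title("Model mse")
ax.set_xlabel("Epochs")
ax.set_ylabel("mse")
ax.legend()                              # without this the labels are not displayed

# Epoch (1-based) with the lowest validation error
best_epoch = min(epochs, key=lambda e: history["val_mse"][e - 1])
print(best_epoch)
```

When the training curve keeps falling while the validation curve turns upward, the model is overfitting. That is the situation EarlyStopping with monitor='val_loss' is meant to catch.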
