Deep Learning
< Deep Learning History >
Google Translate - AlphaGo - Camera Filter - Object Detection (ex. detecting pedestrians on street through )
< Milestones >
Perceptron
AI limitation: required a huge amount of computation
Multi-layer Perceptron (Backpropagation)
SVM
Deep Neural Network (Pretraining) --> after the development of GPUs
< What is Deep Learning? >
- one method of machine learning
- uses a cascade of many layers of nonlinear processing units
- based on artificial neural networks
< AI vs ML vs DL >
Data : Image(Cat picture) + Label (cat) -> supervised learning
< ML Framework > iterative
Data - algorithm - loss (the loss gives feedback and the parameters are updated)
< Example: Stock price prediction >
Neural Networks don't require expert knowledge
Input data - pass it along the edges - multiply by the weights + add the bias + apply the activation function (nonlinear func) - get the predicted result - calculate the loss(y, y^)
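The forward pass above can be sketched for a single neuron in plain Python. This is only a minimal sketch: the sigmoid activation, the squared loss, and all the numbers (x, w, b, y) are assumed values chosen for illustration, not something from the lecture.

```python
import math

def sigmoid(z):
    """Nonlinear activation that squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(x, w, b):
    """Multiply inputs by weights, add the bias, then apply the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def squared_loss(y, y_hat):
    """Loss between the true value y and the prediction y_hat."""
    return (y - y_hat) ** 2

x = [1.0, 2.0]     # hypothetical input features
w = [0.5, -0.25]   # hypothetical weights
b = 0.0            # hypothetical bias
y = 1.0            # hypothetical target

y_hat = neuron_forward(x, w, b)  # z = 0.5 - 0.5 + 0 = 0, so sigmoid(0) = 0.5
loss = squared_loss(y, y_hat)    # (1 - 0.5)^2 = 0.25
print(y_hat, loss)
```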
< practice: med data >
Data -> Algorithm -> Loss
Data: feed in all the data.
Algo: produces the prediction y^
Loss: computes loss(y, y^), e.g. loss = 1
The loss value (here 1) is sent back to the algo.
As learning continues, the loss eventually approaches 0.
< What if we add more layers and make the network deeper? >
input - hidden1 - hidden2 - output layer
< Activation Function - non-linear func >
Why do we need to use non-linear activation functions?
With only linear funcs, stacking layers is no use: a composition of linear funcs is still linear... :(
1. Sigmoid func
2. tanh func
3. ReLU func ==> max(0,x)
etc...
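A small sketch of why the non-linearity matters: two stacked linear layers without an activation give exactly the same outputs as one merged linear layer, while putting ReLU in between breaks that equivalence. The weights and biases below are arbitrary assumed values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Hypothetical weights and biases for two one-dimensional "layers".
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    # No activation in between.
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    # The same function written as a single layer with merged weight and bias.
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x):
    # ReLU between the layers makes the overall function non-linear.
    return w2 * relu(w1 * x + b1) + b2

for x in (-2.0, 0.0, 2.0):
    print(x, two_linear_layers(x), one_linear_layer(x),
          two_layers_with_relu(x), sigmoid(x), math.tanh(x))
```

For x = -2 the purely linear stack gives 9.5 either way, while the ReLU version gives 0.5, so the extra layer only adds expressive power once an activation is present.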
[[[ What I learned in this lecture ]]]
- Difference between AI, ML, DL
- ML framework - data - algo - loss
- Why the NN model is practical and useful - requires no expert knowledge
- The role of the activation func : non-linearity
< Activation Function >
It gives the model a non-linear structure so that it can separate the features.
< Forward, Backward propagation >
Forward prop : Data -> algo -> loss
Backward prop : algo <- loss
gradient : backward prop uses the derivative of the loss with respect to the weights w.
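As an illustration of the gradient used in backward prop, here is a hand-written chain rule for a one-weight model, checked against a finite-difference estimate. The model y^ = w*x + b, the squared loss, and the numbers are assumptions made for this sketch; in PyTorch the same derivative is produced by loss.backward().

```python
def forward(w, b, x, y):
    """Forward prop: data -> algo -> loss."""
    y_hat = w * x + b
    loss = (y_hat - y) ** 2
    return y_hat, loss

def backward(w, b, x, y):
    """Backward prop: derivative of the loss with respect to w and b (chain rule)."""
    y_hat, _ = forward(w, b, x, y)
    dloss_dyhat = 2.0 * (y_hat - y)   # d(loss)/d(y_hat)
    dloss_dw = dloss_dyhat * x        # d(y_hat)/dw = x
    dloss_db = dloss_dyhat * 1.0      # d(y_hat)/db = 1
    return dloss_dw, dloss_db

w, b, x, y = 0.5, 0.0, 2.0, 3.0       # hypothetical values
grad_w, grad_b = backward(w, b, x, y)

# Numerical check: central finite difference of the loss with respect to w.
eps = 1e-6
numeric_grad_w = (forward(w + eps, b, x, y)[1] - forward(w - eps, b, x, y)[1]) / (2 * eps)
print(grad_w, grad_b, numeric_grad_w)
```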
< How to optimize the loss func? >
Gradient Descent
Momentum
Adam
RMSProp
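The first two optimizers in the list can be sketched by hand on a toy loss. This is a minimal sketch assuming the loss (w - 3)^2, a learning rate of 0.1, and a momentum coefficient of 0.9; Adam and RMSProp extend the same idea with per-parameter step sizes and are available in PyTorch as torch.optim.Adam and torch.optim.RMSprop.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=500):
    for _ in range(steps):
        w = w - lr * grad(w)          # step against the gradient
    return w

def momentum(w, lr=0.1, beta=0.9, steps=500):
    velocity = 0.0
    for _ in range(steps):
        # Keep a decaying running sum of past gradient steps.
        velocity = beta * velocity - lr * grad(w)
        w = w + velocity
    return w

print(gradient_descent(0.0), momentum(0.0))   # both end up close to 3
```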
< Batch Stochastic Gradient Descent >
Batch : a bundle of data samples that is processed together in one update
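A minimal sketch of splitting a dataset into shuffled bundles. The helper name minibatches, the seed, and the toy data are made up for illustration; in PyTorch this job is done by torch.utils.data.DataLoader.

```python
import random

def minibatches(data, batch_size, seed=0):
    """Shuffle the data and yield it in bundles of batch_size samples."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

samples = list(range(10))                      # hypothetical dataset of 10 samples
batches = list(minibatches(samples, batch_size=4))
print([len(b) for b in batches])               # batch sizes: 4, 4, 2
```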
< Learning rate decay >
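Why decrease the learning rate?
The loss function is convex (bowl-shaped) and we have to reach its lowest point, so the learning rate is reduced little by little.
1. Step decay
At set points during training -> multiply the learning rate by 0.1
2. Exponential decay
the learning rate decays like e^-t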
3. Cosine decay
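The three schedules can be written as small functions of the epoch number. This is a sketch under assumptions: the drop interval (every 30 epochs), the exponential rate k, and the function names are illustrative choices, not values from the lecture.

```python
import math

def step_decay(lr0, epoch, drop=0.1, every=30):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return lr0 * (drop ** (epoch // every))

def exponential_decay(lr0, epoch, k=0.05):
    """Learning rate shrinks like e^(-k * t)."""
    return lr0 * math.exp(-k * epoch)

def cosine_decay(lr0, epoch, total_epochs):
    """Learning rate follows half a cosine wave from lr0 down to 0."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * epoch / total_epochs))

for epoch in (0, 30, 60, 100):
    print(epoch,
          step_decay(0.1, epoch),
          exponential_decay(0.1, epoch),
          cosine_decay(0.1, epoch, total_epochs=100))
```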
PyTorch
- Inner products are easier in PyTorch and NumPy than with plain Python lists.
super() calls the constructor of the parent class (nn.Module), so that the layers defined in the child class are tracked as model parameters.
import torch
import torch.nn as nn
import torch.optim as optim

data = torch.randn(2, 3)  # tensor is a class / standard normal: mean=0, std=1
data = torch.rand(2, 3)   # uniform on [0, 1)
tor1 = torch.tensor([1, 2])
tor2 = torch.tensor([3, 4])
print(torch.dot(tor1, tor2))  # inner product: 1*3 + 2*4 = 11

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(in_features=100, out_features=100, bias=True)
        self.fc1_act = nn.ReLU()
        self.fc2 = nn.Linear(in_features=100, out_features=10, bias=True)

    def forward(self, x):
        out = self.fc1(x)
        out = self.fc1_act(out)
        out = self.fc2(out)
        return out

net = Net()
print(net)

batch_size = 10
inputs = torch.randn(batch_size, 100)  # random batch of 10 samples with 100 features
output = net(inputs)
target = torch.randn(batch_size, 10)   # random targets with the same shape as the output
criterion = nn.MSELoss()  # mean squared error loss
loss = criterion(output, target)
print(loss)
ex) neural network learning
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt  # for plotting
from sklearn import datasets

n_pts = 500  # number of points
X, y = datasets.make_circles(n_samples=n_pts, random_state=123, noise=0.2, factor=0.3)
x_data = torch.Tensor(X)
y_data = torch.Tensor(y.reshape(n_pts, 1))

# classification problem
# two classes: need to distinguish the red dots from the blue dots
def scatter_plot():
    plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red')
    plt.scatter(X[y == 1, 0], X[y == 1, 1], color='blue')

scatter_plot()

class Model(nn.Module):
    def __init__(self, input_size, H1, output_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, H1)
        self.linear2 = nn.Linear(H1, output_size)

    def forward(self, x):
        x = torch.sigmoid(self.linear1(x))
        x = torch.sigmoid(self.linear2(x))
        return x

    def predict(self, x):
        # x is a single point; class 1 if the predicted probability is >= 0.5
        return 1 if self.forward(x) >= 0.5 else 0

# The model must be created before the optimizer can see its parameters.
# 2 inputs and 1 output follow from the data; the hidden size H1=4 is an assumed value.
model = Model(input_size=2, H1=4, output_size=1)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 100
losses = []
for i in range(epochs):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print("epochs:", i, "loss:", loss.item())
    losses.append(loss.item())
    optimizer.zero_grad()  # clear the gradients left over from the previous step
    loss.backward()
    optimizer.step()
Computer Vision
Computer Vision: aims to create autonomous systems that perform some of the tasks the human visual system can do.
< Computer Vision Field >
1. Object Detection (CCTV, medical video, self-driving car)
2. OCR: Optical Character Recognition ( + NLP)
3. Face Recognition
4. Artistic Images
< Image Data that computer sees. >
- Image data is a matrix
- The computer sees it as a 2-dimensional or 3-dimensional tensor
- Each matrix element holds a number from 0 to 255 --->> we call this a pixel
< Gray-scale picture vs normal RGB pic >
- Gray-scale pic : 2 dim ( height x width )
- RGB pic : 3 dim ( height x width x 3 )
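The two layouts can be seen directly as array shapes. This sketch uses NumPy with a made-up 4 x 6 image; the channels-first transpose at the end reflects the (channel, height, width) layout that PyTorch image tensors use.

```python
import numpy as np

height, width = 4, 6   # hypothetical tiny image

# Gray-scale: one 0~255 value per pixel.
gray = np.zeros((height, width), dtype=np.uint8)

# RGB: three 0~255 values (red, green, blue) per pixel.
rgb = np.zeros((height, width, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]   # make the top-left pixel pure red

# Channels-first layout (3 x height x width).
rgb_chw = np.transpose(rgb, (2, 0, 1))

print(gray.shape, rgb.shape, rgb_chw.shape)   # (4, 6) (4, 6, 3) (3, 4, 6)
```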
< human vs com >
if the image changes a little -> the computer gets it wrong!!
==> sensitive to illumination, deformation, occlusion, background clutter
Convolution
find edges -> remove background -> find corners
< Important CV Tasks >
1. Classification
2. Object Detection
3. Instance Segmentation
< Classification >
- predicting whether the picture is a cat, airplane, lion, human ... etc
- so, create an algorithm that predicts the label of the picture.
- In general, the algorithm outputs a probability vector.
+) softmax func
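A minimal sketch of how softmax turns raw class scores into a probability vector. The scores and the class order (cat, airplane, lion) are made-up values for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]            # hypothetical raw outputs for cat, airplane, lion
probs = softmax(scores)
predicted = probs.index(max(probs)) # index of the most likely class
print(probs, predicted)
```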
Fully-Connected Layer ==> too many parameters, hard to use image features
< Convolutional Layer >
- A convolutional layer makes good use of spatial features
- Convolve the filter with the image
- slide the filter spatially over the image, computing a dot product at each position
- the output is downsized by the filtering
Image: 224 x 224 x 3(channel)
Filter: 5 x 5 x 3(w)
< Padding >
- adding a border of 0s around the image so that the output size stays the same
< Stride >
- the distance the filter moves at each step
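How filter size, padding, and stride determine the output size can be checked with the standard formula (input + 2 * padding - kernel) // stride + 1. The sketch below applies it to the 224 x 224 x 3 image and 5 x 5 x 3 filter from these notes; the padding and stride values are chosen for illustration, and the parameter comparison assumes one bias per filter.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(224, 5))                       # 220: filtering shrinks the image
print(conv_output_size(224, 5, padding=2))            # 224: zero padding keeps the size
print(conv_output_size(224, 5, padding=2, stride=2))  # 112: stride 2 halves the size

# Parameters of one 5 x 5 x 3 filter (weights plus one bias) compared with the
# weights one fully connected unit needs to look at the whole 224 x 224 x 3 image.
conv_filter_params = 5 * 5 * 3 + 1
fc_unit_weights = 224 * 224 * 3
print(conv_filter_params, fc_unit_weights)            # 76 150528
```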
< Confusion matrix >
import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, target_names=None, cmap=None, normalize=True, labels=True, title='Confusion matrix'):
    # trace = sum of the diagonal terms (correct predictions);
    # np.sum = sum of all elements (total count), so accuracy = correct / total
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy  # error rate

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    if normalize:
        # divide each row by its sum so that every row adds up to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(20, 15))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    # cells above the threshold get white text, the others black
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names)
        plt.yticks(tick_marks, target_names)

    if labels:
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            if normalize:
                plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                         horizontalalignment='center',
                         color='white' if cm[i, j] > thresh else 'black')
            else:
                plt.text(j, i, "{:,}".format(cm[i, j]),
                         horizontalalignment='center',
                         color='white' if cm[i, j] > thresh else 'black')

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label \naccuracy={:.4f}; misclass={:.4f}'.format(accuracy, misclass))
    plt.show()

# create a made-up confusion matrix
cm = np.array([[100, 10, 5],
               [20, 200, 15],
               [30, 25, 150]])

# visualize the confusion matrix
plot_confusion_matrix(cm,
                      target_names=['Class 1', 'Class 2', 'Class 3'],
                      title='Example Confusion Matrix',
                      normalize=True)
< MNIST practice >
Handwritten digit data, 0~9
import torch
import torch.nn as nn  # library for using linear and other neural network operations effectively
import torch.nn.functional as F  # various activation functions for the model
# dataset / dataset augmentation
import torchvision  # can load various pretrained models
import torchvision.datasets as vision_dsets
import torchvision.transforms as T  # transformation functions to manipulate images
# optimizer
import torch.optim as optim  # various optimization functions for the model
from torch.autograd import Variable
from torch.utils import data
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import itertools
def MNIST_DATA(root='./data', train=True, transforms=None, download=True, batch_size=32, num_workers=1):
    """
    We will use MNIST data for our tutorial
    """
    print("[+] Get the MNIST DATA")
    mnist_train = vision_dsets.MNIST(root=root,  # root is the place to store your data.
                                     train=True,
                                     transform=T.ToTensor(),  # convert data to tensor
                                     download=True)  # whether to download the data
    mnist_test = vision_dsets.MNIST(root=root,
                                    train=False,
                                    transform=T.ToTensor(),
                                    download=True)
    """
    batch size * data shape
    Data Loader is an iterator that fetches the data in batches of the desired size.
    * Practical Guide : What is the optimal batch size?
    - Usually, higher is better.
    - We recommend using a power of 2 to efficiently utilize the GPU memory (related to bit size)
    """
    trainDataLoader = data.DataLoader(dataset=mnist_train,  # the dataset to load from
                                      batch_size=batch_size,  # batch size
                                      shuffle=True,  # whether to shuffle your data every epoch (very important for training performance)
                                      num_workers=num_workers)  # number of workers to load your data (usually number of CPU cores)
    testDataLoader = data.DataLoader(dataset=mnist_test,
                                     batch_size=batch_size,
                                     shuffle=False,  # we don't actually need to shuffle data for testing
                                     num_workers=num_workers)
    print('[+] Finished loading data & Preprocessing')
    return mnist_train, mnist_test, trainDataLoader, testDataLoader

trainDset, testDset, trainDataLoader, testDataLoader = MNIST_DATA(batch_size=32)
Define trainer
class Trainer():
    def __init__(self, trainloader, testloader, net, optimizer, criterion):
        """
        trainloader: train data's loader
        testloader: test data's loader
        net: model to train
        optimizer: optimizer to update your model
        criterion: loss function
        """
        self.trainloader = trainloader
        self.testloader = testloader
        self.net = net
        self.optimizer = optimizer
        self.criterion = criterion

    def train(self, epoch=100):
        """
        epoch: number of times each training sample is used
        """
        self.net.train()  # training mode (the counterpart of net.eval())
        for e in range(epoch):
            running_loss = 0.0
            for i, data in enumerate(self.trainloader, 0):
                # get the inputs
                inputs, labels = data[0], data[1]  # each item from the dataloader is a tuple of (input_data, labels)
                inputs = inputs.cuda()  # gpu training
                labels = labels.cuda()
                # zero the parameter gradients
                self.optimizer.zero_grad()
                # Q1) what if we didn't clear the gradients?
                #     backward() adds to the stored gradients, so they would accumulate across mini-batches.
                # forward + backward + optimize
                outputs = self.net(inputs)  # get the output after passing through the network
                loss = self.criterion(outputs, labels)  # compute the model's score using the loss function
                loss.backward()  # perform back-propagation from the loss
                self.optimizer.step()  # update the parameters with the given optimizer
                # print statistics
                running_loss += loss.item()
                if (i + 1) % 500 == 0:  # every 500 mini-batches, print the average loss over them
                    print('[%d, %5d] loss: %.3f' % (e + 1, i + 1, running_loss / 500))
                    running_loss = 0.0
        print('Finished Training')

    def test(self):
        # Q2) why should we change the network into eval mode?
        #     layers such as dropout and batch norm behave differently at test time.
        self.net.eval()
        test_loss = 0  # never accumulated below, so it stays 0
        correct = 0
        for inputs, labels in self.testloader:
            inputs = inputs.cuda()
            labels = labels.cuda()
            output = self.net(inputs)  # 32 x 10: for each sample in the batch, one score per class (digits 0~9)
            pred = output.max(1, keepdim=True)[1]  # get the index of the max; one prediction per sample (32 x 1)
            correct += pred.eq(labels.view_as(pred)).sum().item()  # number of correct predictions in this batch (0~32)
        test_loss /= len(self.testloader.dataset)  # dataset size is 10000
        print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.
              format(correct, len(self.testloader.dataset),
                     100. * correct / len(self.testloader.dataset)))

    def get_conf(self):
        self.net.eval()
        confusion = torch.zeros(10, 10)
        for inputs, labels in self.testloader:
            inputs = inputs.cuda()
            labels = labels.cuda()
            output = self.net(inputs)
            pred = torch.argmax(output, dim=1)
            for num in range(output.shape[0]):
                # rows index the prediction, columns the true label;
                # plot_confusion_matrix puts true labels on rows, so transpose before plotting
                confusion[pred[num], labels[num]] += 1
        return confusion
< Basic CNN Structure >
LeNet
- 5 x 5 is the kernel size
- s is the stride
- 120 is wrong. it is 420
AlexNet
Norm: normalizing per channel
< VGG Network >
- filter size decreases to 3
- depth gets deeper
Recent CNNs
Data -> Algorithm (CNN) -> softmax, label one-hot vector -> Loss -> Back prop -> optimal solution
< ResNet >
the deeper the network, the better the performance...
but! there is a problem:
the number of parameters goes up, with a risk of overfitting
< EfficientNet >
compares scaling of width, depth, and resolution, and analyzes how they relate.
(a) baseline
(b) width scaling
(c) depth scaling
(d) resolution scaling
(e) compound scaling
more parameters -> good performance, but.. slow computation, takes a lot of storage
fewer parameters -> worse performance, but.. fast computation, takes little storage
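Compound scaling (e) can be sketched as raising three base coefficients to a shared exponent phi. The constants below are the values reported in the EfficientNet paper for its baseline search (alpha = 1.2, beta = 1.1, gamma = 1.15, chosen so that alpha * beta^2 * gamma^2 is roughly 2); treat them and the function name as illustrative assumptions rather than part of the lecture.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution base coefficients (assumed from the paper)

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# Cost grows roughly with depth * width^2 * resolution^2, so each +1 in phi
# about doubles the computation.
cost_factor_per_phi = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(phi, round(depth, 3), round(width, 3), round(resolution, 3))
print(round(cost_factor_per_phi, 3))
```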
Transfer Learning
for a new task, bring a model already trained on a source task over to the target task to improve learning
1. in order to further improve model performance.
2. in order to make use of a pre-trained model.
How to train with transfer learning
1. train the entire model
2. freeze some of the convolutional layers, and train the other layers.
3. freeze all the convolutional layers, and train only the fully connected layer
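Strategy 3 can be sketched in PyTorch by turning off requires_grad for the convolutional part and replacing the fully connected head. SmallCNN is a hypothetical stand-in defined here for illustration; in practice the model would be loaded with pre-trained weights, and the 2-class head is an assumed target task.

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical stand-in for a pre-trained network."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(          # convolutional part
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)  # fully connected part

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()

# Freeze all the convolutional layers: no gradients, so no updates.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the fully connected layer for the new task (assumed to have 2 classes).
model.classifier = nn.Linear(8, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # only the classifier's weight and bias remain trainable
```

For strategy 2, the same loop would run over only some of the convolutional layers; for strategy 1, nothing is frozen.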
Object Detection
: the task of recognizing all the objects (dogs, cats) in an image -> bounding boxes
< object detection pipeline >
classification + localization
< RCNN >
1. Input image
2. extract region proposals as lots of bounding boxes (about 2000?)
3. compute CNN features ( only convolutional filters )
4. flatten
5. classify regions
warp (crop, resize) each extracted region to the same size,
so that image classification runs on inputs of the same size.
two object classes => dog, cat
but one more class is needed for the background.
Bbox regression : predicts the box position
SVM: support vector machine -> classifies the classes
** BUT TOO SLOW
< Fast RCNN >
- extract the features through the CNN first, and then apply the region proposals to the feature map.
< YOLO >
- drop the region proposal step. it takes too much time.
- divide the input image into a grid first, and then predict the bounding boxes and class for each grid cell.
- also predict the confidence of each box.
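The grid idea can be sketched by finding which cell contains the center of a box, since in YOLO that cell is the one responsible for predicting the object. The 7 x 7 grid and 448-pixel input follow the original YOLO setup but are assumptions here, as are the function name and the example boxes.

```python
def responsible_cell(box, image_size, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center.

    box = (x_min, y_min, x_max, y_max) in pixels on a square image.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    cell = image_size / grid_size                 # side length of one grid cell
    col = min(int(center_x // cell), grid_size - 1)
    row = min(int(center_y // cell), grid_size - 1)
    return row, col

# Box center is (200, 300); each cell is 64 pixels wide, so row 4, column 3.
print(responsible_cell((100, 200, 300, 400), image_size=448))
```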