
Study notes in Deep Learning and Computer Vision

 Deep Learning

< Deep Learning History >

Google Translate - AlphaGo - Camera filters - Object detection (e.g. detecting pedestrians on the street)

 

 < Milestones >

Perceptron

AI limitation: required a large amount of computation

Multi-layer Perceptron (Backpropagation)

SVM

Deep Neural Network (Pretraining) --> after the development of GPUs

 

< What is Deep Learning? >

- one method of machine learning

- uses a cascade of many layers of nonlinear processing units

- based on artificial neural networks

 

< AI vs ML vs DL >

Data : Image(Cat picture) + Label (cat) -> supervised learning 

 

< ML Framework > (iterative)

Data - algorithm - loss (gives feedback and updates the parameters) 

 

< Example: Stock price prediction > 

Neural Networks don't require expert knowledge

 

Input data - pass it along the edges - multiply by weights + add bias + apply activation function (nonlinear func) - get predicted result - calculate Loss(y, y^) 
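The pipeline above can be sketched for a single neuron. This is a minimal illustration, not code from the lecture: the input values, weights, bias, the sigmoid activation, and the squared-error loss are all assumed for the example.

```python
import math

def sigmoid(z):
    # Nonlinear activation that squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(inputs, weights, bias):
    # Multiply each input by its weight, add the bias, then apply the activation
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def squared_error(y, y_hat):
    # Loss between the true value y and the prediction y_hat
    return (y - y_hat) ** 2

# Hypothetical numbers, chosen only for illustration
x = [0.5, -1.0, 2.0]
w = [0.1, 0.4, -0.2]
b = 0.05
y = 1.0

y_hat = neuron_forward(x, w, b)
loss = squared_error(y, y_hat)
print("prediction:", y_hat, "loss:", loss)
```

A full network repeats this step for every neuron in every layer.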

 

< practice: med data >

Data -> Algorithm -> Loss

Data: feed in all the data. 

Algo: produces the prediction y^

Loss: e.g. loss(y, y^) = 1

The loss is fed back to the algorithm (it is told that the loss was 1) so it can update its parameters. 

As training repeats, the loss approaches 0. 
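The Data -> Algorithm -> Loss loop can be sketched with a one-parameter model. This is an assumed toy setup (made-up data following y = 2x, mean squared error, plain gradient descent), not the medical data from the practice session.

```python
# Toy data: the true relationship is y = 2 * x (assumed for illustration)
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0        # the single parameter of the "algorithm"
lr = 0.05      # learning rate (assumed value)
history = []   # loss recorded at every iteration

for step in range(200):
    # Algorithm: make predictions y^ from the data
    preds = [w * x for x in xs]
    # Loss: mean squared error between y and y^
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    history.append(loss)
    # Feedback: gradient of the loss with respect to w, then update w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= lr * grad

print("first loss:", history[0], "last loss:", history[-1], "w:", w)
```

The loss starts large and shrinks toward 0 as the parameter moves toward 2.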

 

< What if we add more layers and make the network deeper? >

input - hidden1 - hidden2 - output layer 

 

< Activation Function - non-linear func > 

Why do we need to use non-linear activation functions?
With only linear funcs, extra layers are no use: a stack of linear layers is still just one linear function... :( 

1. Sigmoid func

2. tanh func

3. ReLU func ==> max(0,x)

etc...
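A small sketch of the three functions listed above, plus a check of why linear-only layers do not help. The layer weights used here are arbitrary numbers picked for the demonstration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def linear(x, w, b):
    return w * x + b

# Two "layers" with no activation in between (weights are arbitrary)
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def stacked(x):
    # layer 2 applied to layer 1
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x):
    # one linear layer that computes exactly the same thing
    return (w2 * w1) * x + (w2 * b1 + b2)

for x in [-2.0, 0.0, 1.5]:
    print(x, stacked(x), collapsed(x), relu(x), sigmoid(x), tanh(x))
```

Because `stacked` and `collapsed` always agree, depth only adds expressive power once a non-linear function sits between the layers.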

 

[[[ What I learned in this lecture ]]]

  • Difference between AI, ML, and DL
  • ML framework - data - algo - loss
  • Why the NN model is practical and useful - it requires no expert knowledge
  • The role of the activation func : adds non-linearity  

< Activation Function > 

It gives the model a non-linear structure, so it can separate features that a straight line cannot separate

 

< Forward, Backward propagation >

Forward prop :  Data -> algo -> loss

Backward prop :  algo <- loss 

gradient : backward prop uses the derivative of the loss with respect to the weights w. 
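A hand-written sketch of one forward and one backward pass, to show what "derivative of the loss with respect to w" means. The model (y^ = w * x + b), the squared-error loss, and the numbers are assumptions for illustration; in PyTorch the backward part is done automatically by `loss.backward()`.

```python
def forward(w, b, x, y):
    # Forward prop: data -> algo -> loss
    y_hat = w * x + b
    loss = (y_hat - y) ** 2
    return y_hat, loss

def backward(w, b, x, y):
    # Backward prop: apply the chain rule from the loss back to the parameters
    y_hat, _ = forward(w, b, x, y)
    dloss_dyhat = 2 * (y_hat - y)   # d(loss) / d(y_hat)
    dloss_dw = dloss_dyhat * x      # d(y_hat) / d(w) = x
    dloss_db = dloss_dyhat * 1.0    # d(y_hat) / d(b) = 1
    return dloss_dw, dloss_db

w, b, x, y = 0.5, 0.1, 2.0, 3.0
grad_w, grad_b = backward(w, b, x, y)

# Sanity check: compare with a numerical (finite-difference) derivative
eps = 1e-6
numeric_grad_w = (forward(w + eps, b, x, y)[1] - forward(w - eps, b, x, y)[1]) / (2 * eps)
print(grad_w, numeric_grad_w, grad_b)
```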

 

< How to optimize the loss func? >

Gradient Descent

Momentum 

Adam

RMSProp
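The four optimizers above differ only in how they turn a gradient into a parameter update. Below is a rough sketch of the commonly used update rules on a toy loss f(w) = (w - 3)^2. The toy loss, the hyperparameter values, and the function names are assumptions for illustration; library implementations have more options.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def run_gd(steps=2000, lr=0.1):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_momentum(steps=2000, lr=0.1, mu=0.9):
    w, v = 0.0, 0.0
    for _ in range(steps):
        v = mu * v + grad(w)      # running "velocity" of past gradients
        w -= lr * v
    return w

def run_rmsprop(steps=2000, lr=0.01, rho=0.9, eps=1e-8):
    w, s = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g        # running average of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)     # step scaled by gradient magnitude
    return w

def run_adam(steps=2000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g              # momentum-like first moment
        v = b2 * v + (1 - b2) * g * g          # RMSProp-like second moment
        m_hat = m / (1 - b1 ** t)              # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, fn in [("GD", run_gd), ("Momentum", run_momentum),
                 ("RMSProp", run_rmsprop), ("Adam", run_adam)]:
    print(name, fn())
```

All four should end up close to the minimum at w = 3. On real networks the differences show up in speed and stability.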

 

< Batch Stochastic Gradient Descent >

Batch : a bundle of data samples that is processed together in one update 
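A minimal sketch of how a dataset can be split into batches, assuming a plain Python list as the dataset. The helper name `iterate_minibatches` is made up for this example; in PyTorch, `DataLoader` plays this role.

```python
import random

def iterate_minibatches(samples, batch_size, shuffle=True, seed=0):
    # Yield the dataset one bundle (batch) at a time
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)   # new order of the samples
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]

data = list(range(10))                         # stand-in for 10 training samples
batches = list(iterate_minibatches(data, batch_size=4))
for batch in batches:
    print(batch)                               # one parameter update would happen per batch
```

With 10 samples and a batch size of 4, the last batch is smaller than the others.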

 

< Learning rate decay >

Why decrease learning rate?

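The loss function is shaped like a downward-convex bowl and we need to settle at its lowest point, so the step size is reduced little by little. 

1. Step decay

At fixed points during training -> multiply the learning rate by 0.1

2. Exponential decay

the learning rate decays like e^-t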

3. Cosine decay   
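The three schedules can be written as small functions of the epoch number. This is a sketch under assumed settings: the drop interval, the decay constant `k`, and the starting learning rate are illustrative values, not values from the lecture.

```python
import math

def step_decay(lr0, epoch, drop_every=30, factor=0.1):
    # Multiply the learning rate by `factor` every `drop_every` epochs
    return lr0 * (factor ** (epoch // drop_every))

def exponential_decay(lr0, epoch, k=0.05):
    # Smooth decay proportional to e^(-k * t)
    return lr0 * math.exp(-k * epoch)

def cosine_decay(lr0, epoch, total_epochs):
    # Follows half a cosine wave from lr0 down to 0
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in [0, 30, 60, 90]:
    print(epoch,
          step_decay(0.1, epoch),
          exponential_decay(0.1, epoch),
          cosine_decay(0.1, epoch, total_epochs=100))
```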

 


PyTorch 

- Inner products are easier with PyTorch tensors or NumPy arrays than with plain Python lists. 

 

A network must inherit from the nn.Module class and implement the __init__ and forward methods. 
• All networks implemented with PyTorch should do this.

super().__init__() calls the constructor of the parent class (nn.Module), so the parent's setup runs before the child class defines its own layers.

 

import torch

data = torch.randn(2,3) # normal distribution with mean 0, std 1 (torch.Tensor is a class)

data = torch.rand(2,3) # uniform distribution on [0, 1)

 

tor1 = torch.tensor([1, 2])

tor2 = torch.tensor([3, 4])

 

import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # two fully connected layers with a ReLU in between
        self.fc1 = nn.Linear(in_features=100, out_features=100, bias=True)
        self.fc1_act = nn.ReLU()
        self.fc2 = nn.Linear(in_features=100, out_features=10, bias=True)

    def forward(self, x):
        out = self.fc1(x)
        out = self.fc1_act(out)
        out = self.fc2(out)

        return out

net = Net()
print(net)

batch_size = 10
inputs = torch.randn(batch_size, 100)  # random batch: 10 samples, 100 features each
output = net(inputs)

target = torch.randn(batch_size, 10)   # random targets with the same shape as the output
criterion = nn.MSELoss() # mean squared error loss

loss = criterion(output, target)
print(loss)

 

ex) neural network learning

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt # to make plot
from sklearn import datasets

n_pts = 500 # number of points
X, y = datasets.make_circles(n_samples=n_pts, random_state=123, noise=0.2, factor=0.3)

x_data = torch.Tensor(X)
y_data = torch.Tensor(y.reshape(500, 1))

# classification problem 
# two class. Need to distinguish red blue dots. 

def scatter_plot():
  plt.scatter(X[y==0, 0], X[y==0, 1], color='red')
  plt.scatter(X[y==1, 0], X[y==1, 1], color='blue')
 
scatter_plot()


class Model(nn.Module):
  def __init__(self, input_size, H1, output_size):
     super().__init__()
     self.linear1 = nn.Linear(input_size, H1) 
     self.linear2 = nn.Linear(H1, output_size)
  
  def forward(self, x):
     x = torch.sigmoid(self.linear1(x))
     x = torch.sigmoid(self.linear2(x))
     return x
     
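  def predict(self, x):
     # works for a single sample: class 1 if the predicted probability is >= 0.5
     return 1 if self.forward(x) >= 0.5 else 0

# 2 input features (the point's coordinates), 1 output; the hidden size 4 is an assumed value
model = Model(2, 4, 1)

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epochs = 100
losses = []

for i in range(epochs):
  y_pred = model.forward(x_data)
  loss = criterion(y_pred, y_data)
  
  print("epochs:", i, "loss:", loss.item())
  
  losses.append(loss.item())
  optimizer.zero_grad()  # clear the gradients left over from the previous step
  loss.backward()
  optimizer.step()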

 


Computer Vision

Computer Vision: aims to create autonomous systems that can perform some of the tasks that human vision can do.

 

< Computer Vision Field >

1. Object Detection (CCTV, medical video, self-driving car) 

2. OCR: Optical Character Recognition ( + NLP)

3. Face Recognition

4. Artistic Images

 

< Image data as the computer sees it >

- Image data is a matrix

- The computer sees it as a 2-dimensional or 3-dimensional tensor 

- Each matrix element holds a number from 0 to 255 --->> we call each element a pixel 

 

< Gray-scale picture vs normal RGB pic >

- Gray-scale pic : 2 dim ( height x width )

- RGB pic : 3 dim ( height x width x 3 ) 
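A tiny sketch of the two layouts using nested Python lists, so the shapes can be checked without any library. The pixel values are made up, and `to_gray` uses a plain channel average, which is a simplification (real conversions usually weight the channels differently).

```python
def shape(tensor):
    # Walk down the nesting to read off the size of each dimension
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

# 2 x 3 gray-scale image: one 0-255 number per pixel
gray = [[0, 128, 255],
        [64, 32, 16]]

# 2 x 3 RGB image: three 0-255 numbers (R, G, B) per pixel
rgb = [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
       [[0, 0, 0], [255, 255, 255], [128, 128, 128]]]

def to_gray(rgb_image):
    # Simplified conversion: average the three channels of every pixel
    return [[sum(pixel) // 3 for pixel in row] for row in rgb_image]

print(shape(gray))          # 2 dimensions
print(shape(rgb))           # 3 dimensions
print(to_gray(rgb))
```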

 

< Human vs computer >

if the image changes a little -> the computer gets it wrong!!

==> sensitive to illumination, deformation, occlusion, background clutter 

 


Convolution 

find edges -> remove background -> find corners

 

< Important CV Tasks >
1. Classification

2. Object Detection

3. Instance Segmentation 

 

< Classification >

- predict whether the picture is a cat, airplane, lion, human ... etc

- so, create an algorithm that predicts the label of the picture.

- In general, the algorithm outputs a probability vector.

+) softmax func : turns the raw output scores into probabilities that sum to 1 
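A minimal softmax sketch showing how raw scores become a probability vector. The scores and the three class names are invented for the example.

```python
import math

def softmax(logits):
    # Subtract the max first for numerical stability; the result is unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["cat", "airplane", "lion"]   # hypothetical labels
scores = [2.0, 1.0, 0.1]                # hypothetical raw network outputs

probs = softmax(scores)
predicted = probs.index(max(probs))     # index of the most likely class
print(probs, "->", classes[predicted])
```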

 

Fully-Connected Layer ==> too many parameters, hard to use the spatial features of an image

 

< Convolutional Layer >

- A convolutional layer makes good use of spatial features

- Convolve the filter with the image

- slide the filter spatially over the image and compute a dot product at each position

- the filtering reduces the size of the output 

 

Image: 224 x 224 x 3(channel)

Filter: 5 x 5 x 3(w)  
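A quick parameter count makes the "too many parameters" point about fully connected layers concrete, using the image and filter sizes above. The 1000 output units and the 64 filters are assumed numbers picked only for the comparison.

```python
def fc_params(n_inputs, n_outputs):
    # Every output unit has one weight per input plus one bias
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_size, in_channels, n_filters):
    # Every filter has kernel_size x kernel_size x in_channels weights plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * n_filters

n_inputs = 224 * 224 * 3                 # flattened 224 x 224 x 3 image
fc_total = fc_params(n_inputs, 1000)     # assumed: 1000 output units
conv_total = conv_params(5, 3, 64)       # assumed: 64 filters of size 5 x 5 x 3

print("fully connected:", fc_total)
print("convolutional:  ", conv_total)
```

The convolutional layer shares the same small filter across all positions, which is why its count does not depend on the image size.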

 

< Padding >

- add a border of zeros around the input so that the output keeps the same size 

 

< Stride >

- the distance the filter moves at each step 
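Padding and stride together decide the output size: out = floor((n + 2 * padding - k) / stride) + 1. Below is a naive single-channel sketch of that formula and of the sliding dot product. It is written for clarity, not speed, and the function names are made up for this example.

```python
def conv_output_size(n, k, padding=0, stride=1):
    # n: input size, k: filter size
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, padding=0, stride=1):
    h, w = len(image), len(image[0])
    k = len(kernel)
    # Zero padding: surround the image with a border of zeros
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(h, k, padding, stride)
    out_w = conv_output_size(w, k, padding, stride)
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the patch it currently covers
            acc = 0
            for a in range(k):
                for b in range(k):
                    acc += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        output.append(row)
    return output

image = [[1] * 4 for _ in range(4)]      # 4 x 4 image of ones
kernel = [[1] * 3 for _ in range(3)]     # 3 x 3 filter of ones
print(conv2d(image, kernel))                       # no padding: output shrinks
print(conv2d(image, kernel, padding=1))            # padding 1: same size as input
print(conv2d(image, kernel, padding=1, stride=2))  # stride 2: output halves
```

For the 224 x 224 image and 5 x 5 filter in these notes, `conv_output_size(224, 5)` gives 220, and a padding of 2 keeps it at 224.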

 

< Confusion matrix > 

import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, target_names=None, cmap=None, normalize=True, labels=True, title='Confusion matrix'):
    # accuracy = correctly predicted samples (np.trace: sum of the diagonal) / all samples (np.sum: sum of all elements)
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy #error
    
    if cmap is None:
        cmap = plt.get_cmap('Blues')

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.figure(figsize=(20, 15))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    
    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names) 
        plt.yticks(tick_marks, target_names)

    if labels:
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            if normalize:
                plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                    horizontalalignment = 'center',
                    color='white' if cm[i, j]>thresh else 'black')
            
            else:
                plt.text(j, i, "{:,}".format(cm[i,j]),
                            horizontalalignment = 'center',
                         color='white' if cm[i,j]>thresh else 'black')
                
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label \naccuracy={:.4f}; misclass={:.4f}'.format(accuracy, misclass))
    plt.show()

# Create a made-up confusion matrix
cm = np.array([[100, 10, 5],
               [20, 200, 15],
               [30, 25, 150]])

# Visualize the confusion matrix
plot_confusion_matrix(cm, 
                      target_names=['Class 1', 'Class 2', 'Class 3'], 
                      title='Example Confusion Matrix',
                      normalize=True)

 

 

 

 

< MNIST practice > 

Handwritten digit data (0~9) 

import torch
import torch.nn as nn #library for using linear layers and other neural network operations effectively
import torch.nn.functional as F #various activation functions for model

#dataset / dataset augmentation
import torchvision #lets you load various pretrained models
import torchvision.datasets as vision_dsets
import torchvision.transforms as T #transformation functions to manipulate images

#optimizer
import torch.optim as optim #various optimization functions for model


from torch.autograd import Variable
from torch.utils import data

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import itertools


def MNIST_DATA(root='./data', train=True, transforms=None, download=True, batch_size=32, num_workers=1):
    """
    We will use MNIST data for our tutorial
    """
    print("[+] Get the MNIST DATA")
    if transforms is None:
        transforms = T.ToTensor() # default: convert data to tensor

    mnist_train = vision_dsets.MNIST(root=root, # root is the place to store your data.
                                         train=True,
                                         transform=transforms,
                                         download=download) # whether to download the data

    mnist_test = vision_dsets.MNIST(root=root,
                                        train=False,
                                        transform=transforms,
                                        download=download)

    """
    batch size * data shape
    Data Loader is an iterator that fetches the data with the number of desired batch size.
    * Practical Guide : What is the optimal batch size?
        - Usually, higher is better.
        - We recommend using a power of 2 to efficiently utilize the GPU memory (related to bit size)
    """
    trainDataLoader = data.DataLoader(dataset=mnist_train, # information about your data type
                                      batch_size=batch_size, # batch size
                                      shuffle=True, # Whether to shuffle your data for every epoch. (Very important for training performance)
                                      num_workers=num_workers) # number of workers to load your data. (usually number of CPU cores)

    testDataLoader = data.DataLoader(dataset=mnist_test,
                                     batch_size=batch_size,
                                     shuffle=False, # we don't actually need to shuffle data for testing
                                     num_workers=num_workers)

    print('[+] Finished loading data & Preprocessing')
    return mnist_train, mnist_test, trainDataLoader, testDataLoader

trainDset, testDset, trainDataLoader, testDataLoader = MNIST_DATA(batch_size=32)

 

Define trainer

class Trainer():
    def __init__(self, trainloader, testloader, net, optimizer, criterion):
        """
        trainloader: train data's loader
        testloader: test data's loader
        net: model to train
        optimizer: optimizer to update your model
        criterion: loss function
        """
        self.trainloader = trainloader
        self.testloader = testloader
        self.net = net
        self.optimizer = optimizer
        self.criterion = criterion

    def train(self, epoch=100):
        """
        epoch: number of times each training sample is used
        """
        self.net.train() # training mode (use self.net.eval() for evaluation)
        for e in range(epoch):
            running_loss = 0.0
            for i, data in enumerate(self.trainloader, 0):
                #get the inputs
                inputs, labels = data[0], data[1] #return type for data in dataloader is tuple of (input_data, labels)
                inputs = inputs.cuda() #gpu training
                labels = labels.cuda()
                #zero the parameter gradients
                self.optimizer.zero_grad()
                # Q1) what if we didn't clear the gradients?
                # A1) PyTorch accumulates gradients across backward() calls, so each step would use the sum of old and new gradients

                #forward+backward+optimize
                outputs=self.net(inputs) #get output after passing through the network
                loss = self.criterion(outputs, labels) #compute model's score using the loss function
                loss.backward() #perform back-propagation from the loss
                self.optimizer.step() #perform gradient descent with given optimizer

                #print statistics
                running_loss+= loss.item()
                if (i+1) % 500 == 0: #every 500 mini-batches, print the average loss over those batches
                    print('[%d, %5d] loss: %.3f' % (e+1, i+1, running_loss/500))
                    running_loss=0.0
        print('Finished Training')

    def test(self):
        self.net.eval() #Q2) why should we change the network into eval-mode?
        # A2) layers such as dropout and batch norm behave differently in training and evaluation
        correct = 0
        with torch.no_grad(): # gradients are not needed for evaluation
            for inputs, labels in self.testloader:
                inputs = inputs.cuda()
                labels = labels.cuda()
                output = self.net(inputs) # shape (32, 10): one score per class (digits 0~9) for each sample in the batch
                pred = output.max(1, keepdim=True)[1] # index of the max score, shape (32, 1)
                correct += pred.eq(labels.view_as(pred)).sum().item() # number of correct predictions in this batch (0~32)

        print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.
              format(correct, len(self.testloader.dataset),
                     100.*correct / len(self.testloader.dataset)))
        
    def get_conf(self):
        self.net.eval()

        confusion = torch.zeros(10,10)
        with torch.no_grad():
            for inputs, labels in self.testloader:
                inputs = inputs.cuda()
                labels = labels.cuda()
                output = self.net(inputs)
                pred = torch.argmax(output, dim=1)

                for num in range(output.shape[0]):
                    # rows: true label, columns: predicted label (matches plot_confusion_matrix)
                    confusion[labels[num], pred[num]] +=1

        return confusion

 

< Basic CNN Structure > 

 

LeNet

 

 

- 5 x 5 is the kernel size

- s is the stride

- the 120 in the figure is wrong; it should be 420

 

 

AlexNet 

 

 

Norm: normalizing per channel 

 

< VGG Network >

- the filter size is reduced to 3 x 3

- the network gets deeper 


Recent CNNs

Data -> Algorithm (CNN) -> softmax, label one-hot vector -> Loss -> Back prop -> optimal solution

 

< ResNet > 

the deeper, the better...

 but! there is a problem: 

 the number of parameters goes up, with a risk of overfitting 

 

< EfficientNet > 

compares scaling the width, depth, and resolution, and analyzes the relation between them. 

(a) baseline

(b) width scaling

(c) depth scaling

(d) resolution scaling 

(e) compound scaling 
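Compound scaling (e) grows depth, width, and resolution together with a single coefficient phi. The sketch below uses the constants alpha = 1.2, beta = 1.1, gamma = 1.15 as I recall them from the EfficientNet paper. Treat those values and the function names as assumptions to be checked against the paper.

```python
# Scaling constants as reported for the EfficientNet baseline (assumed here, verify before use)
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    # Multipliers for depth, width, and input resolution at compound coefficient phi
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    return depth, width, resolution

def flops_multiplier(phi):
    # Cost grows roughly with depth * width^2 * resolution^2
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2

for phi in range(4):
    print(phi, compound_scale(phi), flops_multiplier(phi))
```

With these constants, each increase of phi by 1 roughly doubles the computation.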

 

more parameters -> better performance, but.. slower computation and more storage 

fewer parameters -> worse performance, but.. faster computation and less storage 


Transfer Learning 

to improve learning on a new task, bring what a trained source model has learned over to the target model 

1. in order to improve model performance further. 

2. in order to reuse a pre-trained model. 

 

 How to train with transfer learning 

1. train the entire model 

2. freeze some of the convolutional layers, and train the other layers. 

3. freeze all the convolutional layers, and train only the fully connected layer
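The three options differ only in which parameters the optimizer is allowed to change. Here is a framework-free sketch of that idea: the model is a plain dict, and the layer names, values, and gradients are invented for illustration. In PyTorch, freezing is usually done by setting `requires_grad = False` on the parameters of the layers to freeze.

```python
def make_model():
    # Stand-in for a pre-trained model: each "layer" has a value and a gradient
    return {
        "conv1": {"value": 1.0, "grad": 0.5, "trainable": True},
        "conv2": {"value": 2.0, "grad": 0.5, "trainable": True},
        "fc":    {"value": 3.0, "grad": 0.5, "trainable": True},
    }

def freeze(model, layer_names):
    # Frozen layers keep their pre-trained values during training
    for name in layer_names:
        model[name]["trainable"] = False

def sgd_step(model, lr=0.1):
    # Update only the layers that are still trainable
    for layer in model.values():
        if layer["trainable"]:
            layer["value"] -= lr * layer["grad"]

# 1. train the entire model
full = make_model()
sgd_step(full)

# 2. freeze some convolutional layers, train the rest
partial = make_model()
freeze(partial, ["conv1"])
sgd_step(partial)

# 3. freeze all convolutional layers, train only the fully connected layer
head_only = make_model()
freeze(head_only, ["conv1", "conv2"])
sgd_step(head_only)

for name, m in [("full", full), ("partial", partial), ("head_only", head_only)]:
    print(name, {k: v["value"] for k, v in m.items()})
```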

 


Object Detection

: the task of recognizing all the objects (dogs, cats) in an image and marking each one with a bounding box

 

< object detection pipeline >

classification + localization 

 

< RCNN >

1. Input image 

2. extract region proposals as many candidate bounding boxes (about 2000)

3. compute CNN features ( convolutional layers only ) 

4. flatten 

5. classify regions 

 

warp (crop, resize) each extracted region to the same size,  

so that image classification can be run on inputs of the same size. 
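A rough sketch of the warp step: crop each proposed box out of the image and resize it to a fixed size. Nearest-neighbor resizing and the (top, left, height, width) box format are choices made for this example only, not necessarily what R-CNN implementations use.

```python
def crop(image, top, left, height, width):
    # Cut the region covered by the box out of the image
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, out_h, out_w):
    # Nearest-neighbor resize: each output pixel copies the closest input pixel
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def warp_region(image, box, size):
    # Crop a proposal and force it to size x size, ignoring its aspect ratio
    top, left, height, width = box
    return resize_nearest(crop(image, top, left, height, width), size, size)

# Hypothetical 6 x 8 image whose pixel value encodes its position (row * 10 + col)
image = [[r * 10 + c for c in range(8)] for r in range(6)]
proposals = [(0, 0, 2, 3), (1, 2, 5, 4)]     # two boxes of different shapes

warped = [warp_region(image, box, size=4) for box in proposals]
for region in warped:
    print(region)                             # every region is now 4 x 4
```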

 

two object classes => dog, cat 

but one more class is needed for the background. 

 

 

 

Bbox regression : predicts the box position 

SVM: support vector machine -> classifies the classes

 

**BUT TOO SLOW (the CNN runs once for every region proposal)

 

< Fast RCNN >

- extract the features by running the CNN once on the whole image first, and then apply the region proposals to the feature map. 

 

< YOLO > 

- drop the region proposal step. it takes too much time. 

- divide the input image into a grid first, and then predict the bounding boxes and the class for each grid cell. 

- also predict the confidence of each box. 
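A small sketch of the grid idea, using the settings I remember from the first YOLO version (a 7 x 7 grid, 2 boxes per cell, 20 classes). Treat those numbers, the 448 x 448 image size, and the helper names as assumptions for illustration.

```python
S, B, C = 7, 2, 20   # grid size, boxes per cell, number of classes (assumed settings)

def output_shape(s=S, b=B, c=C):
    # Each cell predicts b boxes (x, y, w, h, confidence) plus c class scores
    return (s, s, b * 5 + c)

def responsible_cell(center_x, center_y, img_w, img_h, s=S):
    # The grid cell that contains the object's center is the one that predicts it
    col = min(int(center_x / img_w * s), s - 1)
    row = min(int(center_y / img_h * s), s - 1)
    return row, col

print(output_shape())
print(responsible_cell(224, 224, 448, 448))   # object centered in the image
print(responsible_cell(30, 400, 448, 448))    # object near the bottom-left corner
```

Because every cell is predicted in a single forward pass, no separate region proposal step is needed.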

 
