
Thursday, 29 April 2021

100 basic terms and comparisons from PMBOK which a PMP certified project manager must know

A PMP-certified project manager should know the following terms and comparisons, which can come up in project management interviews for PMP-certified candidates. I have grouped them by knowledge area.

General Introduction

1) Project vs Operation

2) Product Life Cycle vs Project Life Cycle

3) Project Phase vs Phase Gate

4) Development Approach: Predictive vs Adaptive vs Hybrid vs Iterative vs Incremental

5) Portfolio Management vs Program Management vs Project Management

6) Organizational Structure Type: Strong Matrix vs Weak Matrix vs Balanced Matrix

7) Types of PMO: Supportive vs Controlling vs Directive

8) Powers of Project Manager: Referent vs Positional vs Situational vs Coercive vs Ingratiating vs Persuasive

9) Leadership Styles: Laissez-faire vs Servant Leader vs Transactional vs Transformational vs Charismatic vs Interactional 

10) Project Manager vs Project Coordinator vs Project Expeditor

11) PMI Triangle: Technical Project Management vs Strategic and Business Management vs Leadership

12) Code of Ethics and Professional Conduct: Aspirational Standards and Mandatory Standards

Integration Management

13) OPA (Organizational Process Assets) vs EEF (Enterprise Environmental Factors)

14) Business Case vs Benefits Management Plan

15) Configuration Management Plan vs Change Management Plan

16) Project Charter vs Team Charter

17) Assumption log: Assumptions vs Constraints

18) Change Requests: Preventive actions vs Corrective actions vs Defect repair

19) Knowledge Management vs Information Management

20) Explicit Knowledge vs Tacit Knowledge

21) Work/Job Shadowing vs Reverse Shadowing

22) Work Performance Data vs Work Performance Information vs Work Performance Report

23) Work Performance Reports: Status report vs Progress report

24) Payback Period vs Return on Investment (ROI) vs Internal Rate of Return (IRR) vs Discounted Cash Flow vs Net Present Value (NPV) vs Benefit Cost Ratio (BCR)

Scope Management

25) Project Charter vs Project Scope Statement

26) WBS: Work Package vs Planning Package

27) WBS: Work Package vs Activities

28) Scope Baseline vs Schedule Baseline vs Cost Baseline vs Performance Measurement Baseline

29) Validate Scope Process vs Control Quality Process

30) Scope Creep and Gold Plating

31) Product Scope vs Project Scope

32) Inspection vs Audit

33) Verification vs Validation

34) Work Breakdown Structure vs Resource Breakdown Structure vs Risk Breakdown Structure vs Organizational Breakdown Structure

35) Brainstorming vs Brain-writing vs Nominal Group Technique

36) Decision Making: Autocratic vs Multi-criteria vs Voting (Unanimity vs Majority vs Plurality)

Schedule Management

37) Iterative Scheduling vs On-demand Scheduling

38) Precedence Diagramming Method vs Arrow Diagramming Method

39) Finish to Start (FS) vs Finish to Finish (FF) vs Start to Start (SS) vs Start to Finish (SF)

40) Dependencies: Mandatory vs Discretionary vs External vs Internal

41) Schedule Network Diagram: Path Convergence vs Path Divergence

42) Critical Path Method vs Critical Chain Method

43) Early Start vs Early Finish vs Late Start vs Late Finish

44) Forward Pass vs Backward Pass

45) Leads vs Lags

46) Total Float vs Free Float

47) Project Schedule: Gantt charts vs Bar charts vs Milestone charts vs Summary charts

48) Fast-tracking vs Crashing

49) Resource Optimization vs Resource Levelling

50) Project Calendar vs Resource Calendar

51) Contingency Reserve (Known Unknown) vs Management Reserve (Unknown Unknown)

52) Triangular Distribution vs PERT/Beta Distribution

53) Analogous Estimating vs Parametric Estimating vs Three-point Estimating vs Bottom-up Estimating

54) Accuracy and Precision

55) Activity List vs Milestone List

56) Student Syndrome vs Parkinson's Law

Cost Management

57) Rough Order of Magnitude Estimate (-25% to +75%) vs Budgeted Estimate (-10% to +25%) vs Definitive Estimate (-5% to +10%)

58) Planned Value (PV) vs Earned Value (EV) vs Actual Cost (AC)

59) Cost variance (CV = EV - AC) vs Schedule variance (SV = EV - PV)

60) Cost Performance Index (CPI = EV/AC) vs Schedule Performance Index (SPI = EV/PV)

61) Budget at Completion (BAC) vs Estimate at Completion (EAC) vs Variance at Completion (VAC = BAC - EAC)

62) EAC Calculation: EAC forecast for ETC work performed at the budgeted rate (EAC = AC + (BAC – EV)) vs EAC forecast for ETC work performed at the present CPI (EAC = BAC / CPI) vs EAC forecast for ETC work considering both SPI and CPI factors (EAC = AC + [(BAC – EV) / (CPI × SPI)]) vs To-complete performance index (TCPI = Work Remaining / Funds Remaining)

Quality Management

63) Quality vs Grade

64) Attribute Sampling vs Variable Sampling

65) Continual Improvement: Plan-do-check-act (PDCA) vs Total Quality Management(TQM) vs Six Sigma vs Lean Six Sigma

66) Cost of Conformance vs Cost of Non-conformance

67) Cost of Conformance: Prevention cost vs Appraisal cost

68) Cost of Non-Conformance: Internal Failure Cost vs External Failure Cost

69) Control Charts vs Flowcharts vs Histograms vs Cause-and-effect Diagrams vs Matrix Diagrams vs Scatter Diagrams vs Affinity Diagrams vs Mind Mapping

Resource Management

70) Matrix: Responsibility Assignment Matrix (RAM / RACI) vs Probability and Impact Matrix vs Stakeholder Engagement Assessment Matrix

71) Tuckman Ladder: Forming vs Storming vs Norming vs Performing vs Adjourning

72) Co-located Teams vs Virtual Teams

73) Conflict Management: Withdraw/Avoid vs Smooth/Accommodate vs Compromise/Reconcile (Lose-Lose) vs Force/Direct (Win-Lose) vs Collaborate/Problem Solve (Win-Win)

74) Develop Team vs Manage Team

75) Coaching vs Mentoring

Communication Management

76) Push vs Pull communication

77) Verbal vs Non-verbal communication (Paralinguistic, Vocal inflection, Pitch, Tone)

Risk Management

78) Risks vs Issues

79) Event vs Non-event Risks

80) Variability Risk vs Ambiguity Risk vs Emergent Risks (Unknowable Unknowns)

81) Secondary risk vs Residual risk

82) Risk Register vs Risk Report

83) Opportunities (Positive Risks) vs Threats (Negative Risks)

84) Strategies for Opportunities: Exploit vs Enhance vs Accept vs Share

85) Strategies for Threats: Avoid vs Mitigate vs Accept vs Transfer

86) Active Risk Acceptance vs Passive Risk Acceptance

87) Qualitative vs Quantitative Risk Analysis

88) Quantitative Risk Analysis: Simulations (Monte Carlo Analysis) vs Sensitivity Analysis (Tornado Diagrams) vs Decision Tree Analysis vs Influence Diagrams

89) Prompt Lists: PESTLE (political, economic, social, technological, legal, environmental) vs TECOP (technical, environmental, commercial, operational, political) vs VUCA (volatility, uncertainty, complexity, ambiguity)

90) Risk Assessment Parameters: Urgency vs Proximity vs Dormancy vs Propinquity vs Detectability

Procurement Management

91) Contract Types: Fixed Price vs Cost Reimbursable vs Time and Material

92) Fixed Price Contracts: Firm Fixed Price (FFP) vs Fixed Price Incentive Fee (FPIF) vs Fixed Price with Economic Price Adjustments (FPEPA)

93) Cost-reimbursable Contracts: Cost plus fixed fee (CPFF) vs Cost plus incentive fee (CPIF) vs Cost plus award fee (CPAF)

94) Bid Documents: RFI (Request for Information) vs RFQ (Request for Quotation) vs RFP (Request for Proposal)

95) Scope Statement vs Statement of Work (SOW) vs Terms of Reference (TOR)

96) Make vs Buy Analysis

97) Selecting a Seller vs Awarding a Contract

98) Claims Administration: Claims vs Disputes vs Appeals

Stakeholder Management

99) Stakeholder Analysis: Power/Interest vs Power/Influence vs Impact/Influence

100) Engagement level of stakeholders: Unaware vs Resistant vs Neutral vs Supportive vs Leading

Wednesday, 11 September 2019

Preprocessing of raw data in NLP: Tokenization, Stemming, Lemmatization and Vectorization

NLP (Natural Language Processing) is a very interesting branch of Artificial Intelligence. Natural language is the language we humans use to communicate and interact with each other. In NLP, we teach computers to understand, interpret and manipulate human languages. In this article, we will focus on some of the preprocessing tasks we perform on raw data: Tokenization, Stemming, Lemmatization and Vectorization.

While processing a natural language, we need to take care of the following aspects:

1. Syntax: The sentence should be grammatically correct; the arrangement of words in a sentence should follow the grammar rules of the language.

2. Semantics: It deals with the meaning of words and their interpretation within sentences.

3. Pragmatics: Like semantics, but it also considers the context in which a word is used.

Applications of NLP

The applications of NLP (Natural Language Processing) are virtually unlimited. I have listed a few of them:

1. Machine translation (like Google Translate)
2. Sentiment analysis (reviews and comments on e-commerce and social-networking sites)
3. Text classification, generation and automatic summarization

4. Automated question answering and conversational interfaces (like chatbots)
5. Personal assistants (like Alexa, Siri, Google Assistant, Cortana etc.)
6. Auto-correct grammatical mistakes (MS Word and Grammarly use NLP to check grammatical errors)
7. Spam filtering, auto-complete, auto-correct, auto-tagging, topic modelling, sentence segmentation, speech recognition, part of speech tagging, named entity recognition, duplicates detection and a lot more...

NLP Toolkit (library) in Python


There are many Python libraries for NLP, but the most commonly used is NLTK (Natural Language Toolkit). It provides efficient modules for preprocessing and cleaning raw data: removing punctuation, tokenizing, removing stopwords, stemming, lemmatization, vectorization, tagging, parsing, and more.

Pre-processing of raw data in NLP


Following are the basic steps we need to perform while cleaning raw data in NLP:

1. Remove Punctuation
2. Tokenization
3. Remove Stopwords

4. Stemming / Lemmatization
5. Vectorization

1. Remove Punctuation:
First of all, we should remove all the punctuation marks (comma, semicolon, colon, apostrophe, quotation marks, dash, hyphen, brackets, braces, parentheses, ellipsis etc.) from the text, as these carry negligible weight.
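
A minimal sketch of this step in plain Python, assuming the raw text is held in a simple string:

import string

text = "NLP, is awesome!!"

# Drop every character that appears in string.punctuation
cleaned_text = "".join(ch for ch in text if ch not in string.punctuation)
print(cleaned_text)  # NLP is awesome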


2. Tokenization: Now split the text into a list of words. Each word is called a token. We can use regular expressions to extract tokens from the sentences, but NLTK also has efficient modules for this task.
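
A minimal sketch using NLTK's word tokenizer (assuming the required tokenizer models have been downloaded via nltk.download('punkt')):

from nltk.tokenize import word_tokenize

text = "NLP is awesome"
tokens = word_tokenize(text)  # split the sentence into word tokens
print(tokens)  # ['NLP', 'is', 'awesome']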

3. Remove Stopwords: Now we need to remove all the stopwords from the token list. Stopwords are words that occur frequently in a sentence but carry little weight (like the, for, is, and, or, been, to, this, that, but, if, in, a, as etc.).
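
A minimal sketch using NLTK's English stopword list (assuming nltk.download('stopwords') has been run):

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
tokens = ['nlp', 'is', 'awesome']

# Keep only the tokens that are not in the stopword list
filtered_tokens = [t for t in tokens if t not in stop_words]
print(filtered_tokens)  # ['nlp', 'awesome']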

4.1 Stemming: Like removing stopwords, stemming is used to reduce the number of tokens. In this process, we reduce inflected words to their word stem or root, so that variants of a word collapse into a single token that carries their shared meaning.

Examples: 

1) Tokens like stemming and stemmed are converted to a token stem.

2) Tokens like working, worked, works and work are converted to a token work.

Points 1 and 2 clearly illustrate how stemming reduces the number of tokens in a token list. But wait! There is a problem. Consider the following examples of stemming:

3) Tokens like meanness and meaning are converted to the token mean. This is wrong: the two tokens have different meanings, yet stemming treats them as the same.

4) Tokens like goose and geese are converted to the tokens goos and gees respectively (the stemmer just removes the "e" suffix from both tokens). This is again wrong: "geese" is just the plural of "goose", yet stemming treats the two tokens as different.

Points 3 and 4 can be resolved using Lemmatization.

NLTK library has 4 stemmers:

1) Porter Stemmer
2) Snowball Stemmer
3) Lancaster Stemmer
4) Regex-based Stemmer


I mainly use Porter stemmer for stemming the tokens in my NLP code.
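
A minimal sketch of stemming with NLTK's Porter stemmer:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for token in ['stemming', 'stemmed', 'working', 'worked', 'works']:
    print(token, '->', stemmer.stem(token))
# stemming -> stem, stemmed -> stem, working -> work, worked -> work, works -> work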

4.2 Lemmatization: We saw the limitations of stemming in examples 3 and 4 above. We can overcome these limitations using lemmatization. It is more powerful and sophisticated than stemming and returns more accurate and meaningful words / tokens by considering the context in which a word is used in a sentence.

The tradeoff is that it is slower and more complex than stemming.

Examples: 

1) Tokens like meanness and meaning are retained as they are instead of being reduced to mean (unlike stemming).

2) Tokens like goose and geese are both converted to the token goose, which is right: "geese" is just the plural of "goose", so the two should collapse into one token.

I mainly use WordNet Lemmatizer present in NLTK library.
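
A minimal sketch with the WordNet Lemmatizer (assuming nltk.download('wordnet') has been run; note that lemmatize() treats tokens as nouns by default):

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('geese'))     # goose
print(lemmatizer.lemmatize('meanness')) # meanness (kept as-is, unlike stemming)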

5. Vectorization: Machine learning algorithms don't understand text; they need numeric data for matrix multiplications. So far, we have only cleaned our tokens. In this step, we encode the final tokens into numbers to create feature vectors that the algorithms can work with. In other words, we fit and transform a vectorization method on the preprocessed and cleaned data we produced through lemmatization.

Document-term matrix: Let's first understand this term before proceeding further. A document-term matrix represents the words in the text as a matrix of numbers. The rows of the matrix represent the documents (text responses) to be analyzed, and the columns represent the words / tokens from the text that are used in the analysis.

Types of Vectorization

There are mainly 3 types of vectorization:

1) Count vectorization
2) N-grams vectorization
3) Term Frequency - Inverse Document Frequency (TF-IDF)


1) Count vectorization: It creates a document-term matrix which contains the count of each unique word / token in the text response.
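
A minimal sketch using scikit-learn's CountVectorizer (my choice of library here is an assumption; NLTK pipelines often hand off vectorization to scikit-learn, and get_feature_names_out requires a recent version):

from sklearn.feature_extraction.text import CountVectorizer

documents = ["NLP is awesome", "NLP is fun and NLP is useful"]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(documents)  # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # ['and' 'awesome' 'fun' 'is' 'nlp' 'useful']
print(dtm.toarray())
# [[0 1 0 1 1 0]
#  [1 0 1 2 2 1]]

Each row is a document and each column holds the count of one token, exactly as described in the document-term matrix section above.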

2) N-grams vectorization: It creates a document-term matrix whose columns are sequences of N consecutive words, so it also captures the context of a word, depending upon the value of N.

If N = 2, it is called bi-gram,
If N = 3, it is called tri-gram,
If N = 4, it is called four-gram and so on...

We need to be careful about the value of N and choose it wisely.

Example: Consider the sentence "NLP is awesome". Count vectorization will create a column for each word in the document-term matrix, while N-grams vectorization will create columns like the following in the case of bi-grams:

"NLP is", "is awesome"

3) Term Frequency - Inverse Document Frequency (TF-IDF): It is just like count vectorization, but instead of the raw count, it stores the weightage of each word computed with the following formula:

w(i, j) = tf(i, j) * log(N / df(i))

where:
w(i, j) = weightage of a particular word "i" in a document "j"

tf(i, j) = term frequency of a word "i" in document "j" i.e. number of times the word "i" occurs in a document "j" divided by total number of words in document "j"

N = number of total documents

df(i) = number of documents containing the word "i"

So, in this way, TF-IDF considers two facts while calculating the weightage of a word or token:

1) how frequently the word occurs in a particular document
2) and how frequently that word occurs in the other documents

Example: Consider that we have 10 text messages and one of the text messages is "NLP is awesome". No other message contains the word "NLP". Now let's calculate the weightage of the word NLP.

tf(i, j) = number of times the word NLP occurs in the text message divided by the total number of words in the text message. It comes out to be (1/3) as there are three words and NLP occurs only one time.

N = 10 as there are 10 text messages. 

df(i) = number of text messages containing the word NLP which in our case is 1.

So, the final equation becomes:

Weightage of NLP = (1/3) * log(10/1)

In this way, we fill all the rows and columns of the document-term matrix in TF-IDF.
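
A minimal sketch that reproduces this worked example in Python (assuming log base 10, which the formula above does not specify):

import math

tf = 1 / 3   # "NLP" occurs once among the 3 words of "NLP is awesome"
N = 10       # total number of text messages
df = 1       # number of messages containing "NLP"

weight = tf * math.log10(N / df)
print(weight)  # 0.3333..., i.e. (1/3) * log(10/1)

Note that library implementations such as scikit-learn's TfidfVectorizer use slightly different smoothed variants of this formula, so their numbers will not match this hand calculation exactly.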

Thursday, 5 September 2019

Image Recognition: Text Detection (Optical Character Recognition) using Google Cloud Vision API

Google Cloud Vision API helps in label detection, face detection, logo detection, landmark detection and text detection (OCR: Optical Character Recognition). In this article, we will see how we can use the Google Cloud Vision API to extract text from an image. This is a step-by-step guide for text detection (OCR) using the Google Cloud Vision API. Let's follow it.

I will start directly from step 5. The first 4 steps are the same as mentioned in my previous post on label detection using the Google Cloud Vision API.

You can download my Jupyter notebook containing below code from here.
 
Step 5: Import required libraries

from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
from base64 import b64encode

You may get an import error ("no module named ...") if you have not already installed the Google API Python client. Use the following command to install it:

pip install --upgrade google-api-python-client

If you also get an import error for oauth2client, install it using the following command:

pip3 install --upgrade oauth2client

Step 6: Load credentials file

Load the credentials file (which we created in step 3 of my previous article) and create a service object using it.

CREDENTIAL_FILE = 'credentials.json'
credentials = GoogleCredentials.from_stream(CREDENTIAL_FILE)
service = build('vision', 'v1', credentials=credentials)

Step 7: Load image file (from which we need to extract the text)

I will load an image of the cover page of my deep learning book and encode it so that it becomes compatible with the Cloud Vision API.

IMAGE_FILE = 'book_cover_page.jpg'
with open(IMAGE_FILE, 'rb') as file:
    image_data = file.read()
    encoded_image_data = b64encode(image_data).decode('UTF-8')

Step 8: Create a batch request

We will create a batch request to send to the Cloud Vision API. In the batch request, we include the encoded image from above and the feature type TEXT_DETECTION.

batch_request = [{
    'image':{'content':encoded_image_data},
    'features':[{'type':'TEXT_DETECTION'}],
}]

Step 9: Create a request

request = service.images().annotate(body={'requests':batch_request})

Step 10: Execute the request

response = request.execute()

This step will throw an error if you have not enabled billing (as mentioned in step 4 of my previous article). You must enable billing in order to use the Google Cloud Vision API. The charges are very reasonable, so don't hesitate to provide your credit card details. In my case, Google charged INR 1 and then refunded it.

Step 11: Process the response

For error handling, include this code:

if 'error' in response:
    raise RuntimeError(response['error'])

We are interested in the text annotations here. So, fetch them from the response and display the results.

extracted_texts = response['responses'][0]['textAnnotations']

extracted_text = extracted_texts[0]
print(extracted_text['description'], extracted_text['boundingPoly'])

Output:

Objective Type Questions and Answers in Deep Learning
Deep
Learning
ARTIFICIAL
INTELLIGENCE
MACHINE
LEARNING
DEEP
LEARNING
NARESH KUMAR
 {'vertices': [{'x': 42, 'y': 77}, {'x': 2365, 'y': 77}, {'x': 2365, 'y': 3523}, {'x': 42, 'y': 3523}]}

You can test the above code using different images and check the accuracy of the API.
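
The first element of textAnnotations holds the entire extracted text; the remaining elements describe the individual words. A small sketch (based on the same response structure as above) to print each word with its bounding box:

for annotation in extracted_texts[1:]:
    print(annotation['description'], annotation['boundingPoly']['vertices'])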

Wednesday, 4 September 2019

Image Recognition: Label Detection using Google Cloud Vision API

Google Cloud Vision API helps in label detection, face detection, logo detection, landmark detection and text detection. In this article, we will see how we can use the Google Cloud Vision API to identify labels in an image. This is a step-by-step guide for label detection using the Google Cloud Vision API. Let's follow it.

Step 1: Setup a Google Cloud Account

A) Go to: https://console.cloud.google.com/
B) Login with your google credentials
C) You will see a dashboard. Create a Project if not already created.


Step 2: Enable Cloud Vision API

A) Go to console
B) Click on Navigation Menu
C) Click on API & Services >> Library
D) Search "cloud vision" and you will get the "Cloud Vision API". Enable this API if not already enabled.


Step 3: Download credentials file

A) Go to console
B) Click on Navigation Menu
C) Click on API & Services >> Credentials
D) Click on Create Credentials dropdown >> Service account key >> New service account
E) Enter Service account name
F) Select any role. I had selected Project >> Viewer
G) Save the file as JSON on your hard drive. Rename it to 'credentials.json'.

Step 4: Add billing information

A) Go to console
B) Click on Navigation Menu
C) Click on Billing

Now open the Jupyter notebook and try using this API. You can download my Jupyter notebook containing below code from here.

 
Step 5: Import required libraries

from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
from base64 import b64encode


You may get an import error ("no module named ...") if you have not already installed the Google API Python client. Use the following command to install it:

pip install --upgrade google-api-python-client

If you also get an import error for oauth2client, install it using the following command:

pip3 install --upgrade oauth2client

Step 6: Load credentials file

Load the credentials file (which we created in step 3) and create a service object using it.

CREDENTIAL_FILE = 'credentials.json'
credentials = GoogleCredentials.from_stream(CREDENTIAL_FILE)
service = build('vision', 'v1', credentials=credentials)

Step 7: Load image file (which needs to be tested)

We will load an image of a cat and encode it so that it becomes compatible with the Cloud Vision API.


IMAGE_FILE = 'cat.jpg'
with open(IMAGE_FILE, 'rb') as file:
    image_data = file.read()
    encoded_image_data = b64encode(image_data).decode('UTF-8')

Step 8: Create a batch request

We will create a batch request to send to the Cloud Vision API. In the batch request, we include the encoded image from above and the feature type LABEL_DETECTION.

batch_request = [{
    'image':{'content':encoded_image_data},
    'features':[{'type':'LABEL_DETECTION'}],
}]

Step 9: Create a request

request = service.images().annotate(body={'requests':batch_request})

Step 10: Execute the request

response = request.execute()

This step will throw an error if you have not enabled billing (as mentioned in step 4). You must enable billing in order to use the Google Cloud Vision API. The charges are very reasonable, so don't hesitate to provide your credit card details. In my case, Google charged INR 1 and then refunded it.

Step 11: Process the response

For error handling, include this code:

if 'error' in response:
    raise RuntimeError(response['error'])


We are interested in the label annotations here. So, fetch them from the response and display the results.

labels = response['responses'][0]['labelAnnotations']

for label in labels:
    print(label['description'], label['score'])

Output:

Cat 0.99598557
Mammal 0.9890478
Vertebrate 0.9851104
Small to medium-sized cats 0.978553
Felidae 0.96784574
European shorthair 0.960582
Tabby cat 0.9573447
Whiskers 0.9441685
Dragon li 0.93990624
Carnivore 0.9342105

You can test the above code using different images and check the accuracy of the API.

Friday, 30 August 2019

Image Recognition using Pre-trained VGG16 model in Keras

Let's use a pre-trained VGG16 model to classify an image from the ImageNet database. We will load the image, convert it to a numpy array, preprocess that array and let the pre-trained VGG16 model predict what the image shows.

VGG16 is a CNN model. To know more about CNNs, you can visit this post of mine. We are not fine-tuning the VGG16 model here; we are using it as it is. To fine-tune the existing VGG16 model, you can visit this post of mine.

You can download my Jupyter notebook containing following code from here.

Step 1: Import required libraries

import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image


Step 2: Load pre-trained weights from VGG16 model for ImageNet dataset

model = vgg16.VGG16(weights='imagenet')

Step 3: Load image to predict

img = image.load_img('cat.jpg', target_size=(224, 224))
img

Please note that we need to resize the image to 224x224, as that is the input size the VGG16 model expects. You can download this image from the official ImageNet website.

Step 4: Convert the image into numpy array

arr = image.img_to_array(img)
arr.shape


(224, 224, 3)

Step 5: Expand the array dimension

arr = np.expand_dims(arr, axis=0)
arr.shape


(1, 224, 224, 3)

Step 6: Preprocess the array

arr = vgg16.preprocess_input(arr)
arr


Step 7: Predict from the model

predictions = model.predict(arr)

predictions

We get an array as output, which is hard to interpret. So, let's simplify it and see the top 5 predictions made by the VGG16 model.

vgg16.decode_predictions(predictions, top=5)

[[('n02123045', 'tabby', 0.7138179),
  ('n02123159', 'tiger_cat', 0.21695374),
  ('n02124075', 'Egyptian_cat', 0.043560617),
  ('n04040759', 'radiator', 0.0053847637),
  ('n04553703', 'washbasin', 0.0024860944)]]

So, as per the VGG16 model's predictions, the given image is most likely a tabby (71%) or a tiger cat (22%). You can try the same with different images from the ImageNet database and check your results.
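
To wrap up, here is a minimal sketch that bundles the steps above into a single helper function (the name classify_image is my own; it relies only on the same Keras calls used throughout this post):

import numpy as np
from keras.applications import vgg16
from keras.preprocessing import image

model = vgg16.VGG16(weights='imagenet')

def classify_image(path, top=5):
    # Load and resize the image to the 224x224 input VGG16 expects
    img = image.load_img(path, target_size=(224, 224))
    arr = image.img_to_array(img)
    arr = np.expand_dims(arr, axis=0)  # add the batch dimension
    arr = vgg16.preprocess_input(arr)  # VGG16-specific preprocessing
    predictions = model.predict(arr)
    return vgg16.decode_predictions(predictions, top=top)

print(classify_image('cat.jpg'))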

About the Author

I have more than 10 years of experience in IT industry. Linkedin Profile

I am currently experimenting with neural networks in deep learning. I am learning Python, TensorFlow and Keras.

Author: I am an author of a book on deep learning.

Quiz: I run an online quiz on machine learning and deep learning.