Phases:
Lexical Analysis- Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A token can be a keyword, an identifier, a constant, an operator, or a separator.
Syntax Analysis- The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree).
Semantic Analysis- Semantic analysis checks whether the parse tree follows the rules of the language: for example, that values are assigned between compatible data types (adding a string to an integer would be flagged). The semantic analyzer also keeps track of identifiers, their types, and expressions, and checks whether identifiers are declared before use. Its output is an annotated syntax tree.
Intermediate Code Generation- After semantic analysis, the compiler generates an intermediate code from the source code: a program for an abstract machine, in between the high-level language and the target machine language.
Code Optimization- The next phase optimizes the intermediate code. Optimization removes unnecessary code lines and rearranges the sequence of statements to speed up program execution without wasting resources (CPU, memory).
Code Generation- In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language.
Symbol Table- A data structure maintained throughout all the phases of a compiler. All identifier names, along with their types, are stored here.
Lexical Analysis:
- The basic element recognized by the compiler is the “token.” A token is source-program text that the compiler does not break down into component elements.
- There are six types of C tokens: identifiers, keywords, constants, operators, string literals, and separators (punctuators).
- Token syntax: <token-name, attribute-value>
- The lexer removes any whitespace and comments from the source code.
- Keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can all be considered tokens.
- Lexical analysis for a modern computer language is typically specified with regular expressions and implemented with finite automata.
https://www.tutorialspoint.com/compiler_design/images/compiler_phases.jpg
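To make the <token-name, attribute-value> pairs concrete, here is a minimal hand-written lexer sketch in C; the sample input string and the single "int" keyword check are illustrative assumptions:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *p = "int count = 42 + rate;";   /* assumed sample input */
    char lexeme[64];
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }   /* strip whitespace */
        int n = 0;
        if (isalpha((unsigned char)*p) || *p == '_') {       /* identifier or keyword */
            while (isalnum((unsigned char)*p) || *p == '_') lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("<%s, %s>\n", strcmp(lexeme, "int") == 0 ? "keyword" : "id", lexeme);
        } else if (isdigit((unsigned char)*p)) {             /* integer constant */
            while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("<const, %s>\n", lexeme);
        } else {                                             /* operator or separator */
            printf("<op, %c>\n", *p++);
        }
    }
    return 0;
}

Running it prints the token stream <keyword, int>, <id, count>, <op, =>, <const, 42>, <op, +>, <id, rate>, <op, ;>.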
Regular expression:
Precedence and associativity: * has the highest precedence, then concatenation (.), then | (pipe); concatenation and | are left-associative.
x* means zero or more occurrences of x, i.e., it can generate { ε, x, xx, xxx, xxxx, … }.
x+ means one or more occurrences of x, i.e., it can generate { x, xx, xxx, xxxx, … }; equivalent to x.x*.
x? means at most one occurrence of x, i.e., it can generate either {x} or {ε}.
[a-z] matches any lowercase letter of the English alphabet.
[A-Z] matches any uppercase letter of the English alphabet.
[0-9] matches any decimal digit.
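Putting these shorthands together, the identifier pattern of C-like languages can be written as [A-Za-z_][A-Za-z0-9_]*. A small sketch using the POSIX <regex.h> API (the sample strings are assumptions):

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* letter or underscore, then zero or more letters, digits, or underscores */
    regcomp(&re, "^[A-Za-z_][A-Za-z0-9_]*$", REG_EXTENDED);
    const char *samples[] = { "count", "_tmp1", "9lives" };
    for (int i = 0; i < 3; i++)
        printf("%-7s -> %s\n", samples[i],
               regexec(&re, samples[i], 0, NULL, 0) == 0 ? "identifier" : "rejected");
    regfree(&re);
    return 0;
}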
Finite Automata:
A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly. Finite automata act as recognizers for regular expressions: every regular expression can be converted into an equivalent automaton.
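For instance, the identifier regular expression above corresponds to a two-state DFA; a sketch in C (state names and test strings are assumptions):

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* DFA for [A-Za-z_][A-Za-z0-9_]*: START --letter/_--> IN_ID --letter/digit/_--> IN_ID */
enum state { START, IN_ID, REJECT };

bool accepts(const char *s) {
    enum state st = START;
    for (; *s; s++) {
        if (st == START)
            st = (isalpha((unsigned char)*s) || *s == '_') ? IN_ID : REJECT;
        else if (st == IN_ID)
            st = (isalnum((unsigned char)*s) || *s == '_') ? IN_ID : REJECT;
        else
            return false;            /* dead state: no way back */
    }
    return st == IN_ID;              /* IN_ID is the only accepting state */
}

int main(void) {
    printf("%d %d\n", accepts("x1"), accepts("1x"));   /* prints 1 0 */
    return 0;
}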
Syntax analysis:
The lexical analyzer can identify tokens with the help of regular expressions and pattern rules, but it cannot check the syntax of a given sentence, due to the limitations of regular expressions. For example, regular expressions cannot check for balanced tokens such as parentheses.
This phase uses context-free grammar (CFG), which is
recognized by push-down automata.
CFG is a superset of regular grammar, which means every regular grammar is also context-free.
Context-Free Grammar:
A context-free grammar has four components:
1. A set of non-terminals (V). Non-terminals are syntactic
variables that denote sets of strings.
The non-terminals define sets of strings that help define the
language generated by the grammar.
2. A set of tokens, known as terminal symbols (Σ). Terminals
are the basic symbols from which strings are formed.
3. A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.
4. One of the non-terminals is designated as the start symbol (S), from which the derivation begins.
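For instance, for the expression grammar used in the derivation example below, V = {E}, Σ = {id, +, *}, the productions P are E → E + E, E → E * E, and E → id, and the start symbol S is E.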
Syntax Analyzers:
A syntax analyzer or parser takes the input from a lexical analyzer in the form of a token stream. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.
Derivation- A derivation is a sequence of production-rule applications that derives the input string from the start symbol. A derivation can be leftmost or rightmost, depending on which non-terminal is rewritten at each step.
Example:
Production rules:
E → E + E
E → E * E
E → id
Input string: id + id * id
- Leftmost derivation (the leftmost non-terminal is rewritten at each step):
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
- Rightmost derivation (the rightmost non-terminal is rewritten at each step):
E ⇒ E + E
⇒ E + E * E
⇒ E + E * id
⇒ E + id * id
⇒ id + id * id
A parse tree is a graphical depiction of a
derivation.
All leaf nodes are terminals.
All interior nodes are non-terminals.
An in-order traversal gives the original input string.
https://www.tutorialspoint.com/compiler_design/images/parse_tree_step_5.jpg
Ambiguity: A grammar G is said to be ambiguous if it has more than one parse tree (equivalently, more than one leftmost or rightmost derivation) for at least one string.
Associativity: If an operand has operators on both sides, the
side on which the operator takes this operand is decided by the associativity
of those operators.
Precedence: If two different operators share a common
operand, the precedence of operators decides which will take the operand.
Types of Parsing- Syntax analyzers follow production rules defined by means of a context-free grammar. The way the production rules are applied (the derivation order) divides parsing into two types: top-down parsing and bottom-up parsing.
Top Down parsing:
https://www.tutorialspoint.com/compiler_design/images/top_down_parsing.jpg
Recursive descent, with backtracking or without backtracking -> predictive parser, LL parsing (see the sketch below).
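A minimal recursive-descent (predictive) parser sketch in C, assuming the expression grammar is first rewritten without left recursion as E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> id (with id taken to be a single letter):

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;                      /* cursor into the input */

static void fail(void) { printf("syntax error at '%c'\n", *p); exit(1); }

static void F(void) {                      /* F -> id (a single letter here) */
    if (isalpha((unsigned char)*p)) p++; else fail();
}
static void Tp(void) { if (*p == '*') { p++; F(); Tp(); } }   /* T' -> * F T' | ε */
static void T(void)  { F(); Tp(); }                           /* T  -> F T'       */
static void Ep(void) { if (*p == '+') { p++; T(); Ep(); } }   /* E' -> + T E' | ε */
static void E(void)  { T(); Ep(); }                           /* E  -> T E'       */

int main(void) {
    p = "a+b*c";                           /* assumed sample input */
    E();
    puts(*p == '\0' ? "accepted" : "rejected (trailing input)");
    return 0;
}

Because each procedure decides what to do by looking only at the current token, no backtracking is ever needed; this is what makes the parser predictive.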
Bottom up parsing:
Shift-reduce parsing -> LR parsing -> SLR, canonical LR, and LALR parsers.
Compiler error types -
Lexical: name of some identifier typed incorrectly
Syntactic: missing semicolon or unbalanced parentheses
Semantic: incompatible value assignment
Logical: code not reachable, infinite loop
Panic mode - on an error, the parser discards input symbols until a synchronizing token (such as a semicolon) is found, so that it does not go into an infinite loop.
Statement mode - on an error, the parser tries a local correction (for example, inserting a missing semicolon) so it can continue parsing the statement.
Error productions - Some common errors that may occur in the code are known to compiler designers. Designers can create an augmented grammar with extra productions that generate these erroneous constructs, so the errors are recognized when encountered.
Abstract Syntax Trees - the parse tree is reduced to an abstract syntax tree. A full parse tree carries more information than the compiler needs, which makes it slow to process.
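As an illustration, a minimal AST node type for arithmetic expressions might look like this in C (the node kinds and helper functions are assumptions for a tiny expression language):

#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    enum { NUM, ADD, MUL } kind;
    int value;                      /* meaningful only when kind == NUM */
    struct Node *left, *right;      /* children for ADD and MUL */
} Node;

static Node *num(int v) {
    Node *n = calloc(1, sizeof *n);
    n->kind = NUM; n->value = v;
    return n;
}
static Node *binop(int kind, Node *l, Node *r) {
    Node *n = calloc(1, sizeof *n);
    n->kind = kind; n->left = l; n->right = r;
    return n;
}
static int eval(const Node *n) {    /* fold the tree bottom-up */
    if (n->kind == NUM) return n->value;
    int l = eval(n->left), r = eval(n->right);
    return n->kind == ADD ? l + r : l * r;
}

int main(void) {
    /* AST for 2 + 3 * 4; operator precedence is explicit in the tree shape */
    Node *t = binop(ADD, num(2), binop(MUL, num(3), num(4)));
    printf("%d\n", eval(t));        /* prints 14 */
    return 0;
}

Note that, unlike a parse tree, the AST has no nodes for intermediate non-terminals or punctuation; only the operators and operands remain.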
Semantic Analyzer:
The semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics help interpret symbols, their types, and their relations with each other. Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions
E.g.: int x = "String"; // the syntax tree is fine, but the assignment is semantically wrong (type mismatch).
Tasks: Scope resolution, Type checking, Array-bound checking
Errors: type mismatch, undeclared variable, reserved identifier misuse, multiple declarations of a variable in a scope, accessing an out-of-scope variable, actual vs. formal parameter mismatch. Attribute grammars are used here; for example:
E → E + T { E.value = E.value + T.value }
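A sketch of how such synthesized value attributes can be computed during parsing, folded into a recursive-descent evaluator in C (single-digit operands are an assumption that keeps the lexing trivial):

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;

static int F(void) {                          /* F -> digit */
    if (!isdigit((unsigned char)*p)) { printf("error at '%c'\n", *p); exit(1); }
    return *p++ - '0';
}
static int T(void) {                          /* T -> T * F | F */
    int v = F();
    while (*p == '*') { p++; v = v * F(); }   /* T.value = T.value * F.value */
    return v;
}
static int E(void) {                          /* E -> E + T | T */
    int v = T();
    while (*p == '+') { p++; v = v + T(); }   /* E.value = E.value + T.value */
    return v;
}

int main(void) {
    p = "2+3*4";
    printf("value = %d\n", E());              /* prints 14 */
    return 0;
}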
Symbol Table:
The symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. It is used by both the analysis and the synthesis parts of a compiler.
Purpose:
- To store the names of all entities in a structured form in one place.
- To verify if a variable has been declared.
- To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
- To determine the scope of a name (scope resolution).
It maintains an entry for each name in the format <symbol name, type, attribute>.
Example: static int interest; => <interest, int, static>
Symbol table implementation:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
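A chained hash-table sketch in C for the <symbol name, type, attribute> entries above (the hash function and the fixed bucket count are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct symbol {
    char name[32], type[16], attr[16];
    struct symbol *next;                      /* chain for hash collisions */
};

static struct symbol *table[NBUCKETS];

static unsigned hash(const char *s) {         /* simple multiplicative hash */
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void insert(const char *name, const char *type, const char *attr) {
    struct symbol *sym = calloc(1, sizeof *sym);
    snprintf(sym->name, sizeof sym->name, "%s", name);
    snprintf(sym->type, sizeof sym->type, "%s", type);
    snprintf(sym->attr, sizeof sym->attr, "%s", attr);
    sym->next = table[hash(name)];
    table[hash(name)] = sym;
}

static struct symbol *lookup(const char *name) {
    for (struct symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;                              /* not declared */
}

int main(void) {
    insert("interest", "int", "static");      /* static int interest; */
    struct symbol *s = lookup("interest");
    if (s) printf("<%s, %s, %s>\n", s->name, s->type, s->attr);
    return 0;
}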
Intermediate code generator:
Need: without an intermediate representation, a full native compiler is required for each new machine; with one, only the synthesis (back-end) part has to be changed according to the target machine. Optimization can also be done on the intermediate code.
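For example, a three-address-code form of the assignment a = b + c * d (three-address code is one of the IR types listed under code generation below) might be:
t1 = c * d
t2 = b + t1
a = t2
Each instruction has at most one operator on its right-hand side, which keeps later optimization and code generation simple.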
Code generation:
It should carry the exact meaning of the source code, and it should be efficient in terms of CPU usage and memory management.
Things taken into consideration when generating code:
- Target language: some instructions are machine-specific.
- IR type: abstract syntax tree (AST) structure, reverse Polish notation, or three-address code.
- Instruction selection.
- Register allocation and ordering of instructions (see the illustration below).
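For instance, the three-address code for a = b + c * d above might map to a hypothetical load/store register machine as follows (the mnemonics and register names are illustrative assumptions, not a real instruction set):
LD  R1, c        ; load c into register R1
LD  R2, d        ; load d into register R2
MUL R1, R1, R2   ; R1 = c * d        (t1)
LD  R2, b        ; R2 can be reused once d is dead
ADD R1, R2, R1   ; R1 = b + (c * d)  (t2)
ST  a, R1        ; store the result into a
Note how register allocation reuses R2 and how the instruction ordering follows the dependency order of the three-address code.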
Code Optimization:
- It must not change the meaning of the program.
- It must increase performance.
- The optimization itself should be fast (it should not slow down compilation unduly).
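For example, constant folding at the three-address-code level (the snippet is an assumed input):
Before:
t1 = 4 * 2
t2 = a + t1
After constant folding:
t1 = 8
t2 = a + t1
If t1 has no other uses, constant propagation and dead-code elimination can further reduce this to the single instruction t2 = a + 8, without changing the program's meaning.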