15. Strings

Page 1: 15. Strings

Insight Through Computing

15. Strings

Operations

Subscripting

Concatenation

Search

Numeric-String Conversions

Built-Ins: int2str, num2str, str2double

Page 2: 15. Strings

Previous Dealings

N = input('Enter Degree: ')

title('The Sine Function')

disp( sprintf('N = %2d',N) )

Page 3: 15. Strings

A String is an Array of Characters

‘Aa7*>@ x!’

A a 7 * > @ x !

This string has length 9.

Page 4: 15. Strings

Why Are Strings Important?

1. Numerical data is often encoded as strings

2. Genomic calculation/search

Page 5: 15. Strings

Numerical Data is Often Encoded in Strings

For example, a file containing Ithaca weather data begins with the string

W07629N4226

Longitude: 76° 29' West

Latitude: 42° 26' North

Page 6: 15. Strings

What We Would Like to Do

W07629N4226

Get hold of the substring '07629'.

Convert it to floating point format so that it can be used in numerical calculations.

Page 7: 15. Strings

Format Issues

9 as an IEEE floating point number: 0100000blablahblah01001111000100010010

9 as a character: 01000otherblabla

Different Representation
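
The difference between the two representations can be seen directly at the command line. The following is a minimal sketch (the variable names are illustrative and do not come from the slides). The character '9' is stored as a character code, so arithmetic on it does not behave like arithmetic on the number 9 until the character is converted.

```matlab
c = '9';                        % the character 9
n = 9;                          % the number 9 (a double)

codeOfC  = double(c);           % character code of '9', which is 57
isSame   = (c == n);            % false: the code 57 is compared with 9
wrongSum = c + 1;               % arithmetic uses the code: 58, not 10
rightSum = str2double(c) + 1;   % convert first, then add: 10

disp([codeOfC wrongSum rightSum])
```

This is why a substring such as '07629' has to pass through a conversion function before it can take part in a calculation.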

Page 8: 15. Strings

Genomic Computations

Looking for patterns in a DNA sequence:

'ATTCTGACCTCGATC'

ACCT

Page 9: 15. Strings

Genomic Computations

Quantifying Differences:

ATTCTGACCTCGATC

ATTGCTGACCTCGAT

Remove?

Page 10: 15. Strings

Working With Strings

Page 11: 15. Strings

Strings Can Be Assigned to Variables

S = 'N = 2'

N = 2;
S = sprintf('N = %1d',N)

S: 'N = 2'

sprintf produces a formatted string using fprintf rules.

Page 12: 15. Strings

Strings Have a Length

s = 'abc';
n = length(s);   % n = 3

s = '';          % the empty string
n = length(s)    % n = 0

s = ' ';         % single blank
n = length(s)    % n = 1

Page 13: 15. Strings

Concatenation

This:

S = 'abc';
T = 'xy';
R = [S T]

is the same as this:

R = 'abcxy'

Page 14: 15. Strings

Repeated Concatenation

This:

s = '';
for k=1:5
    s = [s 'z'];
end

is the same as this:

s = 'zzzzz'

Page 15: 15. Strings

Replacing and Appending Characters

s = 'abc';
s(2) = 'x'    % s = 'axc'

t = 'abc';
t(4) = 'd'    % t = 'abcd'

v = '';
v(5) = 'x'    % v = '    x' (4 padding characters, then 'x')

Page 16: 15. Strings

Extracting Substrings

s = 'abcdef';
x = s(3)           % x = 'c'
x = s(2:4)         % x = 'bcd'
x = s(length(s))   % x = 'f'

Page 17: 15. Strings

Colon Notation

s(Starting Location : Ending Location)

Page 18: 15. Strings

Replacing Substrings

s = 'abcde';
s(2:4) = 'xyz'    % s = 'axyze'

s = 'abcde';
s(2:4) = 'wxyz'   % Error: 4 characters cannot be assigned to 3 positions

Page 19: 15. Strings

Question Time

s = 'abcde';
for k=1:3
    s = [ s(4:5) s(1:3)];
end

What is the final value of s?

A. abcde   B. bcdea   C. eabcd   D. deabc
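
One way to check an answer is to trace the loop and display s after each pass. The sketch below is the same loop with one display statement added; the fprintf line is an addition for illustration and is not part of the original question.

```matlab
s = 'abcde';
for k = 1:3
    s = [s(4:5) s(1:3)];                    % move the last two characters to the front
    fprintf('After pass %d: %s\n', k, s);   % show the intermediate value
end
```

Each pass rotates the five characters two positions to the right, so three passes rotate them six positions in total, which has the same effect as rotating by one.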

Page 20: 15. Strings

Problem: DNA Strand

x is a string made up of the characters 'A', 'C', 'T', and 'G'.

Construct a string y obtained from x by replacing each A by T, each T by A, each C by G, and each G by C.

x: ACGTTGCAGTTCCATATG

y: TGCAACGTCAAGGTATAC

Page 21: 15. Strings

function y = Strand(x)

% x is a string consisting of

% the characters A, C, T, and G.

% y is a string obtained by

% replacing A by T, T by A,

% C by G and G by C.

Page 22: 15. Strings

Comparing Strings

Built-in function strcmp

strcmp(s1,s2) is true if the strings s1 and s2 are identical.

Page 23: 15. Strings

How y is Built Up

x: ACGTTGCAGTTCCATATG

y: TGCAACGTCAAGGTATAC

Start: y: ''

After 1 pass: y: T

After 2 passes: y: TG

After 3 passes: y: TGC

Page 24: 15. Strings

y = '';                      % start with the empty string
for k=1:length(x)
    if strcmp(x(k),'A')
        y = [y 'T'];
    elseif strcmp(x(k),'T')
        y = [y 'A'];
    elseif strcmp(x(k),'C')
        y = [y 'G'];
    else                     % x(k) is 'G'
        y = [y 'C'];
    end
end
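
The loop can be tried on the example strand from the problem statement. This is a sketch written as a script rather than as the function Strand, and it assumes that x contains only the characters A, C, T, and G, as the function comment requires. Any other character would fall into the final else branch and be replaced by C.

```matlab
x = 'ACGTTGCAGTTCCATATG';    % example strand from the problem statement
y = '';                      % start with the empty string
for k = 1:length(x)
    if strcmp(x(k),'A')
        y = [y 'T'];
    elseif strcmp(x(k),'T')
        y = [y 'A'];
    elseif strcmp(x(k),'C')
        y = [y 'G'];
    else                     % assumed to be 'G'
        y = [y 'C'];
    end
end
disp(y)                      % displays TGCAACGTCAAGGTATAC
```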

Page 25: 15. Strings

A DNA Search Problem

Suppose S and T are strings, e.g.,

S: ‘ACCT’

T: ‘ATGACCTGA’

We'd like to know whether S is a substring of T and, if so, where the first occurrence is.

Page 26: 15. Strings

function k = FindCopy(S,T)

% S and T are strings.

% If S is not a substring of T,

% then k=0.

% Otherwise, k is the smallest

% integer so that S is identical

% to T(k:k+length(S)-1).

Page 27: 15. Strings

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(1:4)) False

Page 28: 15. Strings

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(2:5)) False

Page 29: 15. Strings

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(3:6)) False

Page 30: 15. Strings

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(4:7)) True

Page 31: 15. Strings

Pseudocode

First = 1; Last = length(S);

while S is not identical to T(First:Last)
    First = First + 1;
    Last = Last + 1;
end

Page 32: 15. Strings

Subscript Error

S: ‘ACCT’

T: ‘ATGACTGA’

strcmp(S,T(6:9))

There’s a problem if S is not a substring of T.

Page 33: 15. Strings

Pseudocode

First = 1; Last = length(S);

while Last<=length(T) && ...
      ~strcmp(S,T(First:Last))
    First = First + 1;
    Last = Last + 1;
end

Page 34: 15. Strings

Post-Loop Processing

Loop ends when this is false:

Last<=length(T) && ...

~strcmp(S,T(First:Last))

Page 35: 15. Strings

Post-Loop Processing

if Last>length(T)
    % No match found
    k=0;
else
    % There was a match
    k=First;
end

The loop ends for one of two reasons.
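
The guarded loop and the post-loop test can be assembled and run on the example strings. This is a sketch written as a script rather than as the function FindCopy. The final line uses the built-in strfind only as a cross-check; the variable name kBuiltIn is illustrative.

```matlab
S = 'ACCT';
T = 'ATGACCTGA';

First = 1;
Last = length(S);
% Slide the window T(First:Last) along T. The test on Last comes first,
% so the subscript T(First:Last) is never evaluated out of range.
while Last <= length(T) && ~strcmp(S, T(First:Last))
    First = First + 1;
    Last = Last + 1;
end

if Last > length(T)
    k = 0;          % no match found
else
    k = First;      % first position where S occurs in T
end

kBuiltIn = strfind(T, S);   % built-in search, returns all starting positions
disp(k)                     % displays 4
```

With T = 'ATGACTGA' from the Subscript Error slide, the loop stops when Last reaches 9, which exceeds length(T) = 8, and k is set to 0.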

Page 36: 15. Strings

Numeric/String Conversion

Page 37: 15. Strings

String-to-Numeric Conversion

An example…

Convention: W07629N4226

Longitude: 76° 29' West

Latitude: 42° 26' North

Page 38: 15. Strings

String-to-Numeric Conversion

s = 'W07629N4226';

s1 = s(2:4);
x1 = str2double(s1);

s2 = s(5:6);
x2 = str2double(s2);

Longitude = x1 + x2/60

There are 60 minutes in a degree.
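
The same steps extend to the latitude field. The sketch below assumes the layout seen in the example string: one letter, three digits of degrees and two digits of minutes for longitude, then one letter, two digits of degrees and two digits of minutes for latitude. The variable names lonDeg, lonMin, latDeg, and latMin are illustrative.

```matlab
s = 'W07629N4226';

lonDeg = str2double(s(2:4));      % '076' -> 76
lonMin = str2double(s(5:6));      % '29'  -> 29
latDeg = str2double(s(8:9));      % '42'  -> 42
latMin = str2double(s(10:11));    % '26'  -> 26

Longitude = lonDeg + lonMin/60;   % degrees West
Latitude  = latDeg + latMin/60;   % degrees North

fprintf('Longitude: %.4f W\n', Longitude);
fprintf('Latitude:  %.4f N\n', Latitude);
```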

Page 39: 15. Strings

Numeric-to-String Conversion

x = 1234;
s = int2str(x);           % s = '1234'

x = pi;
s = num2str(x,'%5.3f');   % s = '3.142'
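
A converted number is an ordinary string, so it can be concatenated with other strings. The following sketch uses illustrative variable names (s1, s2, s3, msg) and also shows that int2str rounds a non-integer value before converting it.

```matlab
x = 1234;
s1 = int2str(x);             % '1234'
s2 = int2str(2.7);           % int2str rounds first, so this is '3'
s3 = num2str(pi,'%5.3f');    % '3.142'

% Build a message by concatenating strings and converted numbers.
msg = ['x = ' s1 ', pi is about ' s3];
disp(msg)
```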

Page 40: 15. Strings

Problem

Given a date in the format 'mm/dd', specify the next day in the same format.

Page 41: 15. Strings

y = Tomorrow(x)

x y

02/28 03/01

07/13 07/14

12/31 01/01

Page 42: 15. Strings

Get the Day and Month

month = str2double(x(1:2));
day = str2double(x(4:5));

Thus, if x = '02/28' then month is assigned the numerical value 2 and day is assigned the numerical value 28.

Page 43: 15. Strings

L = [31 28 31 30 31 30 31 31 30 31 30 31];

if day+1<=L(month)

% Tomorrow is in the same month

newDay = day+1;

newMonth = month;

Page 44: 15. Strings

L = [31 28 31 30 31 30 31 31 30 31 30 31];

else
    % Tomorrow is in the next month
    newDay = 1;
    if month < 12
        newMonth = month+1;
    else
        newMonth = 1;
    end
end

Page 45: 15. Strings

The New Day String

Compute newDay (numerical) and convert…

d = int2str(newDay);
if length(d)==1
    d = ['0' d];
end

Page 46: 15. Strings

The New Month String

Compute newMonth (numerical) and convert…

m = int2str(newMonth);
if length(m)==1
    m = ['0' m];
end

Page 47: 15. Strings

The Final Concatenation

y = [m '/' d];
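
The fragments from the preceding slides can be assembled and run on the three example dates. This is a sketch written as a script rather than as the function Tomorrow. The cell arrays inputs and outputs are additions used only to loop over the examples, and, like the table L, the code ignores leap years.

```matlab
L = [31 28 31 30 31 30 31 31 30 31 30 31];   % days per month, no leap years
inputs = {'02/28', '07/13', '12/31'};        % example dates from the table
outputs = cell(size(inputs));

for i = 1:length(inputs)
    x = inputs{i};
    month = str2double(x(1:2));
    day = str2double(x(4:5));

    if day+1 <= L(month)
        newDay = day+1;          % tomorrow is in the same month
        newMonth = month;
    else
        newDay = 1;              % tomorrow is in the next month
        if month < 12
            newMonth = month+1;
        else
            newMonth = 1;
        end
    end

    d = int2str(newDay);
    if length(d)==1
        d = ['0' d];             % pad to two digits
    end
    m = int2str(newMonth);
    if length(m)==1
        m = ['0' m];             % pad to two digits
    end

    outputs{i} = [m '/' d];
    fprintf('%s -> %s\n', x, outputs{i});
end
```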

Page 48: 15. Strings

Some other useful string functions

str = 'Cs 1112';

length(str)     % 7
isletter(str)   % [1 1 0 0 0 0 0]
isspace(str)    % [0 0 1 0 0 0 0]
lower(str)      % 'cs 1112'
upper(str)      % 'CS 1112'

ischar(str)             % Is str a char array? True (1)
strcmp(str(1:2),'cs')   % Compare strings str(1:2) & 'cs'. False (0)
strcmp(str(1:3),'CS')   % False (0)

Page 49: 15. Strings

ASCII characters (American Standard Code for Information Interchange)

ascii code   Character
:            :
65           'A'
66           'B'
67           'C'
:            :
90           'Z'
:            :

ascii code   Character
:            :
48           '0'
49           '1'
50           '2'
:            :
57           '9'
:            :

Page 50: 15. Strings

Character vs ASCII code

str = 'Age 19'       % a 1-d array of characters

code = double(str)   % convert chars to ascii values

str1 = char(code)    % convert ascii values to chars
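
Running the round trip shows the codes explicitly. In this sketch the semicolons and the final comparison (stored in the illustrative variable sameAgain) are additions; the codes match the ASCII table above, with 32 being the code for a blank.

```matlab
str = 'Age 19';
code = double(str);               % [65 103 101 32 49 57]
str1 = char(code);                % back to 'Age 19'

sameAgain = strcmp(str, str1);    % true: the round trip loses nothing
disp(code)
disp(str1)
```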

Page 51: 15. Strings

Arithmetic and relational ops on characters

• 'c'-'a' gives 2
• '6'-'5' gives 1
• letter1='e'; letter2='f';
• letter1-letter2 gives -1

• 'c'>'a' gives true
• letter1==letter2 gives false

• 'A' + 2 gives 67
• char('A'+2) gives 'C'

Page 52: 15. Strings

Example: toUpper

Write a function toUpper(cha) to convert character cha to upper case if cha is a lower case letter. Return the converted letter. If cha is not a lower case letter, simply return the character cha.

Hint: Think about the distance between a letter and the base letter 'a' (or 'A'). E.g.,

a b c d e f g h …

A B C D E F G H …

Of course, do not use the MATLAB function upper!

distance = 'g'-'a' = 6 = 'G'-'A'

Page 53: 15. Strings

function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.

up = cha;

cha is lower case if it is between 'a' and 'z'

Page 54: 15. Strings

function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.

up = cha;

if ( cha >= 'a' && cha <= 'z' )
    % Find distance of cha from 'a'
end

Page 55: 15. Strings

function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.

up = cha;

if ( cha >= 'a' && cha <= 'z' )
    % Find distance of cha from 'a'
    offset = cha - 'a';
    % Go same distance from 'A'
end

Page 56: 15. Strings

function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a letter then up is just cha.

up = cha;

if ( cha >= 'a' && cha <= 'z' )
    % Find distance of cha from 'a'
    offset = cha - 'a';
    % Go same distance from 'A'
    up = char('A' + offset);
end
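
The same offset idea can be applied to every character of a string. The sketch below is written as a script with the body of toUpper placed inside a loop; the variable names str and result are illustrative and the loop itself is an addition to the slides.

```matlab
str = 'cs 1112';
result = str;                     % characters that are not lower case stay as they are

for k = 1:length(str)
    cha = str(k);
    if ( cha >= 'a' && cha <= 'z' )
        offset = cha - 'a';                 % distance of cha from 'a'
        result(k) = char('A' + offset);     % same distance from 'A'
    end
end

disp(result)                      % displays CS 1112
```

The blank and the digits fail the range test and are copied through unchanged, which matches the behavior required of toUpper for characters that are not lower case letters.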