<?xml version="1.0" encoding="UTF-8"?>
<XML><RECORDS>
<RECORD>
	<REFERENCE_TYPE>31</REFERENCE_TYPE>
	<AUTHORS>
		<AUTHOR>Koza, John R.</AUTHOR>
		<AUTHOR>Andre, David</AUTHOR>
	</AUTHORS>
	<YEAR>1996</YEAR>
	<TITLE>Automatic discovery of protein motifs using genetic programming</TITLE>
	<SECONDARY_AUTHORS>
		<SECONDARY_AUTHOR>Xin Yao</SECONDARY_AUTHOR>
	</SECONDARY_AUTHORS>
	<SECONDARY_TITLE>Evolutionary Computation: Theory and Applications</SECONDARY_TITLE>
	<PLACE_PUBLISHED>Singapore</PLACE_PUBLISHED>
	<PUBLISHER>World Scientific</PUBLISHER>
	<KEYWORDS>
		<KEYWORD>genetic algorithms</KEYWORD>
		<KEYWORD>genetic programming</KEYWORD>
		<KEYWORD>DEAD box</KEYWORD>
	</KEYWORDS>
	<ABSTRACT>Automated methods of machine learning may prove to be
                 useful in discovering biologically meaningful
                 information hidden in the rapidly growing databases of
                 DNA sequences and protein sequences. Genetic
                 programming is an extension of the genetic algorithm in
                 which a population of computer programs is bred, over a
                 series of generations, in order to solve a problem.
                 Genetic programming is capable of evolving complicated
                 problem-solving expressions of unspecified size and
                 shape. Moreover, when automatically defined functions
                 are added, genetic programming becomes capable of
                 efficiently capturing and exploiting recurring
                 sub-patterns. This chapter describes how
                 genetic programming with automatically defined
                 functions successfully evolved motifs for detecting the
                 D-E-A-D box family of proteins and for detecting the
                 manganese superoxide dismutase family. Both motifs were
                 evolved without prespecifying their length. Both
                 evolved motifs employed automatically defined functions
                 to capture the repeated use of common subexpressions.
                 When tested against the SWISS-PROT database of
                 proteins, the two genetically evolved consensus motifs
                 detect the two families either as well as, or slightly
                 better than, the comparable human-written motifs found
                 in the PROSITE database.</ABSTRACT>
	<NOTES>In Press 1997?</NOTES>
	<URL>http://www.genetic-programming.com/jkpdf/ecta1999.pdf</URL>
</RECORD>
</RECORDS></XML>