Exploiting Redundancy in Natural Language
Christoph Karlberger, Günther Bayler, Christopher Kruegel, and Engin Kirda
{christoph,gmb,chris,ek}@seclab.tuwien.ac.at
Probabilistic systems: Systems such as Bayesian filters
are used to learn word frequencies that are associ-
Today’s attacks against Bayesian spam filters attempt to
ated with both spam and non-spam messages [11].
keep the content of spam mails visible to humans, but obscured to filters. A common technique is to fool filters
Since Bayesian filters do not have a fixed set of rules to
by appending additional words to a spam mail. Because
classify incoming messages, they have to be trained with
these words appear very rarely in spam mails, filters are
known spam and ham messages before they are able to
inclined to classify the mail as legitimate.
classify messages. The training of a Bayesian spam fil-
The idea we present in this paper leverages the fact
ter occurs in three steps: first, each message is stripped
that natural language typically contains synonyms. Syn-
of any transfer encodings. The decoded message is then
onyms are different words that describe similar terms and
split into single tokens, which are the words that make
concepts. Such words often have significantly different
up the message. Last, for each token, a record in the to-
spam probabilities. Thus, an attacker might be able to
ken database is updated that maintains two counts: the
penetrate Bayesian filters by replacing suspicious words
number of spam messages and the number of ham mes-
by innocuous terms with the same meaning. A precon-
sages in which that token has been observed so far. Be-
dition for the success of such an attack is that Bayesian
sides that, the token database also keeps track of the total
spam filters of different users assign similar spam prob-
number of spam and ham messages that have been used
abilities to similar tokens. We first examine whether this
precondition is met; afterwards, we measure the effectiveness
of an automated substitution attack by creating a test
Once a Bayesian spam filter has created a token
database, messages can be analyzed. Analogous to the
set of spam messages that are tested against SpamAssas-
training phase, the message is first decoded and split into
single tokens. For each token, a spam probability is calculated based on the number of spam and ham messages that have contained this token as well as the total number of spam and ham messages that have been used to train the Bayesian spam filter. The following formula is
The purpose of a spam filter is to decide whether an incoming message is legitimate (i.e., ham) or unsolicited (i.e., spam). There are many different types of filter systems, including:
Word lists: Simple and complex lists of words that are
In this formula, nspam and nham are the total num-
Black lists and white lists: These lists contain known
bers of spam and ham tokens, whereas nspam(token)
IP addresses of spam and non-spam senders.
and nham(token) denote how many times a token appeared in a spam or ham mail, respectively. Note that
Message digests: These systems summarize mails into
there are alternative ways to calculate this probability;
pseudo-unique values. Repeated sightings of the
an overview can be found in [22]. Next, Bayes theorem
same digest are symptomatic of a spam mail.
is used to calculate the spam probability of the whole
message by combining the spam probabilities of the sin-
messages is that blocks of additional words are indica-
gle tokens. Finally, the message is classified as ham or
tors of spam, and algorithms that are able to detect these
spam, typically by comparing its combined spam proba-
additional words, such as Zdziarski’s Bayesian Noise Re-
duction algorithm [21], foil attacks.
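The training and scoring procedure of a Bayesian filter described earlier can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular filter: the class name `TokenDatabase`, the regular-expression tokenizer, the neutral value 0.5 for unseen tokens, and the clamping bounds are all hypothetical choices made for the example. The per-token estimate uses the ratio form suggested by the symbol definitions in the text; other variants exist [22].

```python
import math
import re
from collections import Counter


class TokenDatabase:
    """Toy token database: per-token spam/ham message counts plus totals."""

    def __init__(self):
        self.spam_counts = Counter()  # messages per token seen in spam
        self.ham_counts = Counter()   # messages per token seen in ham
        self.n_spam = 0               # total spam messages trained
        self.n_ham = 0                # total ham messages trained

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase word tokens; real filters tokenize differently.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, is_spam):
        """Update the counts for every distinct token of one message."""
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(self.tokenize(text))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def token_probability(self, token):
        """Spam probability of one token from the relative frequencies."""
        spam_seen = self.spam_counts[token]
        ham_seen = self.ham_counts[token]
        if spam_seen + ham_seen == 0 or self.n_spam == 0 or self.n_ham == 0:
            return 0.5  # assumed neutral value for unknown tokens
        spam_ratio = spam_seen / self.n_spam
        ham_ratio = ham_seen / self.n_ham
        p = spam_ratio / (spam_ratio + ham_ratio)
        return min(max(p, 0.01), 0.99)  # assumed clamp, avoids log(0)

    def message_probability(self, text):
        """Combine token probabilities with Bayes' theorem (in log space)."""
        log_spam = log_ham = 0.0
        for token in self.tokenize(text):
            p = self.token_probability(token)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        diff = log_ham - log_spam
        if diff > 700:  # exp() would overflow; probability is effectively 0
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


db = TokenDatabase()
db.train("cheap pills buy now", is_spam=True)
db.train("buy cheap watches", is_spam=True)
db.train("meeting agenda for monday", is_spam=False)
db.train("please buy milk on monday", is_spam=False)
print(db.message_probability("buy cheap pills"))        # close to 1
print(db.message_probability("monday meeting agenda"))  # close to 0
```

A filter would then compare the combined value against a threshold to label the message as spam or ham.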
In this paper, we explore an alternative approach: in-
stead of adding known good words to compensate for
the bad words in the spam mail, one could exploit redundancies in the language and substitute words with a high
The goal of attacks against Bayesian spam filters is to let
spam probability by synonyms with a lower spam prob-
spam mails be identified as ham mails. Currently exist-
ability. The idea of a computer-aided substitution attack
ing attacks aim to achieve this by adding words to the
was first hinted at by Bowers [2]. Bowers showed that by
spam mails. The objective is that these additional words
manually replacing suspicious words, the spam probabil-
are used in the classification of the mail in the same way
ity of a message can be lowered. However, a completely
as the original words, thereby tampering with the classifi-
manual substitution process is clearly impractical for an
cation process and reducing the overall spam probability.
attacker. In this work, we investigate the feasibility of
When the additional words are randomly chosen from
an automated substitution attack and evaluate its success
a larger set of words, for example, a dictionary, this is
called a random word attack (“word salad”). The objective is that the spam probabilities of the words added to the spam message should compensate for the original tokens’ high spam probabilities in the calculation of the whole message’s combined spam probability. There is
For a successful substitution attack, it is necessary that
some controversy about the effectiveness of such an at-
Bayesian spam filters at different sites (and for differ-
tack: Several authors have found random word attacks
ent users) judge words sufficiently similarly. Otherwise,
ineffective against Bayesian spam filters [9, 13, 22], be-
the attacker would not know which words are consid-
cause many random dictionary words are infrequently
ered suspicious by the victims’ spam filters and, there-
used in legitimate messages and, therefore, tend to have
fore, should be substituted. In addition, it would be un-
either neutral or high spam probabilities for Bayesian
known which synonyms could be used for the substitu-
spam filters. An improvement of the random word at-
tion, since it would be equally unknown which words
tack is to add words to the spam mail that are often used
receive a low or neutral spam probability by the victims’
in legitimate messages. This is called a common word
attack. The idea is that often-used words should have
The spam probability of a word is determined (a) by
lower spam probabilities than randomly-chosen words
the number of appearances of this word in spam mails,
for Bayesian filters, thus being better suited for an attack.
(b) the number of appearances of this word in ham mails,
The number of additional words that are needed for this
(c) the total number of spam mails, and (d) the total num-
attack to work varies between 50 [20] and 1,000 [13].
ber of ham mails the Bayesian filter has classified. If
Finally, Lowd and Meek improved the common word at-
the spam mails and the ham mails of users of Bayesian
tack by adding words that are common in the language
spam filters are sufficiently similar, then it is reasonable
but uncommon in spam [13]. They called their attack
to assume that words are classified similarly enough for
frequency ratio attack. Lowd and Meek calculated that
about 150 frequency-ratio words suffice to make a spam
Are the mails used for training Bayesian spam filters
of different users the same? This is clearly not the case.
Another common approach to circumvent Bayesian
But many spam filters are set up for more than one user;
spam filters is to overlay the text of a message on images
that means, they use broader samples of spam and ham
that are embedded in HTML. The content of the mail is
mail for training. Even more important, however, is that
visible to the user, but it is unrecognizable to text-based
we can assume that many users receive very similar spam
filters, which usually ignore images in their analysis of
mails. After all, the idea of spam is the wide dissemination of particular messages. Thus, it is reasonable to assume that many users receive similar spam messages, and as a result, their filters assign high spam probabilities
to the same (or very similar) sets of words.
Another aspect to consider in this regard is how fast
Most attacks described previously have one thing in com-
messages, in particular spam messages, mutate. When
mon: they add words to spam mails. From the spammer’s
message content changes too quickly, the classification
point of view, the disadvantage of adding words to spam
of the words in the messages would change too. As a re-
sult, the effectiveness of the attack decreases, because
matically replacing words with high spam probability by
the adversary does not know which words to replace,
words with low spam probability. This is done in several
and which synonyms to choose. In order to determine
whether spam messages change slowly enough to allow a substitution attack, we examined three different spam
1. All words with a very high spam probability are
archives. We extracted messages received in the year
2006, divided them by the month they were received, and
2. For every such word, a thesaurus is queried to find a
created lists of the most frequently used tokens for each
set of words with similar meaning, but with a lower
month. We then measured the overlap of these lists by
comparing them. The goal was to determine how many of the 100 most frequently used tokens of one month ap-
3. If a set of suitable synonyms is found, the spam
pear among the 100 most frequently used tokens of an-
word is replaced with one of the possible candi-
other month. The results in Table 1 show that the majority of the 100 most frequently used tokens in one month’s spam messages appear among the top 100 tokens of an-
Identify words with high spam probability.
that raise the spam probability of a message need to be
Manual inspection of the most frequently used tokens
automatically replaced by words that have a lower spam
showed that the lower the rank of a token in this list is,
probability. To this end, the spam probability of each
the less is the difference of that token’s and the next fre-
word in a message has to be determined. For this, we
quently used token’s number of appearances. Some to-
query the Bayes token database of a spam filter. More
kens that are at the end of the list of the 100 most fre-
precisely, we trained SpamAssassin with about 18,000
quently used tokens of one month do not appear on an-
spam mails from Bruce Guenter’s Spam Archive [10]
other month’s top 100 list and, therefore, lower the over-
and about 12,000 ham mails from the SpamAssassin
lap. However, many of these tokens are not completely
and Enron ham archives [6] to prepare SpamAssassin’s
missing in the spam corpus of that other month, but are
Bayes filter with a large and comprehensive training cor-
only a little bit too infrequent to appear in the list of the
pus. Then, for each word of a message, SpamAssas-
100 most frequently used tokens. If these border-case tokens
were counted too, the overlap would be higher; to
sin was consulted to derive the spam probability for this
word. We chose the SpamAssassin mail filter [17] for
be able to estimate the overlap including the border case
this task because it is widely used and achieves good re-
tokens, we also measured how many of the 100 most fre-
quently used tokens of a month appear among the 200
Based on the Bayesian spam probabilities for each
most frequently used tokens of another month. The results
for this type of comparison are shown in the right half
word, the decision is made whether this word needs to
be replaced. To this end, a substitution threshold is de-
fined. If a word with a spam probability higher than that
Our results demonstrate that many of the terms used
threshold is found, it is replaced with a synonym. In the
in spam messages do not change over the course of a
following Section 6, we show results of experiments us-
year, which is a certain indication that the spam prob-
ing different values for the substitution threshold.
abilities Bayesian filters assign to these terms do not change too much either. These findings are confirmed
by related studies: Sullivan [18] examined “almost 2,500
a spam probability above the threshold is found, Word-
spam messages sampled from 8 different domains over
Net [15] is queried for alternatives. WordNet is a lex-
a period of 2.5 years” and found that spam is relatively
ical database that groups verbs, nouns, adjectives, and
time-stable. Pu and Webb [16] studied the evolution of
adverbs into sets of cognitive synonyms (called synsets).
spam by examining “over 1.4 million spam messages that
That is, each synset represents a concept, and it con-
were collected from SpamArchive between January 2003
tains a set of words with a sense that names this con-
and January 2006.” They focused their study on a trend
cept. For example, the word “car” is classified as noun
analysis of spam construction techniques, and found that
and is contained in five synsets, where each of these sets
these changes occur slowly over the course of several
represents a different meaning of the word “car.” One
months, confirming Sullivan’s claim.
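The month-to-month overlap measurement described in this section can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and tiny hand-made corpora; the function names `top_tokens` and `overlap` are hypothetical, and the list sizes 2 and 3 merely stand in for the 100 and 200 used in the study.

```python
from collections import Counter


def top_tokens(messages, n):
    """Return the set of the n most frequently used tokens in the messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return {token for token, _ in counts.most_common(n)}


def overlap(month_a, month_b, n_a=100, n_b=100):
    """Fraction of month_a's top n_a tokens found among month_b's top n_b."""
    top_a = top_tokens(month_a, n_a)
    top_b = top_tokens(month_b, n_b)
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


# Hypothetical miniature corpora, one list of messages per month.
january = ["buy cheap pills", "cheap pills online", "cheap offer"]
february = ["cheap watches", "cheap watches sale", "cheap pills watches", "pills"]

print(overlap(january, february, n_a=2, n_b=2))  # 0.5
print(overlap(january, february, n_a=2, n_b=3))  # 1.0
```

The second call mirrors the relaxed comparison against a longer list: a token that narrowly misses the other month's short list ("pills" here) still counts once that list is extended.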
of these sets is described as “a motor vehicle with four wheels” and contains other words such as “automobile”
or “motorcar.” Another synset is described as a “cabin for transporting people”, containing the word “elevator
As mentioned previously, the goal of the substitution at-
car.” For every synset, WordNet provides links to hy-
tack is to reduce the overall spam score of a mail by auto-
pernym synsets, which are sets of words whose meaning
Table 1: Overlap of most frequently used tokens in three different spam archives for 2006 (see Section 4).
encompasses that of other words. That is, a hypernym
all synsets returned by WordNet are considered (although
is more generic than a given word. For example, “motor
the substitution is less likely to be accurate).
vehicle” is a hypernym of “car” [15].
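The synset and hypernym relations can be pictured with a small hand-written stand-in for the lexical database. The dictionary below is a hypothetical miniature, not WordNet's data or API: the synset identifiers and the placeholder gloss for the hypernym are invented for illustration, and only the two senses of “car” quoted in the text are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    gloss: str                 # short description of the concept
    pos: str                   # part of speech: "noun", "verb", ...
    words: List[str]           # words that name this concept
    hypernyms: List[str] = field(default_factory=list)  # ids of more generic synsets


# Hypothetical miniature lexicon (identifiers are made up for this sketch).
SYNSETS: Dict[str, Synset] = {
    "motor_vehicle": Synset("placeholder gloss for the generic concept",
                            "noun", ["motor vehicle"]),
    "car.vehicle": Synset("a motor vehicle with four wheels", "noun",
                          ["car", "automobile", "motorcar"], ["motor_vehicle"]),
    "car.cabin": Synset("cabin for transporting people", "noun",
                        ["car", "elevator car"]),
}


def candidates(word, pos):
    """Map each matching synset id to its synonyms plus direct hypernym words."""
    result = {}
    for synset_id, synset in SYNSETS.items():
        if synset.pos != pos or word not in synset.words:
            continue  # wrong part of speech or word not in this synset
        words = [w for w in synset.words if w != word]
        for hyper_id in synset.hypernyms:
            words.extend(SYNSETS[hyper_id].words)
        result[synset_id] = words
    return result


print(candidates("car", "noun"))
print(candidates("car", "verb"))  # {} because no verb synset contains "car"
```

Looking the word up with the wrong part of speech returns nothing, which is why the role of the word in the sentence has to be known before the query is made.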
The easiest strategy is to select that word among the
Whenever WordNet is queried for alternatives for a
candidates whose spam probability is the lowest. An-
particular word, the tool not only requires the search
other strategy is to randomly choose a word from the re-
word itself, but also additional information that describes
sulting synset(s). The latter approach aims to create di-
the role of this word in the sentence (such as whether
versity in the substitution process for a large set of mails.
this word is a noun, a verb, or an adjective). The rea-
If a word is always replaced by the same word, the spam
son is that WordNet distinguishes between synsets for
probability of this word would rise every time the mail is
verbs, nouns, adjectives and adverbs, and it is necessary
classified as spam. Variability in the substitution could
to specify what kind of synsets the result should con-
slow down this process. Thus, we can select between
tain. For example, if a word that is used in the query
minimum or random as replacement strategies.
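The threshold test and the two replacement strategies can be sketched as follows. This is an illustration under stated assumptions rather than the prototype itself: the function name `substitute`, the plain dictionaries standing in for the Bayes token database and the thesaurus, the neutral default of 0.5 for unknown words, and all probability values are hypothetical.

```python
import random


def substitute(words, spam_prob, synonyms, threshold=0.6,
               strategy="minimum", rng=None):
    """Replace words whose spam probability exceeds the threshold."""
    if strategy not in ("minimum", "random"):
        raise ValueError("strategy must be 'minimum' or 'random'")
    rng = rng or random.Random()
    result = []
    for word in words:
        p = spam_prob.get(word, 0.5)  # assumed neutral default
        if p <= threshold:
            result.append(word)  # not suspicious enough to replace
            continue
        # Only synonyms with a lower spam probability are useful candidates.
        better = [s for s in synonyms.get(word, [])
                  if spam_prob.get(s, 0.5) < p]
        if not better:
            result.append(word)  # no suitable synonym found
        elif strategy == "minimum":
            result.append(min(better, key=lambda s: spam_prob.get(s, 0.5)))
        else:
            result.append(rng.choice(better))
    return result


# Hypothetical probabilities and thesaurus entries, for illustration only.
spam_prob = {"lose": 0.9, "drop": 0.3, "shed": 0.4, "weight": 0.95, "now": 0.2}
synonyms = {"lose": ["drop", "shed"]}

print(substitute(["lose", "weight", "now"], spam_prob, synonyms))
# ['drop', 'weight', 'now']
print(substitute(["lose", "weight", "now"], spam_prob, synonyms,
                 strategy="random", rng=random.Random(0)))
```

With the minimum strategy every mail receives the same replacement, whereas the random strategy spreads the substitutions over all suitable candidates. The word "weight" stays in place because the toy thesaurus offers no alternative for it.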
is a noun, it would not make sense to look up synsets of verbs. Obviously, many words possess more than a
single role. For example, the word “jump” can either be
a spam probability lower than that of the original word is
used as verb or as noun. The natural language processing
found and the spam probability is very high, it is possi-
(NLP) tool that we use to perform the recognition of the
ble to exchange a single letter of the word with another
roles of words in a sentence is contained in the LingPipe
character that resembles this letter (e.g. “i” with “1”, “a”
NLP package [12]. This tool relies on the context of a
with “@”). This is an implementation of a trick from
word to discover its role in a sentence and assigns a tag
John Graham-Cumming’s site “The Spammer’s Com-
to each word that describes its part-of-speech role [4].
pendium” [8] to conceal words from spam filters. An-
Once the role of a word is discovered, WordNet can
other threshold, called the exchange threshold, has to
provide all synsets that contain the search word in its
be defined that specifies for which words (and their cor-
proper role. In addition, WordNet is also queried for
responding spam probabilities) this obfuscation process
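The character exchange can be sketched as follows. This is a minimal illustration, assuming that the first letter with a known look-alike is the one exchanged; the helper names and the example threshold value of 0.95 are hypothetical, and the look-alike table contains only the two pairs quoted in the text.

```python
LOOKALIKES = {"i": "1", "a": "@"}  # the two pairs mentioned in the text


def obfuscate(word, lookalikes=LOOKALIKES):
    """Exchange the first letter that has a look-alike character."""
    for index, char in enumerate(word):
        replacement = lookalikes.get(char.lower())
        if replacement is not None:
            return word[:index] + replacement + word[index + 1:]
    return word  # no exchangeable letter found


def maybe_obfuscate(word, spam_probability, found_synonym,
                    exchange_threshold=0.95):
    """Obfuscate only if no synonym was found and the word is very spammy."""
    if not found_synonym and spam_probability > exchange_threshold:
        return obfuscate(word)
    return word


print(maybe_obfuscate("viagra", 0.99, found_synonym=False))  # v1agra
print(maybe_obfuscate("viagra", 0.80, found_synonym=False))  # viagra
```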
all direct hypernym synsets of these sets, because hypernyms can also act as synonyms and, therefore, expand the search space for suitable replacement words.
Unfortunately, the role of a word is not sufficient to select the proper synset. The reason is that one must se-
We evaluated our substitution attack against three pop-
lect the synset that contains those words that are seman-
ular mail filters: SpamAssassin 3.1.4 [17], DSPAM
tically closest to the original term. As mentioned above,
3.8.0 [5], and Gmail [7]. For our experiments, we ran-
the noun “car” could be replaced by the term “automo-
domly chose 100 spam messages from Bruce Guenter’s
bile”, but also by the word “cabin.” To choose the synset
SPAM archive [10] for the month of May 2007.
that contains words with a semantics that is closest to
In a first step, the header lines of each mail were re-
the original term, SenseLearner [14], a “word sense dis-
moved, except for the subject line. This was done for two
ambiguation” tool, is employed. This tool analyzes the
reasons. First, the header (except for the subject line) is
mail text together with the previously calculated part-of-
not altered in a substitution attack, because it contains no
speech tags to determine the synset that is semantically
words that can be replaced by synonyms. Second, retain-
closest to the original search word.
ing the header of the original spam mail would influence the result, because certain lines of the header could cause a spam filter to classify a message differently. In the next
step, each HTML mail was stripped of its markup tags
determining a single synset, only words from this synset
and converted into plain text. Then, we corrected man-
are considered as candidates for substitution. Otherwise,
ually words that were extended with additional charac-
ters or were altered in similar ways to escape spam filter
the overall effectiveness of the attack is limited because
detection (e.g., Hou=se – House). Finally, the resulting
SpamAssassin use Bayesian analysis only as one compo-
messages were processed by our prototype that imple-
nent in their classification process. For example, Spam-
ments the proposed substitution attack. In total, five dif-
Assassin uses “block lists” that contain URLs that are
ferent test sets were created, using different settings for
associated with spam mails. In our test set, many mails
the substitution and exchange thresholds as well as dif-
do contain such links, and in some cases, a mail received
ferent substitution policies (as described in section 5):
more than 10 points for a single URL. In this case, the spam threshold of 5 was immediately exceeded, and the
Test Set A: no substitution, original header removed.
mail was tagged as spam regardless of the result that the Bayesian classifier delivered.
threshold: 95%; minimum replacement strategy.
threshold: 100%; minimum replacement strategy.
threshold: 100%; random replacement strategy.
threshold: 100%; minimum replacement strategy.
A threshold of 100% means that no character exchange or word substitution is performed. Test set A consists of the original messages for which no substitution was performed. Test sets B, C and D use an aggressive substitution policy, whereas test set E aims to
Figure 1: SpamAssassin: Bayesian spam scores.
preserve more of the original text. Test sets C and D use the same threshold settings, but apply a different replace-
To gain a deeper understanding of the effect of our
ment strategy. The difference between test set B and C is
attack on the Bayesian classifier of SpamAssassin, we
that certain words in test set B are obfuscated.
examined the Bayesian spam score that is computed by SpamAssassin for the mails before (test set A) and after
Using our five test sets, SpamAssassin and
the most effective substitution attack (test set B). The re-
DSPAM were locally run to classify all mails in each
sults are shown in Figure 1. Note that the spam scores
set. In addition, all mails were sent to a newly created
that are assigned to a mail by SpamAssassin are fixed
Gmail account to determine which of them Gmail would
values that range from -2.599 to 3.5. A negative score
recognize as spam. SpamAssassin was used with its de-
means that the content of the mail is regarded as ham,
fault configuration (where the threshold for classifying
whereas a positive score implies that the mail is spam.
a spam is 5). However, note that we disabled SpamAs-
Values around 0 are neutral and leave the classification
sassin’s ability to learn new spam tokens from analyzed
of the mail to other mechanisms. In the figure, it can
mails. This was done to prevent changes in the results
be seen that for the original test set A, only 10% of all
that depend on the order in which the tests were exe-
mails had the lowest score of -2.599, while 30% received
cuted. Furthermore, SpamAssassin was not allowed to
the highest spam score of 3.5. After the substitution at-
add network addresses to its whitelist. DSPAM was used
tack (with test set B), 25% of all mails achieved a score
in its standard configuration, with the exception that it
of -2.599, while only 2% received 3.5 points. Also, the
was not allowed to use whitelists either. Whitelisting was
number of mails that were assigned a neutral spam score
disabled to ensure that filters would never incorrectly let
increased. This clearly shows the significant effect of the
a mail pass as ham without first invoking the Bayesian
substitution attack on the Bayesian classification.
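The interplay between the Bayesian score and the other rules of a score-based filter, such as the URL block list mentioned earlier, can be illustrated with simple arithmetic. This is a sketch under stated assumptions: the rule names and the additive model with a greater-or-equal comparison are simplifications chosen for the example, while the threshold of 5, the Bayesian score range from -2.599 to 3.5, and the 10 points for a single URL are the values quoted in the text.

```python
SPAM_THRESHOLD = 5.0  # default classification threshold quoted in the text


def is_spam(rule_scores, threshold=SPAM_THRESHOLD):
    """Assumed additive model: the mail is spam if the summed score reaches the threshold."""
    return sum(rule_scores.values()) >= threshold


# Hypothetical rule names; a URL hit is assumed to contribute 10 points.
before_attack = {"bayes": 3.5, "url_block_list": 10.0}
after_attack = {"bayes": -2.599, "url_block_list": 10.0}
print(is_spam(before_attack), is_spam(after_attack))  # True True

# Without the URL hit, the Bayesian score can decide the outcome.
before_no_url = {"bayes": 3.5, "other_rules": 2.0}
after_no_url = {"bayes": -2.599, "other_rules": 2.0}
print(is_spam(before_no_url), is_spam(after_no_url))  # True False
```

Even the best possible Bayesian score cannot offset a block-list hit of this size, which is consistent with the modest gains observed for SpamAssassin.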
This claim is further confirmed when analyzing the re-
The results of the experiments are listed in Table 2.
sults for DSPAM shown in Table 2. DSPAM is much
For each tested spam filter, the numbers show the mails
more dependent on the results derived by the Bayesian
that are incorrectly classified as ham (i.e., the mails that
filter when detecting spam, and thus, the number of spam
successfully penetrated the filter). At a first glance, the
mails that passed the filter could be more than doubled
effectiveness of the substitution attack does not seem to
after the substitution process. To pass filters such as
be significant, especially for SpamAssassin and Gmail.
SpamAssassin (and probably also Gmail), the attacker
Closer examination of the results, however, revealed that
also has to take into account other factors besides the
Table 2: Number of test spam messages not recognized by filters.
content (i.e., text) of his mail. For example, by frequently
ficult task, and our tools are not always able to identify
changing the URLs that point to the spammer’s sites (or
the correct role or semantics of a word. For example,
by hosting these sites on compromised machines), one
WordNet yields “nexus” as replacement for “link.” Other
could evade SpamAssassin’s block list. In this case, the
examples are “locomote” for “go” or “stymie” for “em-
substitution attack is only one building block of a suc-
barrass.” We have invested significant effort to select pre-
cise replacements, but, unsurprisingly, the system fails sometimes. Moreover, the bad grammar used in many spam mails makes correct semantic analysis even more
challenging. To mitigate this limitation, one could con-
uating the effectiveness of a substitution attack, we also
sider a setup in which the substitution system produces
assessed the number of different versions that can be cre-
different versions of a particular spam mail that all have
ated from a single spam mail. For this, we analyzed the
low spam probabilities. Then, a human can pick those
number of words for which substitution was attempted,
alternatives that sound reasonable, and use only those for
as well as the number of possible synonyms for each
spamming. An example for a mail before and after word
word. When a substitution threshold of 60% was used,
substitution is shown in Appendix A.
the system attempted to replace on average 36 words per mail. For these, an average of 1.92 synonyms were available, and in 23% of the cases, not a single synonym could
be found. For a substitution threshold of 80%, 19 substitution attempts were made on average, with 1.65 avail-
Spam mails are a serious concern to and a major annoy-
able synonyms (and no synonym in 29% of the cases).
ance for many Internet users. Bayesian spam filters are
Using a random replacement strategy, we also found that
an important element in the fight against spam mail, and
there are on average 992 variations of one mail.
such filters are now integrated into popular mail clients such as Mozilla Thunderbird or Microsoft Outlook. Ob-
The substitution attack is effective in re-
viously, spammers have been working on adapting their
ducing the spam score calculated by Bayesian filters.
techniques to bypass Bayesian filters. For example, a
However, the attack also has occasional problems.
common technique for disguising spam is appending ad-
One issue is that it is not always possible to find suit-
ditional words to mails, with the hope of reducing the
able synonyms for particular words. This is especially
calculated spam probability. The effectiveness of such
relevant for brand names and proper names such as “Vi-
evasion efforts, however, varies, and Bayesian filters are
agra.” In this case, one has to resort to obfuscation by
replacing certain characters. Unfortunately for the at-
In this paper, we present a novel, automated technique
tacker, spam filters are quite robust to simple character
to penetrate Bayesian spam filters by replacing words
substitution. This can be observed when one compares
with high spam probability with synonyms that have a
the results for test set B (with obfuscation) with test set C
lower spam probability. Our technique attacks the core
(without obfuscation) in Table 2. Also, newly created
idea behind Bayesian filters, which identify spam by as-
words can be learned by spam filters, which counters the
signing spam probability values to individual words. Our
obfuscation or even raises the spam score of a mail [22].
experiments demonstrate that automated substitution at-
Another problem for automated substitution is spelling
tacks are feasible in practice, and that Bayesian filters are
errors in spam mails, which make it impossible to find
vulnerable. Hence, it is important for service providers
the misspelled words in the thesaurus.
and mail clients to make use of a combination of tech-
Another issue is that automated word substitutions are
niques to fight spam such as URL-blocking, blacklisting,
not always perfect. Natural language processing is a dif-
This work was supported by the Austrian Science Foun-
This example shows the content of a mail before and after
dation (FWF) under grant P18157, the FIT-IT project
the substitution process. It can be seen that most words
Pathfinder, and the Secure Business Austria competence
are substituted by a reasonable replacement, although us-
[1] ARADHYE, H. B., MYERS, G. K., AND HERSON, J. A. Image
Subject: Take twice as long to eat half as much
analysis for efficient categorization of image-based spam e-mail. In Eighth International Conference on Document Analysis and Recognition (2005).
I know it is the HOODIA that has made me lose
weight. Now I am so confident I think I will try to do it a
archive.org/web/20050206210806/www.jerf.
few more times and see where it gets me. I love the fact
org/writings/bayesReport.html, February 2003.
that I am getting weight loss results without any bad side
[3] CipherTrust SpamArchive. ftp://mirrors.blueyonder.
effects like the other products that have stimulants in
them. So I just had to write and give you my testimonial
to say I am happy I gained my body back and since
[4] CUTTING, D., KUPIEC, J., AND PEDERSEN, J. A practical part-
of-speech tagger. In Third Conference on Applied Natural Lan-
losing weight, I am ready to become more active and
guage Processing (1992), Xerox Palo Alto Research Center.
attractive than I have ever been. Thanks So Much,
Patricia Strate - Currently 137 lbs
Order online securely from our website
[7] GOOGLE. Gmail. http://mail.google.com/.
(A sample is available at no cost to you)
[8] GRAHAM-CUMMING, J. The spammers’ compendium. http:
pls click the remove link at our website, and enter your
[9] GRAHAM-CUMMING, J. How to beat an adaptive spam filter. In
[10] GUENTER, B. Bruce Guenter’s SPAM Archive. http://www.
Subject: Take twice as long to eat half as much
[11] KRAWETZ, N. Anti-Spam Solutions and Security. http://
www.securityfocus.com/infocus/1763, 2004.
I know it is the HOODIA that has made me drop
[12] LingPipe 2.4.0. http://www.alias-i.com/lingpipe/.
off weight. Instantly I am so confident I think I will try to
[13] LOWD, D., AND MEEK, C. Good word attacks on statistical
spam filters. In Conference on Email and Anti-Spam (2005).
do it a few more times and see where it gets me. I love
the fact that I am getting weight passing results without
any bad side effects like the other merchandises that
[15] PRINCETON. WordNet 2.1. http://wordnet.princeton.
have stimulants in them. So I just had to write and give
you my testimony to say I am happy I derived my body
[16] PU, C., AND WEBB, S. Observed trends in spam construction
back and since losing weight, I am quick to become
techniques: A case study of spam evolution. In Third Conference
more active and attractive than I have ever been. Thanks
on Email and Anti-Spam (CEAS) (2006), p. 104.
[17] SpamAssassin. http://spamassassin.apache.org/.
Patricia Strate - Currently 137 pounds
[18] SULLIVAN, T. The more things change: Volatility and stability
in spam features. In MIT Spam Conference (2004).
Order online securely from our internet site
//web.archive.org/web/20051104234750/http:
(A sample is usable at no cost to you)
[20] WITTEL, G., AND WU, F. Attacking statistical spam filters. In
pls click the remove link at our internet site, and enter
First Conference on Email and Anti-Spam (CEAS) (July 2004).
[21] ZDZIARSKI, J. Bayesian noise reduction: Contextual symmetry
logic utilizing pattern consistency analysis. In MIT Spam Conference (2005).
[22] ZDZIARSKI, J. Ending Spam: Bayesian Content Filtering and
the Art of Statistical Language Classification. No Starch Press, 2005.