Humans
are social animals. Individuals form groups; groups
form cultures,
and cultures evolve into civilizations. Names have
created a unique means of identifying and categorizing
individuals.
Name classifications were based on similar factors
within similar groups. These factors included recognizable
physical features and character traits. Individuals
with similar attributes were usually clustered within
specific geographic areas. As they moved from rural
to urban environments, separate groups were formed,
each having its own societal structure and belief
system.
As groups became more unified, ethnic, religious and
minority distinctions evolved when each group perceived
itself as the "we" group and all others
as the "they" group.
List Service Direct's Ethnic and Religious Encoding
System utilizes this historical concept as one component
of its process. Knowing that each group has a distinct
culture and a distinct world view, we have
created a rule and exception based program
that incorporates the idea that each group has last
and first names that are unique to that group.
By applying specific criteria in a specific order,
the ethnic, religious, and minority identity of the
individual can be ascertained.
The accuracy of this identification is further enhanced
by applying a geographical analysis, based on census
tract data, of the name within the ethnic, religious
or minority group.
Our Ethnic and Religious Encoding System is NOT A
SURNAME BASED SYSTEM. Rather, it is a revolutionary
new process that allows the marketer or researcher
to select over 130 ethnic, religious, and minority
groups from any list.
Our ethnic encoding system analyzes both an individual's
first and last name and applies, in a specific order,
ethno-linguistic and geocentric rules to both the
surname prefix and suffix. It thereby identifies the specific
ethnic, religious, and minority status of individuals,
even an individual with a multiethnic surname.
The LSDI ethnic encoding system consists of a set
of tightly interlinked computer programs and data
files, as follows:
- A unique first name file by ethnicity
- A non-unique surname file by ethnicity
- A series of two to five character prefix rules by ethnicity
- A series of two to five character suffix rules by ethnicity
- A series of codes to identify the ethnic, religious and minority status of an individual
- A geocentric reference table
- A complex series of computer programs that analyze the individual's names using the system's data.
Another exclusive feature of our system is its ability to
recognize hyphenated and misspelled names, which will
be correctly coded because of the prefix and suffix
rules.
Hyphenated names will be captured using our first
name and surname tables in conjunction with the prefix
and suffix rules that apply to them.
In order to understand and appreciate our system,
it is necessary to trace the onomastic variables
that are found within the process. These variables
include ethnic heritage descriptions, locational identifiers,
and ethnic life form and individual trait describers.
Ethnic Heritage Descriptions
An ethnic heritage descriptor alludes to the parentage
of the individual. Each ethnicity and language has
a different way of expressing this within the first
or the last name.
List Service Direct's Ethnic and Religious Encoding
System has used these descriptors (either suffixes
or prefixes appended to the first or last name) to accurately
identify particular names unique to particular ethnic
groups. Below are several examples that illustrate
the way ethnic heritage descriptors may be used.
In the Finnish ethnicity, the suffix "NEN"
means "the offspring of". In Welsh, the
original prefix "AP" (since shortened to
"P" when combined with a first name) means
"offspring of". Thus, PROBERT is Welsh for
"offspring of Robert".
The suffix "UCCI" means "descendant
of" in Italian, while the Turks use the suffix
"BASHI" to mean "father of".
Prefixes play an important role in identifying some
Irish names. "Grandson of" is implied in
the prefix "O", while the prefix "MC"
means "son of". To designate "Uncle",
the Burmese use the prefix "U".
It is important to remember that the use of these
name endings and beginnings does not alone guarantee
accuracy. These components are only a part of our
process.
Ethnic Locational Identifiers
During the Dark and Middle Ages it was essential that an
individual could be traced to his country of origin or his
geographic location within a country. This information immediately
identified the individual as friend or foe. One method
that was adopted was to add an identifier to his name
in the form of a prefix or suffix.
Geographic locators are important to the ethnic identifier
process as well. In addition to the suffixes and prefixes
and the rules derived therein, our system also incorporates
actual geographic coordinates in the U.S. to determine
ethnic, religious and minority group clusters. This
improves the accuracy of our system.
Below are some examples of ethnic locational identifiers
and the popular name myths they refute.
In the Finnish ethnicity, the surname suffixes "OLA",
"YLA" and "KOSKI" mean "upper",
"lower", and "middle" respectively. "KOSKI"
refutes the popular notion that all names ending in
"SKI" are Polish, and "OLA" proves that not
all names ending in a vowel are Italian.
The French use the prefixes "DU", "DE", "DELAS", and
"DES" to designate "from", while the
Romanians use "AN-U" and "EANU"
to convey the same meaning.
Italian names ending in "DDA" and "DDO"
show that the individual is from Sardinia.
Ethnic locational identifiers help our ethnic system
to correctly determine the ethnic origin of all groups,
including Italian and Polish (see the Finnish example above).
Other systems currently in use do not have this ability
and are far less accurate.
Ethnic Life Form & Individual Trait Descriptors
This category reveals the humorous and sometimes cruel
side of human nature. Created as a means of classifying individuals
by their physical attributes and likeness to animals
(sometimes not flattering), these descriptors offer
an unusual method of identifying individuals by ethnic
group.
Our system uses these (as well as all other descriptors)
to help build its rule and exception based system.
These rules allow our process to capture names that
might be eliminated or inaccurately identified using
surname systems, and other programs. Our system captures
and assigns these individuals to their correct ethnic
group.
The Italians provide us with many examples of both
life form and individual
trait descriptors: FUZZO (curly),
MANCINI (left handed), LAGO (tall)
and FASANO (pheasant). Less
flattering are BOCCACIO (ugly mouth),
IZZO (snail), and MUSSOLINI
(gnats).
Religious Affiliation
Our
system has a code that determines religious affiliation.
However, the
process cannot distinguish denominations, sects and
splinter groups within individual religions. For example,
it cannot accurately determine who is Baptist or Calvinist
within the Protestant group. Nor can it select Hasidic
Jewish groups from the Jewish population at large.
Religious affiliations are determined by geographic
locators and ethnic
group identifiers. Yes, we will include some atheists
and agnostics within
the groups. However, our percentage of accuracy still
holds.
We are constantly utilizing new technology and developing
new rules,
exceptions, and criteria that will allow our system
to maintain its high level
of accuracy.
African-American
Our system differs from conventional approaches in that
it goes well beyond knowing which areas have high
concentrations of Afro-Americans. (Most compiled lists
generate this select based on the "neighborhood"
approach; i.e., if you live in a high percentage
nonwhite zip code, you must be African-American.)
Our process identifies African based Afro-American
names with its unique
first name and surname tables. Individuals identified
in this manner may
reside anywhere in the United States, not just in
African-American clusters.
In addition, our system identifies Afro-Americans
with non African based
but unique first names anywhere in the United States.
Sheneka Brinter living in Conway, Arkansas is African
American. So is Amarta Azubuike.
As a further safeguard, our system looks within the
African-American
clusters and eliminates all non-black ethnicities,
qualifying only those individuals with commonly borrowed
ethnic names and certain Islamic names.
The system continually refines the selection criteria
to ensure that the name identified as African-American
will be African-American; not just a "could be"
but an "is".
Hispanic
Our
system identifies Hispanic individuals by unique last
and first names
using rules and exceptions that apply to these names.
Geographic mapping confirms the locations of this
population.
Our process will identify Hispanics in non-Hispanic
areas. For example, in Conway, Arkansas we identified
Juanita Beene as Hispanic NOT by zip code cluster
and NOT by last name but by FIRST NAME. John Martinez
was identified NOT by first name or zip cluster but
LAST NAME.
Surname based systems cannot identify non-Hispanics
with Hispanic
surnames. Our ethnic encoding system can and does.
There are many multiethnic names (e.g. Delgado) which
could be Hispanic but could also be another ethnicity
(e.g. Italian). Our system can separate the multiethnic
name into its proper component ethnicities by using
first names where possible. The remainder are stored
with the multiethnic uncoded class until they are
verified as being Hispanic using first name indicators.
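The first-name tie-break described above might be sketched roughly as follows. This is an illustration only, not the actual LSDI programs: the table contents, the function name `resolve_multiethnic`, and the entry GIUSEPPE are hypothetical and chosen purely for the example.

```python
# Hypothetical toy tables; the real data files are not published here.
MULTIETHNIC_SURNAMES = {"DELGADO": {"Hispanic", "Italian"}}
UNIQUE_FIRST_NAMES = {"JUANITA": "Hispanic", "GIUSEPPE": "Italian"}

UNCODED = "multiethnic uncoded"


def resolve_multiethnic(first_name: str, surname: str) -> str:
    """Pick one ethnicity for a multiethnic surname using the first name.

    Returns UNCODED when the first name gives no usable indication.
    """
    candidates = MULTIETHNIC_SURNAMES.get(surname.upper(), set())
    hint = UNIQUE_FIRST_NAMES.get(first_name.upper())
    # Accept the first-name hint only if the surname allows that ethnicity.
    if hint is not None and hint in candidates:
        return hint
    return UNCODED


print(resolve_multiethnic("Juanita", "Delgado"))
print(resolve_multiethnic("Mark", "Delgado"))
```

In this sketch a record stays in the uncoded class whenever the first name is not on the unique first name table, which mirrors the cautious behavior described in the text.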
Hispanic women who marry individuals with non-Hispanic
surnames are identified by our system's unique
first name table. Quite often, Hispanics marrying
Hispanics leads to hyphenated names. Our system identifies
these and some misspelled names with its ethno-linguistic
rules.
Surname based systems are simply that: systems that
use only the last name of an individual to infer that
individual's ethnic, religious or minority status.
Our system takes this idea and expands on it. Thus,
our Hispanic names are HISPANIC Hispanic names; not
Portuguese, Italian, and other names based on conventional
wisdom.
Japanese
Almost all Japanese names are composed of descriptive components
put together. Hence, the names are almost musical
due to their repetitive vowel sounds.
Although the Japanese surname is the easiest Oriental
name to distinguish (especially since Japanese surnames were not
influenced by the Chinese), most surname based systems
have included all Asians in a category known as "Oriental".
Therefore, the Koreans, who not only share certain
surnames with the Chinese but often introduce Chinese
qualifiers to their names, are mixed with the Japanese
and other Asian ethnic groups.
Our system has separate and unique prefix and suffix
rules and exceptions for all ethnic groups including
those representing the continent of Asia. Also, we
have an extensive Japanese surname and first name
table.
These features allow us to identify Japanese in traditionally
non-Japanese areas, such as Mark Tanaka and Junko Takahashi
in Conway, Arkansas, and also to identify Japanese
women who have non-Japanese surnames.
It is important to remember that each ethnic group
within the Asian community considers itself to
be distinct from the others. Japanese wish
to be identified as such and not confused with or
considered as other Orientals. Other systems either
overlook this fact or have not developed components
within their programs to allow proper ethnic identifications.
Our system considers each ASIAN ethnic group as a
separate and identifiable selection. This has been
attained by creating rules and exceptions based on
the study of the history and development of surnames
and first names within the culture of each country.
This allows the user of our system to select a particular
ethnic group, such as Japanese, from the larger Asian
or Oriental category.
Completeness and Accuracy
In order to correctly determine the accuracy of our data
and the algorithms contained within, we contracted
with a national market research company to conduct
a major telephone study in March of 1999. Sample size
was determined by
the research company to ensure that the resulting
data would be at the 95% level of confidence. We
set up quotas by major ethnicity (Hispanic, African
American, Asian and "Other") in an effort
to make sure each was properly represented in the
study. A total of 1,566 telephone interviews were
conducted. A telephone methodology was chosen as opposed
to a mail study because we felt we would be able to
reach a larger number of respondents via telephone
more quickly and at less cost than with a mail study.
The sample for the study was pulled from our national
database using a random nth selection in order to
get a statistically valid cross section of the database.
Each sample record was assigned a sample number.
This number was used after the data was tabulated
to cross match the data from each individual completed
interview with the data for that record contained
in our database. In other words, if a respondent indicated
in the survey that they are Hispanic, we would look
at that respondent's data record in our database to
see if the data matched as a way of checking accuracy.
This extra step was done in addition to the standard
data tabulations that were completed for the study.
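The nth selection and the cross match can be illustrated with a short sketch. It is not the research company's actual procedure: the function names, the dictionary layout, and the choice to report accuracy per coded ethnicity are all assumptions made for this example, and the data shown are made up.

```python
def nth_selection(records, n, start=0):
    """Systematic sample: every n-th record, beginning at index `start`."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return records[start::n]


def accuracy_by_group(coded, reported):
    """Cross match survey answers against database codes.

    coded:    sample number -> ethnicity stored in the database
    reported: sample number -> ethnicity the respondent stated
    Returns the share of matching records for each coded ethnicity,
    counting only sample numbers with a completed interview.
    """
    totals, matches = {}, {}
    for sample_number, answer in reported.items():
        group = coded[sample_number]
        totals[group] = totals.get(group, 0) + 1
        if answer == group:
            matches[group] = matches.get(group, 0) + 1
    return {g: matches.get(g, 0) / totals[g] for g in totals}


# Made-up illustration data, not figures from the study.
sample = nth_selection(list(range(10)), 3)
coded = {1: "Hispanic", 2: "Hispanic", 3: "Asian"}
reported = {1: "Hispanic", 2: "Other", 3: "Asian"}
print(sample)
print(accuracy_by_group(coded, reported))
```

Keying both dictionaries on the sample number is what makes the cross match possible after tabulation, which is the role the sample number plays in the study described above.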
Our
findings indicated that different ethnicities produced
different levels of cooperation and accuracy. Please
see the chart below, which reflects cooperation and
accuracy by major ethnicity.
| ETHNICITY | COOPERATION | ACCURACY |
| HISPANIC | 48% | 94% |
| ASIAN | 39% | 86% |
| AFRICAN AMERICAN | 47% | 90% |
| OTHER | 46% | 92% |
Descriptions and Explanation of Usage
On the following pages are summary level counts by ethnicity
which depict the actual record counts our ethnic system
stores and utilizes when analyzing an individual's
full name and address.
For each ethnicity there are columns for the number
of onomastic rules that apply to that ethnicity, the
number of unique first names applicable to that ethnicity,
and the number of surnames stored for that ethnicity.
ONOMASTIC RULES (Prefix & Suffix Rules)
There are 1,157 onomastic rules currently implemented
in our ethnic system. Each rule reaching implementation
level was hypothesized and tested to ensure validity.
Many hypothesized rules were not implemented as they
were found to be only partially valid. Implemented
rules apply to the examination of the prefix and suffix
of a surname. When an individual's ethnicity cannot
be determined by looking at the whole name, its component
parts, the prefix and suffix, are analyzed and matched
against the rule files in a specific order. The order
is governed by length of argument, i.e., search the five
character suffix before the four character, then the three
character, etc.
Thus, all names ending in "KOSKI" that are not found
on the surname file or matched against the unique first
name file will be coded as Finn because of the onomastic
rule. Other names ending in "SKI", but not
"KOSKI", will, after not being coded with
first and last name examination, result in the individual
being coded Polish.
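The lookup order just described (whole-name files first, then suffix rules from longest to shortest) can be sketched as below. This is a simplified illustration under stated assumptions rather than the production programs: the tables hold only a few entries taken from examples in this document, the function names are hypothetical, prefix rules are omitted, and the relative order of the surname file and the unique first name file is a guess.

```python
from typing import Optional

# Toy stand-ins for the real data files (entries taken from examples in the text).
SURNAME_FILE = {"TANAKA": "Japanese"}
UNIQUE_FIRST_NAMES = {"FUMIHIKO": "Japanese"}
SUFFIX_RULES = {"KOSKI": "Finn", "SKI": "Polish"}


def code_by_suffix(surname: str) -> Optional[str]:
    """Try five character suffixes first, then four, three, and two."""
    s = surname.upper()
    for length in range(5, 1, -1):
        if len(s) >= length and s[-length:] in SUFFIX_RULES:
            return SUFFIX_RULES[s[-length:]]
    return None


def code_name(first_name: str, surname: str) -> Optional[str]:
    """Whole-name lookups first; suffix rules only as a fallback."""
    if surname.upper() in SURNAME_FILE:
        return SURNAME_FILE[surname.upper()]
    if first_name.upper() in UNIQUE_FIRST_NAMES:
        return UNIQUE_FIRST_NAMES[first_name.upper()]
    return code_by_suffix(surname)


print(code_name("Matti", "Jarvikoski"))  # hypothetical name ending in KOSKI
print(code_name("Jan", "Kowalski"))      # hypothetical name ending in SKI
```

Because the longer suffix is always tested first, a name ending in "KOSKI" never reaches the shorter "SKI" rule, which is the behavior the paragraph above relies on.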
Our system does not require all Polish names ending
with "SKI" to be on its surname file, nor does
it require all Finnish names ending in "KOSKI"
to be on the surname file. There are many advantages
to this open ended approach.
Misspelled names, hyphenated names, and names new
to this country are a few. Our onomastic rules allow
our process to outperform surname based systems
in all three of the above cases.
UNIQUE FIRST NAME FILE
The key operative word here is "unique".
While "Anthony" is a name very commonly
used in Italian families, it is not unique to Italian
families. Hence, there is no Anthony in the unique
first name file. Nor are "Juan" or "Pablo"
to be found in the unique first name file.
Because of the assimilation process that has occurred
in the U.S.A., there is not a single unique first name
stored under English. Our system currently recognizes
21,601 unique first names that can be pegged to
a specific ethnicity.
While there are no absolutes, the chance that a person
with the first name "Fumihiko" is other than
Japanese is statistically negligible. Likewise, a
person with the Igbo first name "Ogochukwu"
is statistically unlikely to be other than African
or African American, even if their last name
is Smith.
SURNAME FILE
Our system currently has 129,761 surnames on its
surname file. Where surnames are useful due to numerous
variations in prefix and suffix spellings, such as
in Italian, there are a correspondingly large number
of surnames compiled for that ethnicity. There are
over 18,000 on file for Italian. Where a large proportion
of individuals can be determined by either unique
first names or onomastic rules, fewer names are
needed. For instance, in Japanese there are only about
2,500 surnames on file, but there are 182 onomastic
rules and another 500 plus unique first names.
| ETHNICITY | ONOMASTIC RULES | UNIQUE 1ST NAMES | SURNAMES |
| English | 54 | 0 | 12,688 |
| Scot | 3 | 71 | 3,628 |
| Dane | 1 | 71 | 608 |
| Swede | 71 | 129 | 1,279 |
| Norw | 7 | 58 | 927 |
| Finn | 52 | 165 | 1,732 |
| Icelandic | 3 | 1 | 108 |
| Dutch | 84 | 71 | 6,768 |
| Belgian | 0 | 4 | 632 |
| German | 78 | 27 | 13,188 |
| Austrian | 0 | 0 | 580 |
| Hungarian | 27 | 204 | 1,720 |
| Czech | 4 | 24 | 1,504 |
| Slovak | 1 | 6 | 160 |
| Irish | 21 | 13 | 3,792 |
| Welsh | 4 | 48 | 268 |
| French | 45 | 34 | 8,634 |
| Italian | 112 | 130 | 18,767 |
| Spanish | 65 | 1,776 | 10,064 |
| Portuguese | 6 | 148 | 562 |
| Polish | 36 | 152 | 4,824 |
| Estonian | 1 | 17 | 184 |
| Latvian | 3 | 106 | 188 |
| Lithuanian | 8 | 67 | 492 |
| Ukranian | 6 | 65 | 748 |
| Georgian | 5 | 6 | 124 |
| Byelorus | 0 | 0 | 120 |
| Armenian | 1 | 1 | 908 |
| Russian | 42 | 0 | 6,304 |
| Turk | 1 | 189 | 340 |
| Greek | 57 | 168 | 3,448 |
| Persian | 0 | 94 | 668 |
| Moldavian | 0 | 0 | 20 |
| Bulgarian | 1 | 140 | 568 |
| Romanian | 9 | 147 | 972 |
| Albanian | 0 | 9 | 11 |
| Native American | 0 | 400 | 39 |
| Slovene | 0 | 15 | 48 |
| Croatian | 0 | 66 | 523 |
| Serbian | 4 | 42 | 1,123 |
| Bosnian | 0 | 1 | 68 |
| Azerb | 0 | 1 | 19 |
| Kazakh | 0 | 4 | 51 |
| Afghan | 0 | 24 | 3 |
| Pakistani | 0 | 4 | 56 |
| Bengladesh | 0 | 0 | 10 |
| Indonesian | 0 | 15 | 62 |
| Indian | 84 | 1,228 | 2,472 |
| Burmese | 0 | 27 | 13 |
| Mongol | 0 | 41 | 59 |
| Chinese | 1 | 2,576 | 964 |
| Korean | 0 | 4,596 | 616 |
| Japanese | 182 | 760 | 3,284 |
| Thai | 55 | 860 | 1,148 |
| Malay | 0 | 2 | 18 |
| Laotian | 1 | 220 | 528 |
| Khmer | 10 | 36 | 236 |
| Vietnamese | 00 | 1,288 | 632 |
| Sri Lanka | 0 | 11 | 20 |
| Uzbek | 0 | 1 | 28 |
| Misc Orient | 0 | 6 | 16 |
| Jewish | 8 | 2,976 | 6,428 |
| Arab | 83 | 492 | 3,770 |
| Egyptian | 0 | 2 | 68 |
| Ruandan | 0 | 0 | 23 |
| Tonga | 0 | 1 | 2 |
| Senegal | 0 | 4 | 14 |
| Sudanese | 0 | 0 | 2 |
| Moroccan | 0 | 3 | 67 |
| Afric-Am | 4 | 224 | 680 |
| Kenyan | 0 | 160 | 144 |
| Nigerian | 0 | 304 | 236 |
| Ghana | 0 | 109 | 35 |
| Zambia | 0 | 0 | 20 |
| Zaire | 0 | 5 | 17 |
| Surinam | 0 | 0 | 4 |
| Mozambique | 0 | 0 | 3 |
| Ivory Coast | 0 | 7 | 23 |
| Bhutanese | 0 | 0 | 3 |
| Ethiopian | 3 | 169 | 560 |
| Ugandan | 0 | 260 | 31 |
| Botswana | 0 | 0 | 3 |
| Cameroon | 0 | 1 | 16 |
| Zimbabwe | 0 | 58 | 28 |
| Congo | 0 | 0 | 3 |
| Cent Af Rep | 0 | 0 | 1 |
| Togo | 0 | 1 | 1 |
| Bahrain | 0 | 0 | 1 |
| Qatar | 0 | 0 | 1 |
| Guyana | 0 | 0 | 0 |
| Tibetan | 0 | 1 | 1 |
| Fiji | 0 | 0 | 1 |
| Swaziland | 0 | 0 | 3 |
| Namibian | 0 | 0 | 3 |
| Burundi | 0 | 0 | 8 |
| Tanzania | 0 | 41 | 19 |
| Gambian | 0 | 0 | 3 |
| Somalia | 0 | 0 | 2 |
| Macedonia | 0 | 0 | 4 |
| Chad | 0 | 0 | 3 |
| Gabonese | 0 | 0 | 2 |
| Angola | 0 | 0 | 2 |
| Chech | 0 | 7 | 23 |
| Kirghiz | 0 | 2 | 2 |
| Tajik | 0 | 0 | 2 |
| Algerian | 0 | 2 | 34 |
| Phillipine | 0 | 6 | 8 |
| Lesotho | 0 | 3 | 7 |
| Tunisian | 0 | 0 | 16 |
| Hawaiin | 0 | 69 | 1,440 |
| Madagasgar | 0 | 2 | 7 |