How to optimize cross-language search terms to improve e-discovery efficiency
As head of the multilingual eDiscovery department at Arrow Translation Legal Solutions, I oversee hundreds of non-English cases and handle remediation efforts on many of them. In these cases, search terms are often where attorneys or vendors run into trouble, and where results are most skewed. Why is this?
First, we need to understand mistranslated terms and their causes. Mistranslations usually lead to one of two results: either some relevant documents are never identified, or a large number of irrelevant documents are pulled in, increasing the cost of review.
Common causes of mistranslated terminology
- Lack of context
Search terms are often words stripped of context. Case teams craft terms around specific situations, but translators typically have limited knowledge of the case context, issues, and industry, which leaves room for translation errors. For example, the word “close” can refer to closing a deal, physical proximity, or intimacy, depending on context. In many languages these senses are completely different words (in Spanish, “cerrar,” “cerca,” and “íntimo,” respectively). Without context, an antitrust team can miss its target and end up reviewing only emails about extramarital affairs.
- Natural language terminology
Another way translation misses the mark is that it fails to reflect natural language, the language people actually use. Take the “close” example above. While “cerrar” captures the correct concept, it is a mistranslation as a search term because people rarely write the infinitive; they write conjugated forms. To complicate matters further, “cerrar” is an irregular verb with 30 different conjugations.
- Using search operators
How a search platform interprets terms is almost entirely determined by search operators, so they are an important focus of translation efforts. E-discovery professionals spend years learning how to use search operators correctly, but linguists generally do not have this training. Collaboration with linguists is therefore essential and requires a word-by-word analysis of the grammar. For example, wildcards on short words are risky; best practice is to avoid wildcards on words with fewer than five characters, which can otherwise pull in thousands of irrelevant documents.
A more accurate search syntax for the “close” translation is as follows. For many modern eDiscovery review applications, use (ciero OR cierr* OR cerra* OR cerre OR cerro); for some processing engines, use (ciero OR cierr* OR cerrá* OR cerra* OR cerre OR cerré OR cerro OR cerró). Different applications handle accented characters differently, so both versions are needed when translating.
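The two query variants above can be produced mechanically from a single list of forms. The following is a minimal sketch, not the behaviour of any particular review platform: the helper names (`fold_accents`, `check_wildcard`, `build_query`), the `MIN_WILDCARD_STEM` constant, and the `accent_sensitive` flag are all assumptions made for illustration. It strips accents with Unicode normalization for engines assumed to ignore them, keeps both spellings for engines assumed to distinguish them, and rejects wildcard stems shorter than five characters.

```python
import unicodedata

# Assumption: mirrors the five-character rule of thumb described above.
MIN_WILDCARD_STEM = 5


def fold_accents(term: str) -> str:
    """Return the term with combining marks removed (e.g. 'cerró' -> 'cerro').

    Note: this also folds 'ñ' to 'n', which may be undesirable for Spanish.
    """
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def check_wildcard(term: str) -> None:
    """Raise ValueError if a trailing-wildcard stem is shorter than the minimum."""
    if term.endswith("*") and len(term.rstrip("*")) < MIN_WILDCARD_STEM:
        raise ValueError(f"wildcard stem too short: {term!r}")


def build_query(terms: list[str], accent_sensitive: bool) -> str:
    """Join terms with OR, adding unaccented spellings and dropping duplicates."""
    ordered: list[str] = []
    for term in terms:
        check_wildcard(term)
        folded = fold_accents(term)
        # Accent-sensitive engines need both spellings; others need only one.
        candidates = [term, folded] if accent_sensitive else [folded]
        for candidate in candidates:
            if candidate not in ordered:
                ordered.append(candidate)
    return "(" + " OR ".join(ordered) + ")"


# Forms copied from the example in the text.
CLOSE_TERMS = ["ciero", "cierr*", "cerrá*", "cerra*", "cerre", "cerré", "cerro", "cerró"]

print(build_query(CLOSE_TERMS, accent_sensitive=False))
# (ciero OR cierr* OR cerra* OR cerre OR cerro)
print(build_query(CLOSE_TERMS, accent_sensitive=True))
# (ciero OR cierr* OR cerrá* OR cerra* OR cerre OR cerré OR cerro OR cerró)
```

Whether a given platform folds accents at index time, and whether it honours trailing wildcards at all, has to be confirmed against that platform's documentation before a query like this is relied on.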
- Search strategy
There are many strategic options when translating terms, and it is crucial to understand the overall strategy of the case and reflect it in the translation. Two scenarios illustrate the range:
  - Doing what is required
  In some cases, the strategy is to do exactly what is required, choosing only the single most appropriate term.
  - Leaving no dead ends
  In investigative cases, the strategy might include synonyms closely related to the main concept, such as translating "rain" as "rain/light rain/thunderstorm."
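The difference between the two scenarios can be made explicit when term lists are assembled. The sketch below is illustrative only: the `Strategy` enum, the `TERM_OPTIONS` table, and the helper names are hypothetical, the English glosses stand in for the translated terms a linguist would supply, and quoting multi-word phrases is an assumed syntax that varies by platform.

```python
from enum import Enum


class Strategy(Enum):
    REQUIRED_ONLY = "required_only"  # only the single most appropriate term
    NO_DEAD_ENDS = "no_dead_ends"    # primary term plus closely related synonyms


# Hypothetical table; in practice a linguist fills in the target-language terms.
TERM_OPTIONS = {
    "rain": {"primary": "rain", "related": ["light rain", "thunderstorm"]},
}


def select_terms(concept: str, strategy: Strategy) -> list[str]:
    """Return the terms to search for, depending on the case strategy."""
    entry = TERM_OPTIONS[concept]
    if strategy is Strategy.REQUIRED_ONLY:
        return [entry["primary"]]
    return [entry["primary"], *entry["related"]]


def to_query(terms: list[str]) -> str:
    """Join terms with OR, quoting multi-word phrases (assumed syntax)."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


print(to_query(select_terms("rain", Strategy.REQUIRED_ONLY)))
# (rain)
print(to_query(select_terms("rain", Strategy.NO_DEAD_ENDS)))
# (rain OR "light rain" OR thunderstorm)
```

Recording the chosen strategy alongside each term list makes it easier to explain later why a broader or narrower set of translations was used.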
By focusing on the above four aspects, the effectiveness of cross-language search terms can be significantly improved, thereby increasing the efficiency of e-discovery.