NIST 20, Major Upgrade NIST/EPA/NIH EI & Tandem Libraries Announced 68th ASMS Conference

In 1988, the National Bureau of Standards (later renamed the National Institute of Standards and Technology, NIST), an agency of the United States Department of Commerce, took over stewardship of the electron ionization (EI) mass spectral library, which was previously under the joint control of the U.S. Environmental Protection Agency and the National Institutes of Health. Shortly after, NIST created the Mass Spectrometry Data Center and changed the name of the library from the EPA/NIH Mass Spectral Library to the NIST/EPA/NIH Mass Spectral Library. In 1992, the Data Center released the first version of the library to have more than a single spectrum for each compound. Up to that time, starting from the formation of the library in 1973, each compound was represented by a single mass spectrum. The 1992 release also included chemical structures in an electronic format. In addition to the generic format of the Library used by instrument manufacturers to create libraries in their own proprietary formats, the data were made available to end users in a format specific to a search program developed at NIST that initially ran on an IBM PC-XT. This program was first adopted for Perkin Elmer Corporation’s new transmission quadrupole gas chromatography/mass spectrometry (GC/MS) system. Almost immediately thereafter, it was adopted by Varian, Inc. for their newly introduced quadrupole ion trap GC/MS using technology purchased from Finnigan Corp.

Until NIST’s software introduction, all commercial mass spectrometer software required a mass spectrum of a known compound to be acquired on the instrument and submitted to the library search in order to view the information the library contained about the compound (CAS registry number, elemental composition (Formula), nominal mass (MW), contributor, etc.). Most instrument systems did not offer a display of the entire mass spectrum (including isotope peaks), and few, if any, had chemical structures. The NIST Mass Spectral Search Program (NIST MS Search) became very popular with end users. Its ability to compare an acquired spectrum with the Library was soon recognized as being superior to anything available at that time. More manufacturers began to use the program, either as is or in the form of a DLL, as part of their software. Over the years, this program evolved into the Microsoft® Windows™ version that is being released with NIST 20 (also known as Standard Reference Database 1A).

From almost the beginning of the Mass Spectrometry Data Center, NIST acquired spectra. At first, these acquisitions were of EI spectra using GC/MS. Once Retention Index (RI) data, drawn from literature values and including acquisition methods, were introduced, NIST started measuring RI values and recording the GC methods for the spectra it acquired, and included these in the GC Method/RI data compilation. In early 2000, NIST also began acquiring data using tandem quadrupole LC-MS instruments and, later, accurate m/z spectra using tandem quadrupole–time-of-flight (Q–TOF) instruments and Orbitraps®. Most of the new spectra in the NIST 20 (EI and Tandem) Libraries have been acquired by the NIST Data Center.

Over the three decades following the creation of the NIST Mass Spectrometry Data Center, the volume and breadth of compound coverage has increased at a prodigious rate. NIST has now had eight releases, typically one every three years, culminating with the release of NIST 20 on June 2, 2020. Beginning with the 2005 release of the NIST/EPA/NIH EI Library, NIST provided GC Methods and Kováts Retention Indexes (RI) from the scientific literature. The 2005 release also included a Tandem Library of product-ion mass spectra where the precursor ions were formed using LC/MS ionization (mostly electrospray ionization, ESI, but more recently some atmospheric pressure chemical ionization, APCI). Both Libraries have grown significantly in the number of compounds and spectra. The search algorithms for both EI and Tandem spectra have been enhanced. New tools like Mass Spec Interpreter have been developed, and careful attention to what goes into the library has resulted in the NIST commercial offerings being one of the most powerful analytical tools available to mass spectrometry. Much of the software development grew out of the needs of the evaluators at NIST. These developments, like the Hybrid Search and MS Interpreter, have also proved to be extremely useful to the general mass spectrometrist.

In addition to increasing the coverage of the mass spectral Libraries and adding GC Method/RI data, NIST developed search algorithms that have greatly enhanced the utility of these Libraries. With the 2017 release, NIST introduced the Hybrid Search, which greatly extends the ability to identify an unidentified compound based on the structures in the Libraries that produce mass spectra and how those spectra compare to the spectrum of the unidentified compound. Such a search provides information that allows the analyst to propose a structure for the unidentified compound. That proposed structure is then associated with the measured mass spectrum, and the pair are examined relative to one another using the Mass Spec Interpreter tool developed by NIST to see whether the peaks in the spectrum could have come from the structure. MS Interpreter works with integer m/z or accurate m/z data from molecular ions (EI) or precursor ions (LC-MS/MS). A new and enhanced version of MS Interpreter is introduced with NIST 20.

Usually, when a new edition of a mass spectral library is introduced, the emphasis is on the size of the increase and the resulting number of overall compounds in the release. Yes, NIST 20 represents the largest single increase of both EI and Tandem spectra in the history of the Mass Spectrometry Data Center, but that pales in comparison to what is being said by NIST about this release’s compound coverage. Steve Stein, at the American Society for Mass Spectrometry (ASMS) 2020 Annual Meeting on Mass Spectrometry and Allied Topics, REBOOT (a virtual conference), in the Wednesday, June 3 workshop entitled Compound Identification by Mass Spectral Libraries, said that the NIST 20 release was “the most unique and important since the very beginning of the building of the Libraries.” He went on to emphasize that the size of the Libraries is not nearly as important as the relevance of the contents. He said, “Chemical Abstracts has over 150 million compounds. If you have [a mass spectral library of] only one percent, you would have over a million compounds and very few would be relevant to your analysis.” He then said, “The question you should ask is, what fraction of the compounds in your analyses are in the library. Compound selection coverage is really the key to building a viable mass spectral library.” NIST used InChI Keys to review over 40 different non-mass spectral collections of compounds (e.g., Wikipedia, Drug Bank, and EPA/FDA lists) and ranked each compound according to its frequency of occurrence and importance. This collection was then compared with NIST’s two major Libraries (NIST 17 EI and Tandem). The combined list, ranked by occurrence and importance, was also compared against a list of commercially available chemicals. The chemicals in the intersection of the available chemicals and the combined, ranked list, excluding those already in the NIST Libraries, were procured or put on a list to be procured (Figure 1). Measurements were then begun.

Figure 1. Compound Selection for Inclusion in the Library
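
The selection procedure described above (rank compounds by how often they occur across collections, keep those that can be purchased, and drop those already in the Libraries) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the optional per-collection weights standing in for "importance," and the placeholder keys are all hypothetical, and NIST's actual ranking scheme is not described here.

```python
from collections import Counter

def rank_candidates(collections, weights=None):
    """Score each compound key by the (optionally weighted) number of
    source collections that list it. `weights` is a hypothetical stand-in
    for the 'importance' of a collection."""
    scores = Counter()
    for name, keys in collections.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        for key in set(keys):  # count a compound once per collection
            scores[key] += w
    return scores

def select_for_acquisition(collections, in_library, purchasable, weights=None):
    """Return keys that are purchasable and not yet in the library,
    highest-scoring first (ties broken alphabetically)."""
    scores = rank_candidates(collections, weights)
    wanted = [k for k in scores if k in purchasable and k not in in_library]
    return sorted(wanted, key=lambda k: (-scores[k], k))

if __name__ == "__main__":
    # Placeholder identifiers, not real InChI Keys.
    collections = {
        "encyclopedia": ["KEY-A", "KEY-B", "KEY-C"],
        "drug_list": ["KEY-A", "KEY-C", "KEY-D"],
        "regulatory_list": ["KEY-C", "KEY-D"],
    }
    in_library = {"KEY-C"}                      # already has a spectrum
    purchasable = {"KEY-A", "KEY-B", "KEY-C"}   # commercially available
    print(select_for_acquisition(collections, in_library, purchasable))
    # -> ['KEY-A', 'KEY-B']
```

Using a structure-derived key such as the InChI Key as the dictionary key is what makes the cross-collection comparison possible, since the same compound may appear under different names in different sources.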

The next step in the process of building the NIST Libraries is to assure that the quality of each spectrum is such that it can be relied on for identifying the spectra of unidentified compounds. This is accomplished by rigorous curation, which includes both software and expert (human) evaluation. Figure 2 illustrates the path for the evaluation of the EI mass spectra acquired by NIST. NIST is still accepting donated spectra; however, these represent a much smaller percentage than in the past. NIST now requests that the GC/MS data file, in an AMDIS (Automated Mass spectral Deconvolution and Identification System) format, be submitted rather than an individual EI spectrum, together with the standard information that allows AMDIS to calculate the RI value for the compound. The GC method should also be included. When data are submitted in this way, they can be subjected to the strict NIST review.

Figure 2. Acquisition and Evaluation Process Leading to Library Inclusion

The NIST acquisition and review process for the EI data is illustrated in Figure 2. Once the compound is received by NIST, its name, structure, and other metadata are entered into the Chemical Inventory. The compound is then analyzed by GC/MS. The compound’s spectrum is extracted by the AMDIS program, which is also used to determine the compound’s RI value. The AMDIS-curated spectrum, the compound’s RI value, and the GC method are all added automatically to the entry that is made in a custom library. It is at this point that the first evaluation is performed by the acquisition analyst. The spectrum and associated data are then submitted to the second evaluator. Not only is there a physical evaluation of the spectrum and data, but the spectrum is also submitted to a Hybrid Search (remember, spectra for these compounds are highly unlikely to be in the currently distributed library), and the spectrum and structure undergo an evaluation with MS Interpreter. Each of these tools is part of the NIST 20 MS Search Program. If the second evaluator has a question, the spectrum is passed to a third evaluator, who rejects or passes the spectrum/structure entry. Both rejected and accepted spectra are passed to the NIST Archive. After two and a half years, the Archive is closed. Any spectra measured after that are held until the next three-year cycle begins. The Archive is then evaluated by software to resolve any observed inconsistencies. Where there are replicate spectra that have passed all evaluations, the best replicate(s) are selected, and the result is the 2020 Release of the NIST/EPA/NIH EI Library. The evaluation process definitely requires more effort than the measurement phase. Spectra for inclusion in the Tandem Library are evaluated in a similar, though not identical, way; the curation is just as rigorous.
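
The RI determination mentioned above can be illustrated with the textbook retention index definitions. The sketch below computes an index by bracketing the analyte between two n-alkane standards, using either the linear form commonly applied to temperature-programmed runs or the logarithmic (Kováts) form used for isothermal runs. The function name and arguments are hypothetical, and how AMDIS itself performs the calculation is not described here.

```python
import math

def retention_index(rt, alkane_rts, logarithmic=False, dead_time=0.0):
    """Retention index of an analyte from bracketing n-alkane standards.

    rt          -- analyte retention time
    alkane_rts  -- {carbon_number: retention_time} for n-alkanes measured
                   with the same GC method (times must rise with carbon number)
    logarithmic -- False: linear interpolation (temperature-programmed runs)
                   True:  Kovats logarithmic form (isothermal runs)
    dead_time   -- column hold-up time, subtracted in the logarithmic form
    """
    pairs = sorted(alkane_rts.items())
    for (n, t_n), (n_next, t_next) in zip(pairs, pairs[1:]):
        if t_n <= rt <= t_next:
            if logarithmic:
                # Adjusted retention times on a log scale.
                x, lo, hi = (math.log(t - dead_time) for t in (rt, t_n, t_next))
            else:
                x, lo, hi = rt, t_n, t_next
            fraction = (x - lo) / (hi - lo)
            return 100.0 * (n + (n_next - n) * fraction)
    raise ValueError("retention time is not bracketed by the alkane standards")

if __name__ == "__main__":
    # Made-up retention times (minutes) for C10, C11, and C12.
    alkanes = {10: 5.0, 11: 6.0, 12: 7.5}
    print(retention_index(6.75, alkanes))  # halfway between C11 and C12
```

By construction, each n-alkane has an index of 100 times its carbon number, so an analyte eluting halfway between C11 and C12 on the chosen scale has an index of 1150.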

One of the challenges for the Tandem Library acquisition is the number of spectra for each compound. This results from the large number of different precursor ions and the large number of spectra measured at different collision energies for each of these precursor ions. The NIST 20 ESI Tandem Library has been divided into two parts. 

EXPERIMENTAL FOR HRAM ACQUISITION

About 90% of the compounds in the NIST 20 Tandem Library are represented by high resolving power, accurate mass spectra. Most tandem data were acquired using flow injection of pure samples dissolved in acetonitrile/H2O/formic acid (50/50/0.1). The instruments were operated in data-dependent mode for MS2, MS3, and MS4; the most abundant ions were selected for further analysis. Dynamic exclusion was set at 90 seconds to allow sampling of other precursors. The resolutions for MS1 and MSn were set to 60,000 and 30,000, respectively.
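
The effect of data-dependent selection with dynamic exclusion can be sketched as follows. In each survey scan the most abundant ion that is not currently excluded is chosen for fragmentation and then excluded for a fixed time, which gives less abundant precursors a turn. Only the 90-second exclusion time comes from the text above; the function name, the `top_n` and `mz_tol` parameters, and the scan data are assumptions for illustration, and real instrument control software is considerably more elaborate.

```python
def pick_precursors(scans, top_n=1, exclusion_s=90.0, mz_tol=0.01):
    """Choose precursors from time-ordered survey scans.

    scans -- iterable of (time_s, [(mz, intensity), ...])
    Returns a list of (time_s, mz) pairs selected for fragmentation.
    """
    excluded = []  # (mz, time at which the exclusion expires)
    chosen = []
    for time_s, peaks in scans:
        # Drop exclusions that have expired by this scan.
        excluded = [(mz, end) for mz, end in excluded if end > time_s]
        picked = 0
        # Walk the peaks from most to least abundant.
        for mz, _intensity in sorted(peaks, key=lambda p: -p[1]):
            if picked == top_n:
                break
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded):
                continue  # recently fragmented; skip it for now
            chosen.append((time_s, mz))
            excluded.append((mz, time_s + exclusion_s))
            picked += 1
    return chosen

if __name__ == "__main__":
    # Made-up survey scans: two ions seen repeatedly over 95 seconds.
    scans = [
        (0.0, [(300.1, 1000.0), (150.2, 400.0)]),
        (30.0, [(300.1, 900.0), (150.2, 500.0)]),
        (60.0, [(300.1, 950.0), (150.2, 450.0)]),
        (95.0, [(300.1, 800.0), (150.2, 300.0)]),
    ]
    for time_s, mz in pick_precursors(scans):
        print(f"t={time_s:5.1f} s  precursor m/z {mz}")
```

In this toy run the weaker ion at m/z 150.2 is selected at 30 s only because the stronger ion is still excluded; without exclusion, the strongest ion would be fragmented in every scan.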

For MS2 analyses, precursor ions were fragmented by higher-energy collision-induced dissociation (HCD) followed by Fourier transform (FT) analysis. Ions were fragmented over a wide range of Normalized Collision Energies (NCE) (…, 14, 20, 30, 40, 50, 65, …). Ion trap MSn spectra were acquired at a normalized collision energy of 35%. Figure 3 illustrates the coverage represented by the 2020 version of the Library.

Figure 3. Illustration of the Breadth of Coverage of the 30,999 Compounds in the NIST 20 Tandem Library

COMPOUND IDENTIFICATION

No matter how extensive the mass spectral library coverage is, it is only as good as the software used to match the acquired spectrum to the correct compound in the library. The NIST Mass Spectral search algorithms have proved to be the most effective in matching acquired spectra with library spectra when a spectrum of the unidentified compound is in the library. A common concern is what to do when there is no spectrum of the unidentified compound in the library. NIST addressed this with the development of the Hybrid Search, introduced with NIST 17. This search is based on modifications of a molecule and has proved to be very effective. About a year ago, Oliver Fiehn published a paper showing that the Hybrid Search resulted in the identification of nearly all the compounds in a standard urine sample that were represented by quality spectra. Other software tested on the same task did not come close to the identification rate achieved by the Hybrid Search (Figure 4). Other publications on the Hybrid Search have shown results just as dramatic as those in the Fiehn publication. The Hybrid Search depends on knowing the m/z value of the precursor. This makes it very easy to use with tandem data and allows this search to be the default for these types of data. The Hybrid Search works equally well for EI and product-ion data. The Hybrid Search also works very well in conjunction with the MS Interpreter program.

Figure 4. Support of the Effectiveness of the Hybrid Search and How It Works

The Hybrid Search identifies the difference in the m/z values of the spectra of library compounds as compared to the spectrum of the unidentified compound. These differences can be traced to structural changes. These new structures are then associated with the spectrum of the unidentified compound and the pair are evaluated with MS Interpreter.
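
The idea in the paragraph above can be illustrated with a deliberately simplified sketch. It takes the precursor m/z difference between the unidentified compound and a library compound, lets each library peak match either at its original m/z or after being shifted by that difference, and scores the result with an ordinary cosine similarity. This is not NIST's implementation: the function names, the integer m/z spectra, the shifting rule, and the scoring are all assumptions chosen to keep the example short.

```python
import math

def cosine(a, b):
    """Cosine similarity of two spectra given as {mz: intensity} dicts."""
    dot = sum(inten * b.get(mz, 0.0) for mz, inten in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_score(query, query_precursor, library, library_precursor):
    """Return (delta, score) for a simplified hybrid-style comparison."""
    delta = query_precursor - library_precursor
    shifted = {}
    for mz, inten in library.items():
        target = mz
        # Shift a library peak only if it has no direct match in the
        # query but does have a match at the delta-shifted position.
        if mz not in query and (mz + delta) in query:
            target = mz + delta
        shifted[target] = shifted.get(target, 0.0) + inten
    return delta, cosine(query, shifted)

if __name__ == "__main__":
    # Made-up integer-m/z spectra. The query behaves like the library
    # compound with a +14 modification carried by two of three fragments.
    library = {77: 50.0, 105: 100.0, 122: 80.0}
    query = {77: 50.0, 119: 100.0, 136: 80.0}
    print("direct:", round(cosine(query, library), 3))
    delta, score = hybrid_score(query, 136, library, 122)
    print("delta:", delta, "hybrid:", round(score, 3))
```

In this toy case the direct comparison scores poorly because only one peak lines up, while the shifted comparison matches all three peaks, and the reported difference of 14 is the kind of clue an analyst could trace to a structural change. Accurate-mass data would need a tolerance window instead of exact integer keys.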

Using the compound selection approach described earlier, the EI Library has been expanded by more than 40,000 compounds. Previous NIST releases pale in magnitude and breadth compared to the NIST 20 release. This is a direct consequence of the changes NIST has made in the software and to the evaluation process. Even more important than the increase in the number of compounds is the fact that this increase has been driven by the acquisition of compounds of chemical significance. The Library’s breadth has been greatly enhanced by this expansion of the compound selection process. Figure 5 is an example of the breadth of this expansion. NIST still has large numbers of compounds to be measured once the Coronavirus shelter rules have been lifted and work can resume.

Figure 5. Increased Breadth with Increased Volume Due to Compound Selection Process in the EI Library

The growth of the Tandem Library has been even more dramatic. For example, the NIST Tandem Library now contains spectra of a large number of antivirals. Figure 6 illustrates the increased breadth that accompanies the doubling of the number of compounds and spectra since the last release. Plant and animal metabolites are a major class of compounds and are very heavily represented in the NIST 20 Tandem Library. The Tandem Library now includes a small APCI HRAM Library of extractables and leachables (248 compounds).

Figure 6. Increased Breadth with Increased Volume Due to Compound Selection Process

Learn more about the development of the NIST 20 Tandem Library and the NIST 20 NIST/EPA/NIH Electron Ionization Library

Conclusion

In conclusion, this release of the commercially available NIST/EPA/NIH EI Library, the NIST Tandem Library, and the NIST GC Method/Retention Index database, with upgraded software and the freely downloadable data described above, is truly one of the more significant events to take place in the field of mass spectrometry so far in this century (20% of which has now passed).
