Chemometric Analysis for Identification of Botanical Raw Materials for Pharmaceutical Use: A Case Study Using Panax notoginseng


The overall control of the quality of botanical drugs starts from the botanical raw material, continues through preparation of the botanical drug substance and culminates with the botanical drug product. Chromatographic and spectroscopic fingerprinting has been widely used as a tool for the quality control of herbal/botanical medicines. However, discussion is ongoing as to whether a single technique provides adequate information to control the quality of botanical drugs. In this study, high performance liquid chromatography (HPLC), ultra performance liquid chromatography (UPLC), capillary electrophoresis (CE) and near infrared spectroscopy (NIR) were used to generate fingerprints of different plant parts of Panax notoginseng. The power of these chromatographic and spectroscopic techniques to evaluate the identity of botanical raw materials was then compared in terms of their capability to distinguish different parts of Panax notoginseng. Principal component analysis (PCA) and clustering results showed that samples were classified better when UPLC- and HPLC-based fingerprints were employed, which suggests that UPLC- and HPLC-based fingerprinting are superior to CE- and NIR-based fingerprinting. UPLC- and HPLC-based fingerprinting with PCA correctly distinguished between samples sourced from rhizomes and main roots. Chemometrics, with its ability to distinguish between different plant parts, could be a powerful tool to help assure the identity and quality of botanical raw materials and to support the safety and efficacy of botanical drug products.


In recent years, there has been increased interest in the United States in developing botanical preparations as pharmaceutical products and not only as dietary supplements. Since different plant parts of an herbal medicine may have different therapeutic effects, one hurdle has been to develop analytical methods that adequately identify the source, i.e., the plant part, of the botanical raw material, so that the botanical drug substance and drug product can be reproducibly manufactured to provide the same safety and efficacy as the clinical trial supplies. A typical example of a dramatic difference in therapeutic activity is Ephedrae herba and Ephedrae Radix et Rhizoma. Ephedrae herba is the herbaceous stem of Ephedra, which can elevate blood pressure, whereas Ephedrae Radix et Rhizoma is the root, which can lower blood pressure [1]. To avoid medication errors with herbal preparations, regulatory agencies such as the US FDA [2], EMA [3] and China SFDA [4] recommend that herbal medicines be prepared from specific parts of the botanical raw material.

There are many reports of fingerprinting techniques used to address the identity and quality of botanicals. Most are based on chromatographic analysis, including high performance liquid chromatography (HPLC) [5], [6], gas chromatography (GC) [7], ultra performance liquid chromatography (UPLC) [8], [9] and capillary electrophoresis (CE) [10]. Spectroscopic methods are also applied to obtain fingerprints. Near infrared spectroscopy [11] is widely used in the pharmaceutical industry and offers advantages such as real-time measurement. These methods can be compared in order to determine their advantages and drawbacks and to provide assurance on how to obtain meaningful fingerprints for assessing the quality of botanical drug products. Furthermore, in combination with chemometric approaches, fingerprint technology can be applied as a powerful method for characterizing botanical drugs of different origins and quality. For example, pattern recognition methods such as principal component analysis (PCA), hierarchical cluster analysis (HCA), linear discriminant analysis (LDA), k-nearest neighbor (k-NN), soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA) are commonly applied to distinguish botanical drugs of different origins.

In this study, Panax notoginseng (Burk.) F.H. Chen (also known as Tianqi or Sanqi in China) was used for analysis. It is an important Chinese herbal medicine with a diversity of effects, including anticarcinogenic [12], hepatoprotective [13] and cardiovascular protective properties [14], [15], and its different plant parts are used for different therapeutic purposes. In China, the rhizome and the main root of Panax notoginseng are supplied separately in the market: the rhizome is extracted for “XUESAITONG”, while the main root is used for “XUESHUANTONG”.

In this study, three chromatographic fingerprinting methods and one spectroscopic fingerprinting method were developed using high performance liquid chromatography (HPLC), ultra performance liquid chromatography (UPLC), capillary electrophoresis (CE), and near infrared spectroscopy (NIR). As illustrated in the workflow of the study design shown in Fig. 1, their power to distinguish different parts of Panax notoginseng was compared and investigated using chemoinformatics approaches.

Materials and Reagents

HPLC grade acetonitrile was purchased from Merck (Darmstadt, Germany). Glacial acetic acid was obtained from Tedia (Fairfield, OH, USA). Distilled water was purified with a Milli-Q system (Millipore, USA). Ginsenosides Rg1, Re, Rb1, Rd and notoginsenoside R1 were purchased from Jilin University (Changchun, China). All other chemicals were of analytical grade.

Plant Material

In total, 45 batches of dried Panax notoginseng samples, consisting of 16 batches of rhizomes and 29 batches of main roots, were studied to build a model. Six additional batches were used to test and validate the model. The main roots of the botanical raw material Panax notoginseng were collected from Yunnan and Guangxi Provinces, and the rhizomes were collected from Yunnan Province, China. The plant materials were collected within one year and were used as commercial products. The botanical origin of the materials was identified morphologically by Gan Pingyuan (Wenshan Institute for Drug Control, Yunnan Province, China) and Zhu Jieqiang (Zhejiang University).


No specific permissions were required for the described field studies. The locations are neither privately owned nor protected by the Chinese government. No endangered or protected species were sampled.

Sample Preparation

Each Panax notoginseng sample was pulverized and passed through a 280 µm screen. 40 ml of 70% methanol (v/v) was added to 0.5 g of powdered sample. The operating parameters were optimized according to reference [8] for efficient extraction of saponins. The suspension was extracted in an ultrasonicator (40 kHz, Shumei KQ250-E, Shanghai, China) for 60 min. During sonication, the temperature was kept below 60°C. After cooling, the extract was filtered and the filtrate was evaporated to dryness in vacuo. The residue was transferred into a 5 ml volumetric flask and diluted to volume with 70% methanol. The solution was filtered through a 0.22 µm nylon membrane (ANPEL, Shanghai, China) before analysis.

HPLC Fingerprints of Panax notoginseng

The HPLC conditions, including column, mobile phase, temperature and gradient, were optimized to obtain a robust separation. The HPLC system was an Agilent 1100 instrument (Agilent Technologies, USA) consisting of a quaternary solvent delivery system, an autosampler, an on-line degasser, a column temperature controller and an ultraviolet detector. The chromatographic separation was performed on an Agilent Zorbax Eclipse Plus C18 column (4.6×50 mm i.d.; 1.8 µm particle size) (Agilent, USA). The flow rate was 0.8 ml/min and the detection wavelength was 203 nm. The column temperature was set at 35°C and the injection volume was 3 µl. The mobile phase consisted of water (solvent A) and acetonitrile (solvent B). The elution conditions were: 0–22 min, 17–19% B; 22–30 min, 19–27% B; 30–35 min, 27% B; 35–47 min, 27–46% B; 47–70 min, 46–90% B. The re-equilibration time was 15 min; the total run time was 85 min.

UPLC Fingerprints of Panax notoginseng

The UPLC method was adopted from [16]. UPLC was performed on a Waters ACQUITY UPLC™ system equipped with a binary solvent delivery system and an autosampler. Chromatographic separation was carried out on an ACQUITY UPLC™ CSH C18 column (2.1×50 mm i.d.; 1.7 µm particle size) (Waters Co., MA, USA). The mobile phase consisted of water-formic acid (A; 100∶0.01, v/v) and acetonitrile-acetic acid (B; 100∶0.01, v/v). The gradient elution was as follows: 19–20% B at 0–6 min; 20–31% B at 6–8.5 min; 31–33% B at 8.5–11 min; 33–90% B at 11–17 min; 90% B at 17–19 min. A 10 min re-equilibration was performed before the next injection. The column was maintained at 45°C with a flow rate of 0.35 ml/min. The detection wavelength was set at 203 nm. The injection volume was 5 µl.
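The gradient program above can be written down as a table of breakpoints, which makes it easy to check the solvent composition at any time point and the total cycle time. The following is a minimal sketch, assuming linear ramps between the stated breakpoints; the names `UPLC_GRADIENT`, `percent_b` and `cycle_time` are illustrative and not part of any instrument software.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints transcribed from the UPLC gradient in the text
UPLC_GRADIENT = [
    (0.0, 19.0),
    (6.0, 20.0),
    (8.5, 31.0),
    (11.0, 33.0),
    (17.0, 90.0),
    (19.0, 90.0),
]
RE_EQUILIBRATION_MIN = 10.0  # re-equilibration before the next injection


def percent_b(t_min, program=UPLC_GRADIENT):
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    times = [t for t, _ in program]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time is outside the gradient program")
    i = bisect_right(times, t_min) - 1
    if i == len(program) - 1:  # exactly at the final breakpoint
        return program[-1][1]
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def cycle_time(program=UPLC_GRADIENT, re_eq=RE_EQUILIBRATION_MIN):
    """Injection-to-injection time: gradient length plus re-equilibration."""
    return program[-1][0] + re_eq


if __name__ == "__main__":
    for t in (0.0, 3.0, 8.5, 14.0, 19.0):
        print(f"t = {t:4.1f} min -> {percent_b(t):.1f}% B")
    print(f"cycle time = {cycle_time():.0f} min")
```

Under these assumptions the injection-to-injection time is 29 min (19 min gradient plus 10 min re-equilibration).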

CE Fingerprints of Panax notoginseng

The capillary electrophoresis method followed the method described in [17], with some parameter adjustments. An HP3D capillary electrophoresis system (Agilent, Waldbronn, Germany) equipped with a diode-array detector was used. Capillary electrophoresis was performed on an 80.0 cm (71.5 cm to the detector) ×75 µm I.D. fused silica capillary (Polymicro Technologies, USA). The detection wavelength was 195 nm and the temperature was 25°C. The separation voltage was controlled at −27 kV. The running buffer was prepared by mixing 5.0 ml of 280 mM SDS, 1.0 ml of 200 mM H3PO4 in water, 2.0 ml of acetonitrile and 1.5 ml of 2-propanol in a 10 ml volumetric flask and diluting to volume with water. All solutions were filtered through a 0.22 µm nylon membrane. Samples were introduced by pressure injection at 50 mbar for 10 seconds.
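The final composition of the running buffer follows directly from the stated volumes and stock concentrations. The sketch below works through that dilution arithmetic; it assumes the acetonitrile and 2-propanol are added as neat solvents and reports nominal volume fractions (volumes of mixed solvents are not strictly additive). The names `COMPONENTS` and `final_composition` are illustrative.

```python
FLASK_VOLUME_ML = 10.0  # final volume of the volumetric flask

# component: (volume added in ml, stock concentration in mM, or None for a neat solvent)
COMPONENTS = {
    "SDS": (5.0, 280.0),
    "H3PO4": (1.0, 200.0),
    "acetonitrile": (2.0, None),  # assumed neat
    "2-propanol": (1.5, None),    # assumed neat
}


def final_composition(components=COMPONENTS, final_volume_ml=FLASK_VOLUME_ML):
    """Return {name: (unit, value)} for the buffer after dilution to volume."""
    result = {}
    for name, (volume_ml, stock_mm) in components.items():
        if stock_mm is None:
            # nominal volume fraction of an organic modifier
            result[name] = ("% v/v", 100.0 * volume_ml / final_volume_ml)
        else:
            # C_final = C_stock * V_added / V_final
            result[name] = ("mM", stock_mm * volume_ml / final_volume_ml)
    return result


if __name__ == "__main__":
    for name, (unit, value) in final_composition().items():
        print(f"{name}: {value:g} {unit}")
```

Under these assumptions the running buffer contains 140 mM SDS and 20 mM H3PO4 with nominally 20% (v/v) acetonitrile and 15% (v/v) 2-propanol.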

HPLC-MSn Analysis

Analysis was performed on an Agilent 1100 series LC system coupled to a Finnigan LCQ Deca XPplus ion trap mass spectrometer (Thermo Finnigan, USA) via an ESI interface. The chromatographic conditions were the same as those of the HPLC fingerprint method. The MS tune parameters were as follows: collision gas, ultra high purity helium (He); nebulizing gas, high purity nitrogen (N2); source voltage, 4.0 kV in positive mode and −3.0 kV in negative mode; sheath gas (N2) flow rate, 60 arbitrary units; auxiliary gas (N2) flow rate, 10 arbitrary units; capillary temperature, 350°C; capillary voltage, 19 V in positive mode and −15 V in negative mode. The collision energy for MSn spectra was 30%.

NIR Analysis

An Antaris MX FT-NIR spectrophotometer (Thermo-Fisher Co., Madison, USA) equipped with an integrating sphere was used to collect the NIR spectra, following a reported method with slight adaptation [11]. The wavenumber range was 4000–10,000 cm−1. Each spectrum was measured with a 4 cm−1 data interval and obtained by averaging 64 scans.

Chromatographic Method Validation

Five main compounds (notoginsenoside R1, ginsenoside Re, ginsenoside Rg1, ginsenoside Rb1 and ginsenoside Rd) were selected as markers for chromatographic method validation. The instrument precision was tested by six consecutive injections of a sample solution; the RSD was below 3%. The inter-day precision was determined by six replicate measurements of a sample; the RSD was less than 3%. The samples were stable for 24 h.
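The precision criterion above is a relative standard deviation (RSD), i.e., the sample standard deviation divided by the mean, expressed as a percentage. A minimal sketch of the calculation is given below; the six peak areas are hypothetical values chosen only for illustration and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%), using the sample (n - 1) standard deviation."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are required")
    average = mean(values)
    if average == 0:
        raise ValueError("RSD is undefined for a mean of zero")
    return 100.0 * stdev(values) / average


# Hypothetical peak areas of one marker from six consecutive injections
# (illustrative numbers only, not measured data).
peak_areas = [1020.0, 1005.0, 998.0, 1012.0, 1009.0, 1001.0]

if __name__ == "__main__":
    rsd = rsd_percent(peak_areas)
    print(f"RSD = {rsd:.2f}% -> {'pass' if rsd < 3.0 else 'fail'} (criterion: < 3%)")
```

In practice the same function would be applied to each of the five marker peaks, for both the consecutive-injection and the inter-day replicate sets.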

Data Analysis

All chromatographic peaks were integrated and aligned according to our laboratory standard practice [18]. First, the chromatographic peaks were integrated. The results were then imported into the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2004A, National Committee of Pharmacopoeia, China). After all peaks were aligned, the reference chromatogram was generated by retaining peaks with an area percent above 0.1%. Profiles containing 53, 39 and 28 peaks were selected from UPLC, HPLC and CE, respectively (detailed in Figure S1). The NIR spectra were pretreated with a moving average and a first derivative. The resulting data were imported into ArrayTrack software 3.4.5 (NCTR, USA) for cluster analysis. MATLAB was used to perform PCA. SIMCA-P software 11.0 (Umetrics, Sweden) was used to perform PLS-DA.
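The spectral pretreatment and PCA steps described above can be illustrated with a short sketch. This is not the MATLAB code used in the study: it is a NumPy illustration under stated assumptions, in which the moving-average window of 5 points, the first derivative taken as a simple point-to-point difference, and the synthetic two-group "spectra" (16 and 29 samples, mirroring the batch counts) are all hypothetical.

```python
import numpy as np


def moving_average(spectra, window=5):
    """Smooth each row (one spectrum per row); the window width is an assumption."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, spectra
    )


def first_derivative(spectra):
    """First derivative approximated by differences between adjacent points."""
    return np.diff(spectra, axis=1)


def pca_scores(X, n_components=2):
    """PCA by SVD of the mean-centered matrix; returns scores and explained variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Synthetic data for illustration only: two groups sharing a baseline,
# with an extra band present in the second group.
rng = np.random.default_rng(0)
points = np.arange(200)
baseline = np.sin(2 * np.pi * points / 200)
band = 0.5 * np.exp(-0.5 * ((points - 100) / 10.0) ** 2)
group_a = baseline + rng.normal(0.0, 0.01, (16, 200))
group_b = baseline + band + rng.normal(0.0, 0.01, (29, 200))
spectra = np.vstack([group_a, group_b])

pretreated = first_derivative(moving_average(spectra))
scores, explained = pca_scores(pretreated)

if __name__ == "__main__":
    print("score matrix shape:", scores.shape)
    print("mean PC1, group A:", scores[:16, 0].mean())
    print("mean PC1, group B:", scores[16:, 0].mean())
```

For the chromatographic fingerprints, the same `pca_scores` function would be applied to a samples-by-peaks matrix of aligned peak areas instead of pretreated spectra.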

Results and Discussion

The traditional method of characterization is through comparison of HPLC spectra. As shown in Fig. 2, the HPLC fingerprints of the plant parts of Panax notoginseng appear to be very similar. However, since these spectra are highly complex and contain many classes of compounds, the comparison is often highly qualitative, which can lead to missed features or unnecessarily tight requirements. We believe that the use of chemometric techniques to analyze the spectra would provide a higher level of assurance that important characteristics are not overlooked, and would provide consistency in the final botanical drug products. A similar approach has been successfully applied to heparin, a complex naturally derived molecule, to classify pure and impure heparin [19] and to quantify heparin impurities [20].

Chromatographic Fingerprints of Panax notoginseng

The typical chromatograms generated for rhizomes and main roots from UPLC, HPLC and CE are shown in Fig. 3 and Fig. 4, respectively. UPLC has a number of advantages over the other chromatographic methods. It required the shortest run time of the three methods. Owing to its higher peak capacity and greater resolution, it captured the most chemical information, while its analysis time was only about one third that of HPLC and one half that of CE. UPLC also separated more components from the mixture than the other techniques, coming closest to earlier published reports [21], [22]. To date, over 50 saponins have been identified in Panax notoginseng [23], which occur in small amounts and vary widely. UPLC has a higher column efficiency as a result of advances in particle size, which makes it possible to distinguish small peaks from the baseline noise. Another advantage of UPLC is its reduced consumption of mobile phase, which is more environmentally friendly and more economical. However, because of the smaller packing particles in the column, samples require more careful pretreatment for UPLC methods.

Source: PLoS One. 2014; 9(1): e87462
