*Eur Phys J E Soft Matter ; 44(10): 129, 2021 Oct 18.*

##### ABSTRACT

Electrostatic interactions among colloidal particles are often described using the venerable (two-particle) Derjaguin-Landau-Verwey-Overbeek (DLVO) approximation and its various modifications. However, until the recent development of a many-body theory exact at the Debye-Hückel level (Yu in Phys Rev E 102:052404, 2020), it was difficult to assess the errors of such approximations and impossible to assess the role of many-body effects. By applying the exact Debye-Hückel level theory, we quantify the errors inherent to DLVO and the additional errors associated with replacing many-particle interactions by the sum of pairwise interactions (even when the latter are calculated exactly). In particular, we show that: (1) the DLVO approximation does not provide sufficient accuracy at shorter distances, especially when there is an asymmetry in charges and/or sizes of interacting dielectric spheres; (2) the pairwise approximation leads to significant errors at shorter distances and at large and moderate Debye lengths and also gets worse with increasing asymmetry in the size of the spheres or magnitude or placement of the charges. We also demonstrate that asymmetric dielectric screening, i.e., the enhanced repulsion between charged dielectric bodies immersed in media with high dielectric constant, is preserved in the presence of free ions in the medium.
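For orientation, the two approximations the abstract evaluates can be sketched with the textbook screened-Coulomb (DLVO-type) pair energy and its sum over pairs. This is a minimal illustration and not the exact many-body Debye-Hückel theory of the paper. The function names, the SI unit choices, and the default relative permittivity of 78.5 (an assumed value for water) are illustrative assumptions.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def dlvo_pair_energy(z1, z2, a1, a2, r, kappa, eps_r=78.5):
    """Textbook screened-Coulomb (DLVO-type) pair energy, in joules.

    z1, z2 : sphere charges in units of e
    a1, a2 : sphere radii (m)
    r      : centre-to-centre distance (m)
    kappa  : inverse Debye length (1/m)
    eps_r  : relative permittivity of the medium (assumed value)
    """
    if r < a1 + a2:
        raise ValueError("spheres overlap")
    coulomb = z1 * z2 * E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r)
    # Finite-size factors that account for ions being excluded from each sphere.
    size1 = math.exp(kappa * a1) / (1.0 + kappa * a1)
    size2 = math.exp(kappa * a2) / (1.0 + kappa * a2)
    return coulomb * size1 * size2 * math.exp(-kappa * r) / r


def pairwise_energy(charges, radii, positions, kappa, eps_r=78.5):
    """Pairwise approximation: sum the two-body term over all distinct pairs."""
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            total += dlvo_pair_energy(charges[i], charges[j],
                                      radii[i], radii[j], r, kappa, eps_r)
    return total
```

With `kappa = 0` the pair term reduces to the bare Coulomb energy in the medium. Neither function includes the polarization of the dielectric spheres or the genuinely many-body contributions that the abstract identifies as the source of error.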

*Front Cell Infect Microbiol ; 11: 634215, 2021.*

##### ABSTRACT

Bloodstream infections (BSIs), the presence of microorganisms in blood, are potentially serious conditions that can quickly develop into sepsis and life-threatening situations. When assessing proper treatment, rapid diagnosis is the key; besides clinical judgement performed by attending physicians, supporting microbiological tests typically are performed, often requiring microbial isolation and culturing steps, which increases the time required for confirming positive cases of BSI. The additional waiting time forces physicians to prescribe broad-spectrum antibiotics and empirically based treatments, before determining the precise cause of the disease. Thus, alternative and more rapid cultivation-independent methods are needed to improve clinical diagnostics, supporting prompt and accurate treatment and reducing the development of antibiotic resistance. In this study, a culture-independent workflow for pathogen detection and identification in blood samples was developed, using peptide biomarkers and applying bottom-up proteomics analyses, i.e., so-called "proteotyping". To demonstrate the feasibility of detection of blood infectious pathogens, using proteotyping, Escherichia coli and Staphylococcus aureus were included in the study, as the most prominent bacterial causes of bacteremia and sepsis, as well as Candida albicans, one of the most prominent causes of fungemia. Model systems including spiked negative blood samples, as well as positive blood cultures, without further culturing steps, were investigated. Furthermore, an experiment designed to determine the incubation time needed for correct identification of the infectious pathogens in blood cultures was performed. The results for the spiked negative blood samples showed that proteotyping was 100- to 1,000-fold more sensitive, in comparison with the MALDI-TOF MS-based approach. Furthermore, in the analyses of ten positive blood cultures each of E. coli and S. aureus, both the MALDI-TOF MS-based and proteotyping approaches were successful in the identification of E. coli, although only proteotyping could identify S. aureus correctly in all samples. Compared with the MALDI-TOF MS-based approaches, shotgun proteotyping demonstrated higher sensitivity and accuracy, and required significantly shorter incubation time before detection and identification of the correct pathogen could be accomplished.

##### Subjects

Bacteremia, Staphylococcal Infections, Bacteremia/diagnosis, Candida albicans, Escherichia coli, Humans, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry, Staphylococcal Infections/diagnosis, Staphylococcus aureus

*J Proteome Res ; 20(3): 1476-1487, 2021 03 05.*

##### ABSTRACT

Simple light isotope metabolic labeling (SLIM labeling) is an innovative method to quantify variations in the proteome based on an original in vivo labeling strategy. Heterotrophic cells grown in U-[12C] as the sole source of carbon synthesize U-[12C]-amino acids, which are incorporated into proteins, giving rise to U-[12C]-proteins. This results in a large increase in the intensity of the monoisotope ion of peptides and proteins, thus allowing higher identification scores and protein sequence coverage in mass spectrometry experiments. This method, initially developed for signal processing and quantification of the incorporation rate of 12C into peptides, was based on a multistep process that was difficult to implement for many laboratories. To overcome these limitations, we developed a new theoretical background to analyze bottom-up proteomics data using SLIM-labeling (bSLIM) and established simple procedures based on open-source software, using dedicated OpenMS modules, and embedded R scripts to process the bSLIM experimental data. These new tools allow computation of both the 12C abundance in peptides to follow the kinetics of protein labeling and the molar fraction of unlabeled and 12C-labeled peptides in multiplexing experiments to determine the relative abundance of proteins extracted under different biological conditions. They also make it possible to consider incomplete 12C labeling, such as that observed in cells with nutritional requirements for nonlabeled amino acids. These tools were validated on an experimental dataset produced using various yeast strains of Saccharomyces cerevisiae and growth conditions. The workflows are built on the implementation of appropriate calculation modules in a KNIME working environment. These new integrated tools provide a convenient framework for the wider use of the SLIM-labeling strategy.
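The gain in monoisotopic intensity described above can be illustrated with a simple carbon-only estimate: if each carbon is 12C with probability p, a peptide with n carbons is entirely 12C with probability p^n. The sketch below is not the bSLIM workflow. It ignores the isotopes of H, N, O and S, uses the natural 12C abundance of about 98.93%, and the enriched abundance of 99.9% and the 50-carbon peptide are hypothetical values chosen for illustration.

```python
NATURAL_12C = 0.9893  # approximate natural abundance of carbon-12


def monoisotopic_carbon_fraction(n_carbons, c12_abundance=NATURAL_12C):
    """Fraction of peptide molecules whose carbons are all 12C.

    Carbon-only estimate: other elements' isotopes are ignored (assumption).
    """
    if n_carbons < 0:
        raise ValueError("n_carbons must be non-negative")
    if not 0.0 <= c12_abundance <= 1.0:
        raise ValueError("c12_abundance must lie in [0, 1]")
    return c12_abundance ** n_carbons


if __name__ == "__main__":
    n = 50  # hypothetical peptide with 50 carbon atoms
    natural = monoisotopic_carbon_fraction(n)
    enriched = monoisotopic_carbon_fraction(n, c12_abundance=0.999)  # hypothetical enrichment
    print(f"natural:  {natural:.3f}")
    print(f"enriched: {enriched:.3f}")
    print(f"gain:     {enriched / natural:.2f}x")
```

Because the fraction decays exponentially with the number of carbons, the benefit of 12C enrichment grows with peptide size in this simplified picture.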

##### Subjects

Proteome, Proteomics, Amino Acid Sequence, Isotope Labeling, Mass Spectrometry

*J Am Soc Mass Spectrom ; 31(1): 85-102, 2020 01 02.*

##### ABSTRACT

Rapid and accurate identification of microorganisms and estimation of their biomasses are of extreme importance to public health. Mass spectrometry has become an important technique for these purposes. Previously we published a workflow named Microorganism Classification and Identification (MiCId v.12.26.2017) that was shown to perform no worse than other workflows. This manuscript presents MiCId v.12.13.2018 that, in comparison with the earlier version v.12.26.2017, allows for biomass estimates, provides more accurate microorganism identifications (better controls the number of false positives), and is robust against database size increase. This significant advance is made possible by several new ingredients introduced: first, we apply a modified expectation-maximization method to compute for each taxon considered a prior probability, which can be used for biomass estimate; second, we introduce a new concept called ownership, through which the participation ratio is computed and use it as the number of taxa to be kept within a cluster of closely related taxa; third, based on confidently identified peptides, we calculate for each taxon its degree of independence from the rest of taxa considered to determine whether or not to split this taxon off the cluster. Using 270 data files, each containing a large number of MS/MS spectra, we show that, in comparison with v.12.26.2017, version v.12.13.2018 yields superior retrieval results. We also show that MiCId v.12.13.2018 can estimate species biomass reasonably well. The new MiCId v.12.13.2018, designed to run in Linux environment, is freely available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
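To make the first ingredient more concrete, the sketch below shows a plain, unmodified expectation-maximization update for mixture proportions, which is the generic technique behind estimating a prior probability per taxon. It is not MiCId's modified method. The function name and the input layout (one row per identified peptide, one column per taxon, entries being the likelihood of that peptide under that taxon) are assumptions made for illustration.

```python
def em_taxon_priors(likelihoods, n_iter=500, tol=1e-12):
    """Estimate mixture proportions (priors) per taxon with standard EM.

    likelihoods[i][k] is the assumed likelihood of peptide i given taxon k.
    Returns a list of priors that sums to 1.
    """
    n_taxa = len(likelihoods[0])
    priors = [1.0 / n_taxa] * n_taxa
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        for row in likelihoods:
            # E-step: responsibility of each taxon for this peptide.
            weighted = [p * l for p, l in zip(priors, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # peptide explained by no taxon; skip it
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        grand_total = sum(totals)
        if grand_total == 0.0:
            raise ValueError("no peptide is explained by any taxon")
        # M-step: new priors are the normalized summed responsibilities.
        updated = [t / grand_total for t in totals]
        converged = max(abs(a - b) for a, b in zip(updated, priors)) < tol
        priors = updated
        if converged:
            break
    return priors
```

For example, `em_taxon_priors([[1, 0], [1, 0], [1, 0], [0, 1]])` assigns three quarters of the weight to the first taxon, since three of the four peptides can only come from it. Peptides shared between taxa are split according to the current priors, which is what lets the estimate serve as a rough proxy for relative biomass.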

*Phys Rev E ; 100(1-1): 012401, 2019 Jul.*

##### ABSTRACT

Given the crucial role of electrostatic forces in biological systems, accurate and rapid calculations of electrostatic forces are imperative in faithfully simulating biological systems. More than a decade ago, we proposed a surface charge method, applied it to a system of an arbitrary number of charged dielectric spheres, and obtained an exact solution for arbitrary configuration of the spheres. The precision depends only on the number of terms kept in a series expansion, and can therefore be controlled at will. However, the numerical implementation can be significantly slowed down by the need to compute a large number of Wigner rotation matrix elements each time the electrostatic energy is calculated. In this paper, we provide the proof of a formula introduced in 1992 and apply it to arrive at a useful closed form for the electrostatic interaction energy without computing any Wigner rotation matrix elements, hence significantly improving the efficiency for numeric implementation of the rigorous surface charge method. This new Wigner-matrix-free formalism may also be used to speed up the computation of the electrostatic energy in the presence of permanent and induced multipoles, which can be very important for atomic modeling with polarization effect included.

##### Subjects

Rotation, Static Electricity, Electric Impedance, Surface Properties

*PLoS One ; 14(8): e0220742, 2019.*

##### ABSTRACT

Reprogramming of somatic cells to induced pluripotent stem cells, by overexpressing certain factors referred to as the reprogramming factors, can revolutionize regenerative medicine. To provide a coherent description of induced pluripotency from the gene regulation perspective, we use 35 microarray datasets to construct a reprogramming gene regulatory network. Comprising 276 nodes and 4471 links, the resulting network is, to the best of our knowledge, the largest gene regulatory network constructed for human fibroblast reprogramming and it is the only one built using a large number of experimental datasets. To build the network, a model that relates the expression profiles of the initial (fibroblast) and final (induced pluripotent stem cell) states is proposed and the model parameters (link strengths) are fitted using the experimental data. Twenty nine additional experimental datasets are collectively used to test the model/network, and good agreement between experimental and predicted gene expression profiles is found. We show that the model in conjunction with the constructed network can make useful predictions. For example, we demonstrate that our approach can incorporate the effect of reprogramming factor stoichiometry and that its predictions are consistent with the experimentally observed trends in reprogramming efficiency when the stoichiometric ratios vary. Using our model/network, we also suggest new (not used in training of the model) candidate sets of reprogramming factors, many of which have already been experimentally verified. These results suggest our model/network can potentially be used in devising new recipes for induced pluripotency with higher efficiencies. Additionally, we classify the links of the network into three classes of different importance, prioritizing them for experimental verification. We show that many of the links in the top ranked class are experimentally known to be important in reprogramming. 
Finally, comparing with other methods, we show that using our model is advantageous.

##### Subjects

Cellular Reprogramming/physiology, Fibroblasts/metabolism, Gene Regulatory Networks, Induced Pluripotent Stem Cells/metabolism, Gene Expression Profiling, Gene Expression Regulation, Humans

*Proteomics ; 19(14): e1800367, 2019 07.*

##### ABSTRACT

Mass spectrometry-based proteomics starts with identifications of peptides and proteins, which provide the bases for forming the next-level hypotheses whose "validations" are often employed for forming even higher level hypotheses and so forth. Scientifically meaningful conclusions are thus attainable only if the number of falsely identified peptides/proteins is accurately controlled. For this reason, RAId has continued to be developed over the past decade. RAId employs rigorous statistics for peptide/protein identification, hence assigning accurate P-values/E-values that can be used confidently to control the number of falsely identified peptides and proteins. The RAId web service is a versatile tool built to identify peptides and proteins from tandem mass spectrometry data. In addition to recognizing various spectrum file formats, the web service offers four peptide scoring functions and a choice of three statistical methods for assigning P-values/E-values to identified peptides. Users may upload their own protein database or use one of the available knowledge-integrated organismal databases that contain annotated information such as single amino acid polymorphisms, post-translational modifications, and their disease associations. The web service also provides a friendly interface to display, sort using different criteria, and download the identified peptides and proteins. The RAId web service is freely available at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid.

##### Subjects

Protein Databases, Mass Spectrometry/methods, Proteomics/methods, Computational Biology

*Cell Rep ; 26(10): 2580-2592.e7, 2019 03 05.*

##### ABSTRACT

Efficiency of reprogramming of human cells into induced pluripotent stem cells (iPSCs) has remained low. We report that individual adult human CD49f+ long-term hematopoietic stem cells (LT-HSCs) can be reprogrammed into iPSCs at close to 50% efficiency using Sendai virus transduction. This exquisite sensitivity to reprogramming is specific to LT-HSCs, since it progressively decreases in committed progenitors. LT-HSC reprogramming can follow multiple paths and is most efficient when transduction is performed after the cells have exited G0. Sequencing of 75 paired skin fibroblasts/LT-HSC samples collected from nine individuals revealed that LT-HSCs contain a lower load of somatic single-nucleotide variants (SNVs) and indels than skin fibroblasts and accumulate about 12 SNVs/year. Mutation analysis revealed that LT-HSCs and fibroblasts have very different somatic mutation signatures and that somatic mutations in iPSCs generally exist prior to reprogramming. LT-HSCs may become the preferred cell source for the production of clinical-grade iPSCs.

##### Subjects

Hematopoietic Stem Cells/metabolism, Induced Pluripotent Stem Cells/metabolism, Adolescent, Adult, Cellular Reprogramming, Female, Healthy Volunteers, Humans, Male, Middle Aged, Young Adult

*Phys Rev Lett ; 121(18): 185505, 2018 Nov 02.*

##### ABSTRACT

Thermal expansion of H₂O and D₂O ice Ih with relative resolution of 1 ppb is reported. A large transition in the thermal expansion coefficient at 101 K in H₂O moves to 125 K in D₂O, revealing one of the largest-known isotope effects. Rotational oscillatory modes that couple poorly to phonons, i.e., lattice solitons, may be responsible.

*J Phys Condens Matter ; 30(43): 435801, 2018 Oct 31.*

##### ABSTRACT

Quantum spin chains with composite spins have been used to approximate conventional chains with higher spins. For instance, a spin 1 (or 3/2) chain was sometimes approximated by a chain with two (or three) spin 1/2's per site. However, little examination has been given as to whether this approximation, effectively assuming the first Hund rule per site, is valid and why. In this paper, the validity of this approximation is investigated numerically. We diagonalize the Hamiltonians of spin chains with a spin 1 and 3/2 per site and with two and three spin 1/2's per site. The low energy excitation spectrum for the spin chain with M spin 1/2's per site is found to coincide with that of the corresponding conventional chain with one spin M/2 per site. In particular, we find that as the system size increases, an increasingly larger block of consecutive lowest energy states with maximal spin per site is observed, robustly supporting the first Hund rule even though the exclusion principle does not apply and the system does not possess Coulomb repulsion. As for why this approximation works, we show that this effective Hund rule emerges as a plausible consequence when applying to composite spin systems the Lieb-Mattis theorem, which is originally for the ground state of ferrimagnetic and antiferromagnetic spin systems.

*J Am Soc Mass Spectrom ; 29(8): 1721-1737, 2018 Aug.*

##### ABSTRACT

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 publicly available MS/MS data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

*BMC Res Notes ; 11(1): 182, 2018 Mar 15.*

##### ABSTRACT

OBJECTIVE: RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. RESULTS: We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes it easy to run RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.

##### Subjects

Computational Biology/methods, Proteome/analysis, Proteomics/methods, Software, User-Computer Interface, Protein Databases, Humans, Internet, Tandem Mass Spectrometry/methods

*J Phys Condens Matter ; 30(10): 105003, 2018 03 14.*

##### ABSTRACT

Parametrizing a curved surface with flat triangles in electrostatics problems creates a diverging electric field. One way to avoid this is to have curved areal elements. However, charge density integration over curved patches appears difficult. This paper, dealing with spherical triangles, is the first in a series aiming to solve this problem. Here, we lay the groundwork for employing curved patches for applying the surface charge method to electrostatics. We show analytically how one may control the accuracy by expanding in powers of the arc length (multiplied by the curvature). To accommodate curved areal elements that are not extremely small, we have provided enough details to include the higher order corrections that are needed for better accuracy when slightly larger surface elements are used.

##### Subjects

Static Electricity, Electricity, Theoretical Models, Surface Properties

*Eur J Phys ; 39(6)2018.*

##### ABSTRACT

A simple and easy to implement method for improving the convergence of a power series is presented. We observe that the most obvious or analytically convenient point about which to make a series expansion is not always the most computationally efficient. Series convergence can be dramatically improved by choosing the center of the series expansion to be at or near the average value at which the series is to be evaluated. For illustration, we apply this method to the well-known simple pendulum and to the Mexican hat type of potential. Large performance gains are demonstrated. While the method is not always the most computationally efficient on its own, it is effective, straightforward, quite general, and can be used in combination with other methods.
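The effect of the expansion center can be seen on a function much simpler than the pendulum or Mexican hat examples treated in the paper. The sketch below, chosen purely for illustration, expands 1/(1+x) as a power series about an adjustable center c, where 1/(1+x) = Σ_k (-1)^k (x-c)^k / (1+c)^(k+1), and compares a fixed number of terms for c = 0 against a center placed at the average of the evaluation range. The function name, the range [0.8, 1.0], and the ten-term truncation are assumptions.

```python
def inv_one_plus_x_series(x, center, n_terms):
    """Truncated power series of 1/(1+x) expanded about `center`.

    Converges when |x - center| < |1 + center|.
    """
    ratio = -(x - center) / (1.0 + center)
    term = 1.0 / (1.0 + center)  # k = 0 term
    total = 0.0
    for _ in range(n_terms):
        total += term
        term *= ratio  # next term of the geometric series
    return total


if __name__ == "__main__":
    points = [0.8, 0.85, 0.9, 0.95, 1.0]
    mean_point = sum(points) / len(points)  # center near the average argument
    for x in points:
        exact = 1.0 / (1.0 + x)
        err_origin = abs(inv_one_plus_x_series(x, 0.0, 10) - exact)
        err_mean = abs(inv_one_plus_x_series(x, mean_point, 10) - exact)
        print(f"x={x:.2f}  error(c=0)={err_origin:.2e}  error(c=mean)={err_mean:.2e}")
```

With the same ten terms, the series centered near the mean of the evaluation points is accurate to many more digits, because the expansion parameter |x - c| / |1 + c| is far smaller there than |x| is for the expansion about the origin.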

*Phys Rev E ; 96(6-1): 062414, 2017 Dec.*

##### ABSTRACT

A previously developed classical model of electrostatic interactions, based on a formalism of dielectric spheres, which has been found to have surprising accuracy for S state atoms, is extended by allowing higher-order moments of the intrinsic charge distribution. Two methods to introduce the charge distribution (point moments at the center vs surface charge) are shown to be equivalent and are compared with another common model for polarizable atoms that utilizes polarizable point dipoles. Unlike the polarizable point dipole model, the polarizable spheres models do not suffer from a divergence at small separation of atoms and are easily generalized to higher multipoles.

##### Subjects

Molecular Models, Static Electricity, Biological Models

*Bioinformatics ; 32(17): 2642-9, 2016 09 01.*

##### ABSTRACT

MOTIVATION: There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed. RESULTS: We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases. AVAILABILITY AND IMPLEMENTATION: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit CONTACT: yyu@ncbi.nlm.nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

##### Subjects

Algorithms, Mass Spectrometry, Peptides, Proteomics, Protein Databases, Humans, Tandem Mass Spectrometry

*J Rare Dis Res Treat ; 1(3): 1-4, 2016.*

##### ABSTRACT

In recent years several methods have been proposed to assign pairwise mechanism-based similarity scores to human diseases. Despite their differences in approach and performance, these methods work in a somewhat similar manner: first a set of biomolecules (genes, proteins, chemicals, etc.) is associated with each disease, and then a measure is defined to calculate the similarity between the sets assigned to a pair of diseases. Since the similarity score between two diseases is defined based on the underlying molecular processes, a high score may hint at a shared cause, and therefore a similar treatment, for both diseases. This is of great practical importance especially when a rare or newly-discovered disease, for which limited information is available, is found to be related to a disease with a known treatment. Thus, in this mini-review we briefly discuss the recently developed methods for computing mechanism-based disease-disease similarities.
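The two-step recipe described above (associate a set of biomolecules with each disease, then score the overlap between two sets) can be sketched with the Jaccard index, one of the simplest set-similarity measures. The abstract does not say which measures the reviewed methods use, so the choice of Jaccard is an assumption, and the disease and gene identifiers below are hypothetical placeholders.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical disease-to-gene associations, for illustration only.
    disease_genes = {
        "disease_X": {"GENE_1", "GENE_2", "GENE_3"},
        "disease_Y": {"GENE_2", "GENE_3", "GENE_4"},
        "disease_Z": {"GENE_5"},
    }
    names = sorted(disease_genes)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard_similarity(disease_genes[first], disease_genes[second])
            print(f"{first} vs {second}: {score:.2f}")
```

A plain overlap score like this treats every biomolecule as equally informative. Measures that weight associations by evidence or specificity would replace the body of `jaccard_similarity` while keeping the same two-step structure.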

*Europhys Lett ; 116(2)2016.*

##### ABSTRACT

We calculate the polarization portion of electrostatic interactions at the atomic scale using quantum mechanical methods such as density functional theories (DFT) and the coupled cluster approach, and using classical methods such as a surface charge method and a polarizable force field. The agreement among various methods is investigated. Using the coupled cluster method CCSD(T) with large basis sets as the reference, we find that for systems comprising two to six atoms and ions in S-states the classical surface charge method performs much better than commonly used DFT methods with moderate basis sets such as B3LYP/6-31G(d,p). The remarkable performance of the classical approach comes as a surprise. The present results indicate that the use of a rigorous formalism of classical electrostatics can be better justified for determining molecular interactions at intermediate distances than some of the widely used methods of quantum chemistry. PACS numbers: 41.20.Cv, 32.10.Dk, 87.10.Tf.

*J Am Soc Mass Spectrom ; 27(2): 194-210, 2016 Feb.*

##### ABSTRACT

Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

##### Subjects

Bacteria/classification, Tandem Mass Spectrometry/methods, Tandem Mass Spectrometry/statistics & numerical data, Bacteria/chemistry, Factual Databases, Escherichia coli/classification, Peptides/analysis, Peptides/chemistry, Pseudomonas aeruginosa/classification, Software

*Phys Rev E Stat Nonlin Soft Matter Phys ; 92(5): 052803, 2015.*

##### ABSTRACT

A general model for random walks (RWs) on networks is proposed. It incorporates damping and time-dependent links, and it includes standard (undamped, noninteracting) RWs (SRWs), coalescing RWs, and coalescing-branching RWs as special cases. The exact, time-dependent solutions for the average numbers of visits (w) to nodes and their fluctuations (σ²) are given, and the long-term σ-w relation is studied. Although σ ∝ w^(1/2) for SRWs, this power law can be fragile when coalescing-branching interaction is present. Damping, however, often strengthens it but with an exponent generally different from 1/2.