I was just trying to create an algorithm to solve Sudoku puzzles, but it's not working out too well. An algorithm is kind of like a cooking recipe: a series of steps we follow in order to achieve a particular goal. We've created efficient algorithms as solutions to so many problems. But what I really want to know is: is there a way to create an algorithm for something more abstract? That question is the core of the P versus NP problem. So firstly, computer scientists try to group problems based on how difficult they are to solve. The easy problems are categorized into the P class. For harder problems, you need to try many candidate answers before knowing the right one. Solving problems through this way of guessing works when the problem is small. But as it gets larger, the options you need to try grow exponentially, and to get every answer right on the first go you'd need to somehow be extremely lucky or engineer luck. Checking a solved Sudoku grid is easy: see if every column, row, and 3x3 box contains exactly one instance of each of the numbers 1 to 9. This sums up the P versus NP problem: can we somehow convert difficult NP problems into P problems we can solve efficiently? If yes, how would we even do that?
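The point that checking a solved Sudoku grid is easy can be sketched in a few lines. This is a minimal illustration, not something from the video: the function name `is_valid_sudoku_solution` and the formula used to build `example_grid` are assumptions chosen for the example. The check looks at each of the 27 groups (9 rows, 9 columns, 9 boxes) once, so its cost is fixed for a 9x9 grid.

```python
def is_valid_sudoku_solution(grid):
    """Return True if a completed 9x9 grid obeys the Sudoku rules."""
    # Reject anything that is not a 9x9 grid before indexing into it.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False

    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    return all(group == digits for group in rows + cols + boxes)


# A known valid filled grid built from a shifting pattern (illustrative only).
example_grid = [
    [(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
    for r in range(9)
]

print(is_valid_sudoku_solution(example_grid))
```

Finding a filled grid from a partly empty puzzle is the hard direction. The checker above only confirms an answer that someone else has already produced.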
P problems are easily solvable and their answers are easily recognized, like multiplication. Computers can easily multiply two very big numbers in seconds. Even if the numbers being multiplied grow exponentially, the solving time does not; in fact, P stands for polynomial time, meaning the solving time increases as a polynomial function of the problem size. To check an answer you just compare it to the correct solution. Some harder problems are in NP: they appear hard to solve, but their answers are easily checked. NP stands for non-deterministic polynomial time. NP-complete, or NPC, problems are the hardest problems in this class. Non-determinism, loosely speaking, just means you can't find an answer without trial and error. Let me explain with an analogy: a person that can solve a math problem at a certain difficulty can solve any problem below that difficulty. Similarly, if an algorithm can efficiently solve an NPC problem, every NP problem below it in terms of complexity can be solved efficiently by a similar algorithm, pushing NP into P. We just don't know if such an algorithm exists or not. The Clay Mathematics Institute will award you a million dollars if you do have a definite answer, though. So what are the implications if P is equal to NP? Well, we suddenly get answers overnight to problems we've considered too difficult to solve. Protein folding becomes easier to understand, helping us cure cancer. Mathematicians and scientists become redundant, because making breakthroughs is no longer a function of luck or creativity but of following an algorithm, which anyone could do. Is that a good or bad thing?
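The "hard to solve, easy to check" idea can be made concrete with subset sum, a standard NP-complete problem. The transcript does not mention subset sum; it is chosen here only as an illustration, and the function names are hypothetical. In this sketch, checking a proposed answer takes one pass over it, while the only solver shown tries subsets one by one, up to 2**n - 1 of them for n numbers.

```python
from itertools import combinations


def verify_subset(numbers, target, indices):
    """Check a proposed answer: distinct, in-range indices summing to target."""
    return (
        len(set(indices)) == len(indices)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )


def brute_force_subset(numbers, target):
    """Trial and error: try every non-empty subset, smallest subsets first."""
    for size in range(1, len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify_subset(numbers, target, indices):
                return indices
    return None  # no subset adds up to target


numbers = [3, 34, 4, 12, 5, 2]
answer = brute_force_subset(numbers, 9)
print(answer, verify_subset(numbers, 9, answer))
```

If P were equal to NP, some algorithm would find such a subset in polynomial time rather than by exhaustive guessing. Nobody knows whether one exists.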
For example, if similar UMIs appear in transcript-disjoint equivalence classes (even if all of the transcripts labeling both classes belong to the same gene), then they …
The one small thing? The authors never checked whether the claim at the end, namely that “accounting for such cases is especially important”, is actually true.
In our paper “Modular and efficient pre-processing of single-cell RNA-seq” we checked.
They introduce the notion of “monochromatic arborescences” on a graph, where these objects correspond to what is, in the language of the previous post, elements of the set .
They explain that the combinatorial optimization formulation of UMI collapsing in this framework is to find a minimum cardinality covering of a certain graph by monochromatic arborescences.
One of the computational genomics areas where an NP-complete formulation for a key problem was recently proposed is in single-cell RNA-seq pre-processing.
After RNA molecules are captured from cells, they are amplified by PCR, and it is possible, in principle, to account for the PCR duplicates of the molecules by making use of unique molecular identifiers (UMIs).
2019 (boldface and strikethrough are mine): …gene-level deduplication provides a conservative approach and assumes that it is highly unlikely for molecules that are distinct transcripts of the same gene to be tagged with a similar UMI (within an edit distance of 1 from another UMI from the same gene). However, entirely discarding transcript-level information will mask true UMI collisions to some degree, even when there is direct evidence that similar UMIs must have arisen from distinct transcripts.
The naïve algorithm not only suffices, it is sensible to apply it. And the great thing about naïve collapsing is that it’s straightforward to implement and run; the algorithm is linear.
question of what is the “minimum number of UMIs, along with their counts, required to explain the set of mapped reads” is a precise, but wrong question.
As for UMI collapsing, the naïve algorithm has been used for almost every experiment to date as it is the method that was implemented in the Cell Ranger software, and subsequently adopted in other software packages. This was done without any consideration of whether it is appropriate.
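To see why naïve collapsing is linear, here is a minimal sketch. It rests on an assumption, since this excerpt does not spell the algorithm out: reads that share the same cell barcode, the same UMI, and the same gene are counted as one molecule, with no edit-distance correction. The function name `naive_umi_counts` and the `(cell_barcode, umi, gene)` record layout are hypothetical, and this is not the Cell Ranger implementation.

```python
from collections import defaultdict


def naive_umi_counts(reads):
    """Count molecules per (cell, gene) from (cell_barcode, umi, gene) reads.

    Assumption: identical (cell, umi, gene) triples are PCR duplicates of one
    molecule. Each read is handled once with hash lookups, so the expected
    running time is linear in the number of reads.
    """
    seen = set()
    counts = defaultdict(int)
    for cell, umi, gene in reads:
        key = (cell, umi, gene)
        if key not in seen:  # first read seen for this molecule
            seen.add(key)
            counts[(cell, gene)] += 1
    return dict(counts)


# Made-up reads for illustration only.
reads = [
    ("AAAC", "TTGA", "geneA"),
    ("AAAC", "TTGA", "geneA"),  # duplicate of the read above
    ("AAAC", "TTGC", "geneA"),  # different UMI, counted separately
    ("AAAC", "TTGA", "geneB"),  # same UMI, different gene
    ("CCGT", "TTGA", "geneA"),  # same UMI, different cell
]
print(naive_umi_counts(reads))
```

No graph is built and no covering problem is solved, in contrast to the minimum-cardinality formulation described above.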
paper shows, intuition is not to be relied upon, but fortunately, in this case, the naïve approach is the right one.
The gist of the talk was summarized by Tse as follows: “In computational genomics there’s been a lot of problems where the formulation is combinatorial optimization. Usually they come from some maximum likelihood formulation of some inference problem and those problems end up being mostly NP-hard.