Thoughts and a Summary of MySQL, SQLite and PostgreSQL in a Specific Application
This took longer than planned. This article collects my thoughts and summaries about MySQL, SQLite and PostgreSQL in my specific application. The MySQL part comes from last month's practice; the PostgreSQL and non-database parts come from my experience over the past few days.

This article tries to compare, for one specific application, the advantages and disadvantages of a MySQL in-memory database against SQLite. The general-purpose MySQL setup performed so badly that I gave up on it. Afterward, notes on PostgreSQL and a non-database solution were added.

The conclusion: non-database solution > PostgreSQL > SQLite ≈ MySQL in-memory scheme. PostgreSQL should beat SQLite in every respect, but not by much; SQLite and MySQL each have their own strengths and weaknesses, and for this application I prefer SQLite.

0. Origin
A while back I benchmarked MySQL's SELECT ... LIMIT performance. Beyond the test data, most of my complaints boiled down to these observations: MySQL's ORDER BY does not use the index; LIMIT paging performs poorly; and so MySQL's overall performance is poor.

In that test I praised SQLite highly, but last month several accidents left me much less satisfied with it.

Here is the background. After the SimpleCD site went up, it had an unexpected effect. Since the broadcast-license crackdown at the end of last year, VeryCD has been increasingly "harmonized": movies and TV dramas censored and deleted. SimpleCD unexpectedly became the rescue team, since most of VeryCD's resource index has been preserved on it. If VeryCD has turned into the "all-ages edition" (just kidding), then SimpleCD is something like the "18X edition". For example, the traffic spike of the past two days was brought by the "2010 Spring Festival Gala": VC deleted the Gala resources, leaving only a few of the short sketches. Other examples are 2012, Avatar, and so on.

In any case, since SimpleCD loads and searches much faster, and its search results directly include VeryCD's results, I now go straight to SimpleCD instead of VeryCD when looking for eDonkey resources. But back then SimpleCD had no images, resources could not be commented on, and the interface was ugly (it still isn't much better), so I often had to follow the link back to VeryCD for the comments and introductions anyway. Naturally, I planned to improve SimpleCD.

---- It's a long story; and to repeat, this is not advertising, really not advertising. The origin story is half done. ----

Finally, to the actual problem. During the revamp a number of problems came up, the most serious being database corruption. At the time SimpleCD's traffic was about 5000 PV per day, and many people used the search function, which locks the database and blocks writes; meanwhile the background crawler process constantly updates data, which also locks the database; worst of all, the new features needed data that had never been crawled before, so the crawler had to write to the database almost constantly. That is the root cause of the tragedy.

These all run as multiple threads and processes, so SQLite inevitably spends long stretches in a locked state. While debugging SimpleCD's resource-release code I needed the lock gone, so I rather thuggishly killed the process holding it. After a few such kills, tragedy: the web pages went down, showing "database is malformed". Fortunately there was a backup, so I restored it, re-crawled what was missing, and all was well. A while later, tragedy again. Rinse and repeat.

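The lock contention described above is easy to reproduce. Here is a minimal sketch (the table and column names are made up for illustration, not taken from SimpleCD's real schema): while one connection holds a write transaction, every other writer fails with "database is locked".

```python
import os
import sqlite3
import tempfile

# A shared on-disk database; ":memory:" databases are private per connection.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# timeout=0 makes a blocked writer fail immediately instead of waiting.
crawler = sqlite3.connect(path, timeout=0)
web = sqlite3.connect(path, timeout=0)

crawler.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT)")
crawler.commit()

# The crawler opens a write transaction and takes the write lock...
crawler.execute("BEGIN IMMEDIATE")
crawler.execute("INSERT INTO resources (title) VALUES ('2010 Spring Festival Gala')")

# ...so any other writer is rejected until it commits.
try:
    web.execute("INSERT INTO resources (title) VALUES ('Avatar')")
    locked = False
except sqlite3.OperationalError as err:
    locked = "locked" in str(err)

crawler.commit()  # lock released; other writers can proceed again
web.execute("INSERT INTO resources (title) VALUES ('Avatar')")
web.commit()
```

Killing the process that holds the lock, instead of waiting for the commit, is exactly the "thuggish" move that led to the corrupted files.
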
The tragedies kept coming, until I finally wrote a solve_mal.py script. From then on, whenever the problem appeared: "Ah, tea time!" Then cleverly type `python solve_mal.py` and wander off for tea.

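The original solve_mal.py is not shown here, but the standard repair trick for a malformed SQLite file is dump-and-reload: read whatever the broken file will still give up, and rebuild a fresh database from it. A minimal sketch of that approach, assuming nothing about the real script:

```python
import sqlite3

def salvage(broken_path, fresh_path):
    """Rebuild a fresh database from whatever the damaged one can still dump."""
    src = sqlite3.connect(broken_path)
    # isolation_level=None: autocommit, so the BEGIN/COMMIT lines that
    # iterdump() emits can be executed verbatim on the destination.
    dst = sqlite3.connect(fresh_path, isolation_level=None)
    try:
        for statement in src.iterdump():  # yields CREATE/INSERT SQL
            try:
                dst.execute(statement)
            except sqlite3.DatabaseError:
                pass  # a single unreadable row or statement: skip it
    except sqlite3.DatabaseError:
        pass  # the dump itself died partway; keep whatever we got
    src.close()
    dst.close()
```

On a truly malformed file even the dump can die partway through; everything executed before that point survives into the fresh file, which matches the "restore, re-crawl the rest" workflow above.
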
In fairness, SQLite's concurrency is better than I imagined. Had I not been upgrading the database, this problem probably would not have appeared even at 500,000 PV per day, far better than I expected. The corruption looks more like a bug in some handling mechanism: even when a process is killed, the database should not end up corrupted. That is not a concurrency problem but a serious data-safety problem. Imagine it had been a power failure rather than a kill; would the database be corrupted then too? Evidently SQLite's designers never expected some deviant to kill and restart the database program every half hour, or they did not fully handle data protection on abrupt termination. Either way, negligent.

So I was thoroughly dissatisfied with SQLite. On top of that I tried WSGI (because, for reasons I do not understand, FCGI eats a huge amount of my memory: spawn-fcgi leaks all day long, and I had to kill spawn-fcgi every hour or so and re-spawn it; at its most outrageous it took under 4 hours to eat all the memory and paralyze the site). But with both nginx's mod_wsgi and Apache's mod_wsgi, there is a strange bug in the WSGI + web.py + SQLite combination: the SQLite database can only be read, never written. So WSGI was shelved for the time being.

Still, MySQL's poor performance annoyed me; maybe it was simply my data structure that was bad. I also started to suspect that SQLite "cheats", using memory to get its high performance. MySQL is a daemon that basically never shuts down; could I not, at startup, generate a dynamic in-memory database for search?

That is the end of the origin story.

1. MySQL database structure optimization
To get a MySQL version of SimpleCD running, major surgery was essential. First, optimize the data structure; second, use an in-memory database. At this point I made some conjectures.

Conjecture 1: MySQL automatically compresses the TEXT type, and that is the reason search performance is low.

I read some of the MySQL documentation. First, a complaint: it is the messiest documentation I have ever seen. Reading it is exhausting; I constantly come away confused, links jump to completely unrelated places, and I end up Googling other people's blogs for their experience instead. I will not even ask for Python-documentation quality; could you at least manage what the SQLite docs manage? Damn it.

This conjecture could in principle be verified against the official documentation, but after circling the docs for half a day I found nothing relevant, so I kept guessing. As for compression: MySQL has many data types, CHAR, VARCHAR and so on, while SQLite only has TEXT, so in SQLite I never realized that different text types might behave differently. (Please forgive my database ignorance; I thought database types were only NULL, TEXT, INTEGER, REAL and BLOB, as in SQLite.) OK, so the improvement is to change TEXT columns such as the title to the VARCHAR type.

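Concretely, the fix for conjecture 1 is DDL along these lines (the table and column names are hypothetical; the point is replacing open-ended TEXT with length-capped VARCHAR wherever a cap is acceptable):

```sql
-- Hypothetical columns: cap the length of short fields instead of using TEXT.
ALTER TABLE resources MODIFY title VARCHAR(255) NOT NULL;
ALTER TABLE resources MODIFY category VARCHAR(64);
-- A genuinely long description column would stay TEXT; only bounded fields change.
```
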
Conjecture 2: MySQL skips the index because of the data type. MySQL assumes no database designer would be stupid enough to ORDER BY a TEXT column, because nobody imagined someone would store dates as text rather than a dedicated date type or an INT/REAL value.

Sob. SQLite has no date type! I did consider converting to INT, but then every read would need mktime, strptime, gmtime and a pile of messy conversion functions, which is a hassle. Besides, with about 200,000 rows a binary search is ten-odd comparisons; how many seconds can ten-odd integer comparisons save over ten-odd comparisons of 20-character strings? I would probably have to express it in nanoseconds. I often haggle over exactly this sort of thing when writing programs, but in an obviously IO-bound application I was genuinely too lazy to spend time optimizing it. I do not know whether I overthought it or MySQL did, but the upshot is that MySQL apparently refuses to use an index for ORDER BY on a TEXT column; it seizes every opportunity to use filesort, as inefficiently as it can.

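For the record, the text-to-integer conversion grumbled about above is only a couple of calls in Python; the stored date format used here is an assumption for illustration.

```python
import time

DATE_FMT = "%Y-%m-%d %H:%M:%S"  # assumed storage format of the text dates

def to_unix(text_date):
    """'2010-02-14 20:00:00' -> seconds since the epoch (local time)."""
    return int(time.mktime(time.strptime(text_date, DATE_FMT)))

def to_text(unix_ts):
    """Inverse: integer timestamp back to the text form."""
    return time.strftime(DATE_FMT, time.localtime(unix_ts))
```

Stored as an integer, the column sorts numerically, takes less space than a 19-character string, and becomes an ordinary candidate for an indexed ORDER BY.
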
Conjecture 3: my data tables are too bloated for MySQL to load them entirely into memory.

I had put everything into one table, whereas the usual design moves the big chunks into a separate table: the title, category and other fields with a bounded character count live in one table, so the table used for index listings stays small. Splitting the table obviously brings other benefits too, such as supporting more frequent reads and writes (with less data per write, the time the lock is held drops noticeably). Shamefully, my original design was lazy and simplistic, with no thought given to size, hence this problem. No matter: since I was redoing the database anyway, I would use a memory table. Then, MySQL, even if your filesort is pure black magic, you will be filesorting in memory, and you cannot escape the palm of my hand.

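The split plus the memory table can be sketched as MySQL DDL (all names are hypothetical; note that the MEMORY engine cannot store TEXT/BLOB columns, which is exactly why the narrow table carries only the length-capped fields):

```sql
-- Narrow table: everything listing and search need, small enough to index
-- and to hold entirely in RAM.
CREATE TABLE resource_index (
    id       INT UNSIGNED NOT NULL PRIMARY KEY,
    title    VARCHAR(255) NOT NULL,
    category VARCHAR(64),
    updated  INT UNSIGNED NOT NULL,
    KEY idx_updated (updated)
) ENGINE=MEMORY;

-- Wide table: the bulky body text, fetched by id only when a page needs it.
CREATE TABLE resource_body (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    body MEDIUMTEXT
) ENGINE=MyISAM;
```
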
The changes, in summary: after half a day of painful analysis, I finalized the plan, namely the data-type changes and the memory table described above.