I recently read an interesting paper:
"A New Approach for Identifying Manipulated Online Reviews using Decision Tree"
Rajashree S. Jadhav, Prof. Deipali V. Gore,
Department of Computer Engineering, Pune University, Shivaji Nagar, Pune, India.
This paper introduces eight potential factors for identifying manipulated reviews using correlation analysis and extracted knowledge rules.
1. Text Difficulty:
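The following formula (the Flesch Reading Ease score) gives the text difficulty of a comment:
206.835 - (1.015 × ASL) - (84.6 × ASW)
ASL = Average Sentence Length (words per sentence)
ASW = Average number of Syllables per Word
A higher score means the text is easier to read.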
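A minimal sketch of the text difficulty score in Python, assuming sentences end with ".", "!" or "?" and that syllables can be estimated by counting vowel groups. The paper does not say how it counts syllables, so `count_syllables` is a hypothetical heuristic, not the authors' method.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate (hypothetical heuristic): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_difficulty(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * ASL - 84.6 * ASW."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


print(round(text_difficulty("The cat sat. The dog ran."), 2))
```

For the two short sentences above, ASL is 3 and ASW is 1, so the score is well above 100, i.e. very easy text.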
2. Type-Token Ratio (TTR):
The Type-Token Ratio (TTR) is another index for measuring the readability of a text comment.
TTR = Types / Tokens
where "Tokens" is the total number of words in the comment and
"Types" is the number of distinct words (word types) in the comment.
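A short Python sketch of TTR, assuming words are runs of letters and apostrophes and that case is ignored (so "Great" and "great" count as one type). Both tokenization choices are assumptions, not details taken from the paper.

```python
import re


def type_token_ratio(text: str) -> float:
    # Tokens: every word occurrence, lowercased (an assumption for this sketch).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Types: the set of distinct words.
    types = set(tokens)
    return len(types) / len(tokens)


print(type_token_ratio("good good very good"))  # 2 types / 4 tokens
```

A low TTR means the comment repeats the same words often; a ratio of 1.0 means every word is different.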
According to research, sentence length also affects readability.
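Sentence length can be measured as the average number of words per sentence. The Python sketch below assumes a naive split on ".", "!" and "?", which will miscount abbreviations such as "e.g."; the paper does not specify its own sentence splitter.

```python
import re


def average_sentence_length(text: str) -> float:
    # Naive split on terminal punctuation (an assumption); empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Count words in each sentence, then average.
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


print(average_sentence_length("I love it. The battery lasts two days!"))
```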
4. Positive Sentiment:
Positive comments have a strong impact on users. If a user really wants to purchase a product, they may simply ignore the negative comments.
5. Negative Sentiment:
Researchers have shown that negative reviews can increase sales compared with products that have not been discussed at all.
Sentiment affects customer behavior, whether it is positive or negative.
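One simple way to turn positive and negative sentiment into numeric features is to count lexicon hits per word. This Python sketch is a lexicon-based stand-in, not the paper's method: `POSITIVE_WORDS` and `NEGATIVE_WORDS` are tiny hypothetical word lists, and a real system would use a full sentiment lexicon or classifier.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "amazing", "perfect"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "broken", "awful"}


def sentiment_features(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    # Fraction of words carrying positive / negative sentiment.
    return {"positive": pos / total, "negative": neg / total}


print(sentiment_features("Great phone, terrible battery"))
```

Keeping the two scores separate, rather than merging them into one polarity value, matches the list above, where positive and negative sentiment are distinct factors.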
7. Product Characteristics:
Authors of manipulated reviews tend to focus on product specifications, so their comments mention more product characteristics and specifications.
If a comment or review contains too many domain-specific terms, it is treated as showing expertise.
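A Python sketch of the product characteristics factor, measured as the share of words that come from a domain vocabulary. `DOMAIN_TERMS` (here, phone specifications) and `EXPERTISE_THRESHOLD` are hypothetical values chosen for illustration; the paper's actual term list and cut-off are not given here.

```python
import re

# Hypothetical domain vocabulary for phone reviews.
DOMAIN_TERMS = {"battery", "mah", "processor", "ram", "display", "megapixel", "camera", "storage"}

# Hypothetical cut-off above which a review is treated as "expert" style.
EXPERTISE_THRESHOLD = 0.2


def product_characteristics(text: str, vocabulary=DOMAIN_TERMS) -> tuple[float, bool]:
    # Letters and digits both count, so "4000" and "mAh" become separate tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0, False
    hits = sum(1 for t in tokens if t in vocabulary)
    ratio = hits / len(tokens)
    # Return the domain-term ratio and whether it exceeds the threshold.
    return ratio, ratio > EXPERTISE_THRESHOLD


print(product_characteristics("The 4000 mAh battery and 8 GB RAM beat the old processor"))
```

In practice the vocabulary would be built per product category, since terms that signal expertise for phones mean little for, say, hotel reviews.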